Device Baby!

It was classic “legacy” code design: tightly coupled to Oracle through stored procedures, and the code did not follow the loose coupling/high cohesion principle that I feel is core to any good software design. But it was fast, it worked, and it got the job done. Customers liked the results, so I won’t fault it at all. It needed work to be maintainable over the long run. That would have to wait, though; the company wanted a “lighter/faster/cheaper” version first that they could sell for a tenth of the cost. This would be used to fight account fraud instead of financial fraud, since the risk was lower but the volume higher. We knew the risk was high, but there seemed to be a market, so we split off a small team which began prototyping a solution.

At the time we were operationally constrained, so getting access to resources like new servers was time consuming. So we decided to build the prototype in AWS. We could cheaply build out the system and then bring it in house when it was ready, or run it from the cloud. Of course our preference was to keep it in the cloud, but this was a new concept for the company, so we didn’t want to rock too many boats. Our current implementation was too complex and we didn’t want to repeat it, so we settled on a good prototyping language and began to port the important stuff over. During the process we designed the system in a more ideal way, breaking jobs into their own services (go figure), and having each service work with the others via contracts instead of “peeking” into each other’s data models just because we could. The team grew a little and it took a little over a year, but the system proved reliable and the cloud a great place for it. It became a set of micro-services which each played well with the others. Plus they could each be grown and shrunk based on their utilization (albeit manually). Once it was live, we immediately began improving and maintaining both systems. (You probably see where this is going.)

We knew off the bat that we were duplicating services (actually we were trying to re-write the original), and that in the long run we could not keep both. However, the new system had some features that were specific to the service being offered rather than to devices themselves, which was the focus of the original system. So we wanted to take just the device service and have it stand on its own, serving both products. This would remove double maintenance, better separate concerns, and make it easier to scale. That matters because use of this service has doubled year over year for the last 3 years, and at its current growth rate it’s going to stress the system in a few years.

We decided to run a little competition during one of our ship-it days, and evaluate which of the “scalable enterprise” languages could handle the most throughput and respond quickly while performing some simple tasks:

  1. receive some input via HTTP POST
  2. combine and manipulate the data
  3. store the data in a cache service
  4. retrieve the data from the cache service
  5. return the data
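As a rough illustration, the five steps above can be sketched in Go. This is a minimal sketch, not the actual contest code: the JSON field names and the in-memory `sync.Map` standing in for the external cache service are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"sync"
)

// cache is an in-memory stand-in for the external cache service
// used in the contest; sync.Map is safe across concurrent handlers.
var cache sync.Map

type payload struct {
	First string `json:"first"`
	Last  string `json:"last"`
}

// combine is step 2: merge and manipulate the posted fields.
func combine(first, last string) string {
	return strings.ToUpper(first + " " + last)
}

// handle walks the five benchmark steps for one request.
func handle(w http.ResponseWriter, r *http.Request) {
	var p payload
	if err := json.NewDecoder(r.Body).Decode(&p); err != nil { // step 1: receive POST body
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	cache.Store(p.First, combine(p.First, p.Last)) // step 3: store in the cache
	v, _ := cache.Load(p.First)                    // step 4: retrieve it
	fmt.Fprint(w, v)                               // step 5: return it
}

func main() {
	http.HandleFunc("/combine", handle)
	http.ListenAndServe(":8080", nil)
}
```

Every language in the contest implemented some variant of this same request/combine/cache/return loop, so the comparison was as close to apples-to-apples as we could make it.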

We decided to compare the following technologies:

  1. Java
  2. GoLang
  3. Python 2.7 (original)
  4. Python 3.0
  5. Node.js
  6. PHP 7
  7. Elixir
  8. Legacy Metal PHP implementation (48 cores)*

* We had some hard limits due to the number of threads and the version of PHP we used. This was just a baseline value to compare our speed tests with.


Following the Competition Assumptions, each solution was tested with a Locust cluster with 16 users and 40 slaves. The solutions were running on m3.medium boxes (1 CPU, 3.74 GB RAM) and maxing out the CPU.

We thought CPU count might play a factor, so we re-ran the Elixir and Go implementations. The stats below show them running on m3.xlarge boxes (4 CPU, 15 GB RAM). Phoenix was maxed out on CPU, and Go still had about 20% to spare.

Now it’s not apples to apples, but our current legacy metal servers have a 4000 TPS limit (four boxes with 12 cores and 48 GB each), so we put that on the chart for comparison.

In the end Go was the clear winner, and we moved forward with it as the preferred language for our extremely time-sensitive applications. Go took advantage of the additional CPUs and threads dynamically, without requiring us to deal with most of the multi-threading hassles of other languages. We didn’t have to deal with mutexes or other locking mechanisms; the code stayed simple and efficient, yet took advantage of the hardware it was given.
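The goroutine/channel model is what makes this possible: the runtime spreads goroutines across all available cores on its own, and a channel serves as the synchronization point instead of a lock. A minimal sketch (the squaring work here is a hypothetical stand-in for real per-request work, not anything from our system):

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares fans n tasks out over goroutines and collects the
// results on a buffered channel. The channel is the only
// synchronization needed: no mutexes, no explicit thread pool,
// and the Go runtime schedules the goroutines over every core.
func sumSquares(n int) int {
	results := make(chan int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v // stand-in for real per-request work
		}(i)
	}
	wg.Wait()
	close(results)

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumSquares(8)) // 0+1+4+9+16+25+36+49 = 140
}
```

The same shape scales from a toy like this to an HTTP service, since `net/http` already runs each request in its own goroutine.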

Also, as Go improved during development (there were two major releases of the language before we went live), we got another large boost from language optimizations alone.