Everyone in software is familiar with the 80-20 rule - perhaps less catchily known outside software as the Pareto principle - the idea that you spend about 20% of a project's duration building out the first 80% of the features, with the remaining 80% of the duration going on the elusive final 20%. There are countless applications of this idea - kind of like when you've just learnt a new word, examples start to crop up everywhere. The most recent example I've come across describes how much time developers and operations spend working with persistence technologies. But first, some background.
In the days of working on back office software, if there wasn't already a strong opinion about how to solve a problem, developers were free to pick tools that met their basic checklist: will it work in production? Will the project cover the costs, if there are any? And most importantly, HOW SHINY IS IT?
I’m as guilty as the next developer of having picked tools to work with based primarily on how much I wanted to play with them. An easy tool to code with generally made my life as a developer easier, but it was also an easy sell to management - if it takes me, the expensive developer, less time to build a solution, it must be good for the project. Of course, over time, lots of counter-examples have cropped up: just throw “rails performance bottleneck” or “entity framework SQL fail” into Google and read to learn what happens when developer magic sauce is spread liberally and allowed to determine the architecture.
The product I’m working on at the moment generates a reasonable amount of financial data - around 10GB/day. It turns out that when you need to keep everything, and the product is growing daily, managing an additional 10GB of data every day can be moderately painful. Painful to the extent that we have an operations team, one of whose key roles is essentially to keep everything running. Over the last year or so we’ve scaled vertically, run out of 32-bit integers, added lots of database nodes, added data centres and strategised about archiving operational data.
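To give a feel for how quickly “run out of 32-bit integers” happens, here’s a back-of-the-envelope sketch. The 10GB/day figure is from above; the average row size is an assumption purely for illustration - the real schema will differ.

```python
# Illustrative arithmetic: how long a signed 32-bit auto-increment
# primary key lasts at a given ingest rate.

INT32_MAX = 2**31 - 1          # ceiling of a signed 32-bit INT: 2,147,483,647

bytes_per_day = 10 * 1024**3   # ~10GB of new data per day (from the post)
avg_row_bytes = 512            # ASSUMED average row size, for illustration only

rows_per_day = bytes_per_day // avg_row_bytes
days_until_overflow = INT32_MAX // rows_per_day

print(f"~{rows_per_day:,} rows/day exhausts the key space in ~{days_until_overflow} days")
```

Under those assumptions the key space is gone in roughly three months, which is why the eventual fix tends to be a disruptive migration of the key column to a 64-bit type.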
Turns out I had fallen victim to the most common of easy-vs-simple fails. In architecting the original product, I’d focussed on tools that I knew - those that were close at hand, that I was familiar with, and therefore easy for me, the developer, to use. In this case, MySQL. Now, MySQL is extremely capable of scaling to massive data volumes - it currently powers numerous huge databases such as Google AdWords and Facebook, albeit in much-customised forms. But scaling a traditional RDBMS does come with a significant ops overhead. For this particular product, the data architecture at the code level is now pretty mature; we almost never add new data fields. But as a business, we spend a lot of time working with the data at the persistence level - just to keep the lights on and the replicas fresh.
On reflection, it seems I chose a persistence technology ignorant of the devops 80-20 rule. I’d estimate that to date, of all the many, many man-months the business has spent building and maintaining this product on MySQL, about 20% (if not less) was spent by developers building the original product against that persistence mechanism, during which we benefitted from the wealth of developer tools and documentation at our disposal - it was easy to do. At least 80% has been spent by the operations team, keeping things going.
With my Simple Made Easy hat on, and my 20/20 hindsight goggles engaged, I see now that it was short-sighted to select the easy, comfortable technology - the one I had experience with - without understanding that the total cost of ownership (TCO) of my decision would be met largely by maintenance down the line, rather than the minimal build effort.
So what else can we do? I think the main learning point has been to select technology with a stronger consideration for TCO. A less smooth developer experience may well be preferable to a more complex operational maintenance strategy. The persistence marketplace these days is awash with distributed database solutions offering significantly improved cluster management and auto-healing capabilities, different types of storage and varying levels of consistency. Some of them even support ACID.