In the world of DevOps and systems administration, we talk a great deal about measuring and optimising “mean times” of things.
When physical hardware was something we had to worry about, we’d measure MTBF, the “Mean Time Between Failures”, a metric used to determine component reliability, and thus figure out how often you’d have to drag yourself to the data centre to replace broken disks.
One of the Accelerate metrics is “Mean Time To Recovery”, the average amount of time it takes to get a system back up and running when something goes wrong - a number that we try to keep as small as possible by adopting good DevOps and cloud architecture practices.
There is, however, a metric that we don’t talk about that much. It’s another “Mean Time” measure. Optimising for it has caused harm to platforms, and pain to teams everywhere for years. That metric? Mean Time To Hello World.
Mean Time to What?
Mean Time To Hello World is the average amount of time it takes a developer to achieve a minimal tangible result with a new piece of technology. So named, of course, because all programmers trying a new language must, by tradition, find a way to squirt “Hello, World!” onto the console before attempting to build anything more meaningful.
We can also use MTTHW to describe how quickly you can get something basic up and running against a third-party API.
In the case of data storage, we might consider MTTHW the measure of how long it takes to write a piece of data and then read it back.
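As a rough illustration of what a data storage "hello world" looks like, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for whichever data store is being evaluated, and the function name `storage_hello_world` is made up for this example. Note that it only times the final write-and-read step: the real MTTHW also includes all the installing, configuring and documentation-reading a developer has to do before reaching this point.

```python
import sqlite3
import time


def storage_hello_world() -> tuple[str, float]:
    """Write one row, read it back, and return (value, elapsed seconds)."""
    start = time.perf_counter()
    # Assumption: in-memory SQLite stands in for a real data store.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE greetings (message TEXT)")
        conn.execute(
            "INSERT INTO greetings (message) VALUES (?)",
            ("Hello, World!",),
        )
        (value,) = conn.execute("SELECT message FROM greetings").fetchone()
    finally:
        conn.close()
    return value, time.perf_counter() - start


if __name__ == "__main__":
    value, elapsed = storage_hello_world()
    print(f"{value} (written and read back in {elapsed:.6f}s)")
```

SQLite scores well here precisely because there is no server, user account or network listener to set up, which is the kind of convenience the rest of this post is concerned with.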
For container orchestration platforms, MTTHW is how long it takes to get a “hello world” or echo server container responding to requests.
In each of these cases, we make the assumption that a lower MTTHW is better, that we should aim to make it possible to achieve simple things quickly and easily. That seems entirely reasonable on the surface, but what are the consequences of this?
Worse is Better
Once upon a time, I was a web developer. I first got into building web applications in the late 90s, the heady days of Perl 5, PHP 3, and HTML pages made entirely of <table> tags. If, like me, you were big on Open Source software, there were two database engines to choose from: MySQL and PostgreSQL.
Postgres was, on paper, the better database: it had a number of grown-up things that MySQL lacked, such as transactions and triggers, and it had broader support for ANSI standard SQL. It had a sensible access control model, which integrated well with local user identities.
The problem was that it was difficult to get up and running. If I’m remembering correctly, after installing the Postgres packages, you had to su to a local system postgres user account to be able to start a database console. You’d use this connection to create a new DBMS user and database schema for your application, and the permissions required for the former to access the latter. After that, more configuration work and a service restart were required to persuade the database to allow connections over TCP/IP so that the PHP database driver could connect to it with those new credentials. All this was poorly documented and, if you got any of it wrong, resulted in some pretty terse and unhelpful error messages. Searching the web for these messages in the days before Stack Overflow led to equally unhelpful forums full of alpha nerds berating n00bs for daring to bother them with these basic questions.
Compare this to the MySQL experience: after installing the packages, you’d… just be able to connect as a superuser without any credentials, create a database, and start INSERTing and SELECTing things.
MySQL had the shorter Mean Time To Hello World, and I strongly believe that was a major contributing factor to its popularity over the superior Postgres option.
Everything is a Trade-off
MySQL had (intentionally or otherwise) made a trade-off that Postgres didn’t: worse security for a better MTTHW, and it’s not the only popular piece of software to have done this.
NoSQL database MongoDB was released in December 2009 and rapidly became popular with developers who believed traditional relational databases like MySQL to be insufficiently “Web Scale” for their needs (a fact that nobody ever seems to have mentioned to Facebook). It didn’t have authentication features until version 2.0 was released almost two years later, and even then those weren’t enabled by default.
Elasticsearch is another tool that quickly became popular with developers, since it came with a number of MTTHW benefits over Solr, its logical predecessor. Out of the box, it had no security enabled, allowing connections from anywhere to read and write search indexes. Great for MTTHW, awful from an operational security perspective. Worse still, this software would bind to every IP interface on the host, ensuring that machines with no firewall configured would have their Elasticsearch port open to all connected networks, even if these had public addresses.
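To make the "bind to every IP interface" point concrete, here is a small Python sketch using only the standard library's socket module. It is not how Elasticsearch itself is implemented; the helper name `bound_address` is invented for illustration. It simply shows the difference between a listener bound to the wildcard address 0.0.0.0 (reachable on every IPv4 interface the host has) and one bound to 127.0.0.1 (reachable only from the machine itself).

```python
import socket


def bound_address(host: str) -> str:
    """Bind a TCP socket to `host` on an OS-chosen port and return the bound IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS to pick any free port
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Wildcard address: the socket accepts traffic arriving on any IPv4
    # interface, including ones with public addresses.
    print("all interfaces:", bound_address("0.0.0.0"))
    # Loopback address: only processes on this host can connect.
    print("loopback only: ", bound_address("127.0.0.1"))
```

Binding to the wildcard address means a freshly installed service works from other machines straight away, which is good for MTTHW. Binding to loopback is the safer default, at the cost of one more configuration step before anything remote can connect.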
Security Trade-offs Have Consequences
In 2017 there was a spate of ransomware attacks. Hackers ran automated tools that sought out publicly accessible and unsecured MongoDB and Elasticsearch data stores (and there were plenty), deleted their contents, and left a ransom note demanding money be sent by Bitcoin if the owners wanted their data back.
Those ransomware attacks created a lot of fanfare. If these Elasticsearch servers held sensitive data, who knows how much of that was stolen by less vocal actors?
A more amusing and probably mostly benign story by comparison: Elasticsearch broadcasts on its local networks, looking for other running Elasticsearch servers to cluster with. Developers connected to conference wifi, running Elasticsearch on laptops with no firewall, would find they’d accidentally clustered with other attendees and received index replicas of data they didn’t recognise.
Both of these situations are a direct result of Elastic prioritising Mean Time To Hello World over security concerns.
Functional vs Non-Functional Requirements
Optimising for MTTHW is a specific example of a more general problem: optimising only for functional requirements. There are examples of this all over the place, and it’s easy to see why: providing your users with the features they need is the thing that encourages them to hand over their cash.
However, failing to consider non-functional requirements can be a business-limiting move. Security, backups, monitoring and logging are all important considerations whose absence isn’t always obvious until it’s way too late.
Is someone championing these for your platform, or are you stopping once you see “Hello, World!” on the console?