The annual AWS re:Invent conference is underway as I write this. Under normal circumstances it would be taking place over a week in Las Vegas (the worst place in America) for the simple reason that, with over 60,000 attendees, it’s just too huge to fit anywhere else. Because of all the usual awful 2020 reasons, this year’s event is three weeks long, and entirely online.
There’s a lot to announce during re:Invent, so much that not every piece of news gets top billing in one of the major keynote events. This was true for my favourite announcement so far, which I only heard about because it appeared in one of the many AWS RSS feeds that we have delivering news into Slack.
This is a seemingly small change in the detail of how S3 works, and simultaneously an absolutely huge deal. It’s also a textbook example of why we use the cloud in the first place. Let’s look at why.
S3: Not just a big filesystem
Amazon S3, at first glance, looks like a massive filesystem. The world’s biggest network-attached hard disk, running at planetary scale, with exabytes of storage. You write files, you read files, you update files, you delete files. Simples. The thing is, until yesterday that wasn’t entirely true.
When it launched in 2006, S3 provided an eventual consistency model. Under the hood S3 is a huge distributed system, taking care of replicating your data to achieve a frankly ludicrous eleven nines (99.999999999%) of durability. After an update operation completed, there’d be a short replication window during which subsequent reads wouldn’t always see the change, depending on which part of the system you asked.
This limitation of S3 was not widely known or understood. Developers, seduced by the idea that “S3 is just a filesystem”, would make assumptions about object consistency that weren’t valid, causing that classic problem of distributed computing: weird bugs that only show up sometimes.
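To make that failure mode concrete, here’s a toy simulation (plain Python, nothing to do with AWS’s actual internals; the class and its names are invented for illustration) of an eventually consistent store: writes land on a primary node, replicas catch up later, and a read served by a lagging replica returns stale data.

```python
import random

class EventuallyConsistentStore:
    """Toy model of pre-2020 S3 semantics: writes hit the primary
    immediately, but replicas only see them after a sync step."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        # The write is acknowledged before replication completes.
        self.primary[key] = value

    def get(self, key):
        # A read may be served by any node, including a stale replica.
        node = random.choice([self.primary] + self.replicas)
        return node.get(key)

    def replicate(self):
        # Background replication eventually copies the primary's state.
        for replica in self.replicas:
            replica.update(self.primary)

store = EventuallyConsistentStore()
store.put("report.csv", "v2")

# Before replication finishes, reads are a lottery: some nodes answer
# with the new value, others with nothing at all -- the classic "weird
# bug that only shows up sometimes".
early_reads = {store.get("report.csv") for _ in range(100)}

store.replicate()
late_read = store.get("report.csv")  # now every node agrees: "v2"
```

Run enough reads before `replicate()` and you’ll see both `"v2"` and `None` come back, which is precisely the kind of non-determinism that makes these bugs so hard to reproduce.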
(Incidentally, this sort of nuanced understanding of the platform is exactly why you should engage an AWS Consulting Partner like us.)
The fix is in
Literally overnight, AWS have removed that entire problem. Now, when you write an object to S3, subsequent reads see that change immediately. Easy to explain, straightforward to understand, and no doubt achieved only through an absolutely staggering engineering effort measured in person-decades.
Computer science gives us the CAP theorem which, glibly described, says that a distributed data store offers three guarantees: Consistency, Availability, and Partition tolerance. The catch? You can pick at most two.
The release announcement says “S3 delivers strong read-after-write consistency for any storage request, without changes to performance or availability, without sacrificing regional isolation for applications, and at no additional cost”.
So are AWS violating the laws of physics (or at least of computer science) to bring us this change? Not exactly. They haven’t yet told us anything about how this was achieved, but they’ve likely taken an approach similar to the one Google took for their distributed Spanner database.
In a paper about Spanner, we learn that it’s possible to build a CA system (one which prioritises Consistency and Availability) by also building a network so reliable that partitions become rare enough to ignore in practice.
It’s somebody else’s problem
Aside from academic interest, it doesn’t really matter how AWS have made this happen, and that’s exactly the point - because you’re using the cloud you don’t have to care. It exists behind the Somebody Else’s Problem field.
Somewhere in Seattle, a team you’ve never met have worked hard to solve a difficult engineering challenge, making your life a little bit easier and solving a problem you maybe didn’t even realise you had, in a way that means you haven’t had to change a line of code.
And they’re not charging a penny more for it. Isn’t that something?
We’re covering the re:Invent announcements as they happen over on Twitter.