If only we'd known

Matt Jackson
5 May, 2022

Please note that this post, first published over a year ago, may now be out of date.

As consultants we’re often asked to assist when there’s a problem. Because starting an engagement with us requires CTO or even board approval, we see problems that are big enough to be talked about at board level. Problems that risk stopping the company meeting its goals.

What almost all of these problems have in common is that they would have been easier to fix if we had known about them earlier.

Graph showing cost to fix increasing over time

How early is too early?

It’s easy to agree after a security breach or an outage that more could have been done proactively. It’s harder to agree concretely on what should have been done and when.

When it’s clear the company goals won’t be met
This is a common category of work for us as we deal mainly with hypergrowth SaaS companies. They’ve just signed an enormous new client and now that’s being jeopardised because the platform can’t scale to meet the new demand. By this point the only solution is often a long, risky and expensive project that changes some core part of the platform.

Before the launch decision
This is another decision point at which we’re often invited to advise. It’s not too late to change anything but there’s already been a lot of investment. Unless there’s a huge red flag it’s likely the right business decision will be to go live with the service as it is.

I had a review like this yesterday. I briefly sketched out a target architecture the customer could use if making another service like this one, but that information would have been much more valuable at the design stage.

Reviewing everything all the time

So far the examples all point to earlier being better, but that must run into a limit somewhere.

We have a retainer product that lets your team ask us questions directly on Slack. There are experts in our team on everything from databases to security to Kubernetes. From the customer’s engineers’ perspective we’re just another part of the team. I don’t think there’s anything more we could do to reduce friction! And yet even when an expert review is just a Slack message away it doesn’t mean all problems will be caught early. Sometimes the blocker is that people don’t even know there’s a problem they should be asking about.

If a developer starts using DynamoDB and all their unit tests pass and their code meets standards, how would they know if they’ve introduced a security vulnerability? If it passed a security review as well, how would they know it will continue to work at scale?

A moving target

To make things harder still, the goal posts aren’t static. Many of the hypergrowth companies we work with were quite recently startups. Their processes, infrastructure and code used to all work towards the goal of making an MVP and getting their first round of funding. Now they need to think about how they can make enterprise clients feel comfortable that their service is secure and how their system will handle all the new users that are being on-boarded.

Customers may not have the skills in house to spot where new non-functional requirements aren’t being met. Even if they do, their team is likely very busy. Proactive work to look at functionality that seems to be working, or on changes that aren’t needed immediately, might not be top of the priority list.

Does our developer know that what worked for 10M users isn’t going to scale to 100M? Do they even know that they need to target 100M customers? Assuming they do and they ask someone from the new DevOps team, is that team going to have time to respond?

What can be done

Fixing problems early isn’t as easy as it sounds. Being able to identify problems and reorient early enough can require a lot of investment:

Having expertise in all areas critical to your business, even as those areas change
Ensuring those experts have capacity to do proactive reviews
Being clear about what the baseline for security, performance and availability is (and linking this to business outcomes that mean something)
Having a robust process for ensuring that baseline is reached

I’m a big fan of a flow chart. Are you doing X? If yes, do Y. Have you used a new technology? If yes, check your implementation meets the scale requirements documented in our wiki.

If you are a small and flexible team without a lot of process it could be as simple as “Is this in your comfort zone? If not, find someone to help.”
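If it helps to picture it, the flow chart above can be sketched as a short list of question and action pairs. This is only an illustration, not a tool we ship: the `Change` fields, the `RULES` list and the `required_actions` helper are all names I've made up for the example, and the questions are the two mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A proposed change, described by its answers to the flow chart's questions.

    All fields are hypothetical and exist only for this illustration.
    """
    description: str
    uses_new_technology: bool = False
    in_comfort_zone: bool = True


# Each rule pairs a question (a yes/no check on the change) with the
# action to take when the answer is "yes".
RULES = [
    (lambda c: c.uses_new_technology,
     "Check the implementation meets the scale requirements documented in the wiki"),
    (lambda c: not c.in_comfort_zone,
     "Find someone to help"),
]


def required_actions(change):
    """Return every action whose question is answered 'yes' for this change."""
    return [action for question, action in RULES if question(change)]


if __name__ == "__main__":
    change = Change(
        "Store sessions in DynamoDB",
        uses_new_technology=True,
        in_comfort_zone=False,
    )
    for action in required_actions(change):
        print("-", action)
```

The point of keeping the rules as plain data is that the baseline can change as the business does: adding a new question means adding one more pair to the list.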

All of this proactive reviewing might put a lot of pressure on the experts on your team. If they don’t have capacity, consider offloading some of that work to us. It might be too late to find all the problems in the design phase but remember the old saying: “The best time to find out about architectural issues is in the design phase; the second best time is now”.

Is it worth the effort?

Making a large architectural change that allows your platform to scale enough to sign that huge new client feels heroic. Whoever successfully manages that project to completion is getting a bonus! Making the same change two years before it’s needed, when it is still easy, doesn’t feel the same. It might even be seen as ‘over engineering’.

Working on your processes, making capacity in key teams for proactive work or even taking on external help might feel expensive. The more problems that are avoided, the harder it can be to justify the proactive work that avoided them. “Why are we spending so much avoiding bugs? We hardly have any!” But if your team isn’t confident in what they’re building or the tools they’re using, then there are going to be mistakes. The choice is whether you fix them now or fix them later.

If you’d like your team to have frictionless access to a range of experts at any stage of the product life cycle, consider our dedicated Support subscription. It gives you access to our team of consultants, and includes hands-on workshops to skill up your team along with further hands-on training run through the Scale Factory Academy. Get in touch to let us know how we can help you.

Tags:

Engineering Principles

This blog is written exclusively by The Scale Factory team. We do not accept external contributions.
