You’re launching a brand new product, or you’re about to do a big marketing push and drive more traffic to your app. How do you know you’re prepared for these events?
We can test the limits of your system to ensure it can handle these new demands without compromising performance.
Understanding your system
In advance of your event, we work to understand your Service Level Objectives (SLOs) around performance, availability and latency, as well as your operational timeline, so that we know the time and extent of expected increases in traffic.
Assessing the risks
We review your architecture and provide a detailed assessment of any SLO-affecting risks you’re carrying. Where necessary, we run load tests and simulate failure modes to understand how your platform responds under certain conditions. Each risk is then scored by likelihood and impact.
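As an illustration of scoring by likelihood and impact, here is a minimal sketch. The 1 to 5 scales, the multiplication rule and the example risks are all assumptions made for this example, not a fixed methodology; the real scales are agreed with you during the assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (SLO breach)

    @property
    def score(self) -> int:
        # Simple likelihood x impact product; other weightings are possible.
        return self.likelihood * self.impact


def prioritise(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical risks, for illustration only.
risks = [
    Risk("Cache node failure", likelihood=2, impact=3),
    Risk("Database connection pool exhaustion", likelihood=4, impact=5),
    Risk("Third-party payment API rate limit", likelihood=3, impact=4),
]

ranked = prioritise(risks)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.name}")
```

The highest-scoring risks go to the top of the list, which is the order in which mitigation work would typically be considered.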
Mitigating those risks
Based on our assessment, we prioritise changes to your infrastructure and code that will improve performance and mitigate those risks. We can then make those changes or advise you on making them.
Monitoring and alert systems
Next, we make sure your monitoring is set up properly, that telemetry is available to relevant people, and that you know how to interpret the graphs and metrics we collect.
We add alerts on key metrics, tuned so that alarms are triggered only when service levels are in danger of being breached and there’s something for a human operator to take care of.
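One common way to alert only when a service level is genuinely at risk is to alert on how fast the error budget is being consumed (the "burn rate"), checked over both a long and a short window. The sketch below assumes a 99.9% availability target and a paging threshold of 14.4; both values, and the function names, are illustrative assumptions rather than settings we would apply to every system.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% target
    return (errors / requests) / error_budget


def should_page(long_window, short_window, slo_target=0.999, threshold=14.4):
    """Page only if both windows burn budget faster than the threshold.

    Each window is an (errors, requests) tuple. Requiring the short window
    to agree stops the alert firing after the problem has already cleared.
    """
    return (
        burn_rate(*long_window, slo_target) >= threshold
        and burn_rate(*short_window, slo_target) >= threshold
    )


# Hypothetical counts: 2% errors over the last hour, 3% over the last 5 minutes.
print(should_page(long_window=(200, 10_000), short_window=(30, 1_000)))

# A brief spike that the longer window does not confirm does not page.
print(should_page(long_window=(5, 10_000), short_window=(30, 1_000)))
```

A low, steady error rate that stays within budget never pages anyone, which keeps alarms reserved for situations a human operator actually needs to handle.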
Planning your incident response
Collaborating with you, we put together runbooks and playbooks for incident response. These take into account your own systems, as well as those of third parties you rely on to deliver your service.
We help you put together an on-call rota, and rehearse on-call processes with participants.
Providing 24/7 event support
During the event, we are available 24/7 to help keep everything running smoothly.
Tell us what you need
If you’d like to chat to one of our consultants, schedule a conversation to see how we could support you.