STPA (System Theoretic Process Analysis) at Google

Google's SRE team pioneered methods to keep failures rare by engineering reliability into every part of the stack: Service Level Objectives (SLOs), error budgets, isolation strategies, thorough postmortems, progressive rollouts, and other techniques. As systems grow more complex and new challenges emerge, what's next? How can we continue to push the boundaries of reliability and safety?

To address these challenges, Google is using System Theoretic Process Analysis (STPA) to analyze pure software systems and discover the unknown unknowns: risks that you are unaware of and are not actively looking for. Learn more about how we're using STPA in the following resources.

The Evolution of SRE at Google

by Tim Falzone and Ben Treynor Sloss

STPA - Teaching a new way to prevent outages at Google

by Garrett Holthaus