The SRE Prodcast is Google's podcast about Site Reliability Engineering and production software. In Season 1, we discuss concepts from the SRE Book with experts at Google.
Incident Management with Adrienne Walcer
Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle.
On-Call Rotations with Andrew Widdowson (APW)
Andrew Widdowson (APW) shares strategies for successful on-call rotations.
Automation with Pierre Palatin
Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge.
Client-Transparent Migrations with Pavan Adharapurapu
Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience.
Rethinking SLOs with Narayan Desai
Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems.
Alerting with Amelia Harrison
Amelia Harrison advises on when and how to alert, what ideal coverage looks like, and how to tune alerts.
Customer-Centric Monitoring with Silvia Esparrachiari
Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users.
SRE Philosophy with Jennifer Mace (Macey)
What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers.
Core team: Betsy Beyer, MP English, Salim Virji, Viv
In addition to our Prodcast guests, we'd like to thank (in alphabetical order): Javi Beltran, Cara Pardo, Jennifer Petoff, John Reese.