SRE Prodcast

The SRE Prodcast is Google's podcast about Site Reliability Engineering and production software. In Season 1, we discuss concepts from the SRE Book with experts at Google.

Episode 8

Incident Management with Adrienne Walcer

Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle.

Episode 7

On-Call Rotations with Andrew Widdowson (APW)

Andrew Widdowson (APW) shares strategies for successful on-call rotations.

Episode 6

Automation with Pierre Palatin

Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge.

Episode 5

Client-Transparent Migrations with Pavan Adharapurapu

Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience.

Episode 4

Rethinking SLOs with Narayan Desai

Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems.

Episode 3

Alerting with Amelia Harrison

Amelia Harrison advises on when and how to alert, ideal coverage, and tuning.

Episode 2

Customer-Centric Monitoring with Silvia Esparrachiari

Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users.

Episode 1

SRE Philosophy with Jennifer Mace (Macey)

What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers.


Acknowledgments

Core team: Betsy Beyer, MP English, Salim Virji, Viv

In addition to our Prodcast guests, we'd like to thank (in alphabetical order): Javi Beltran, Cara Pardo, Jennifer Petoff, John Reese.