Site Reliability Engineering

SRE Book Updates, by Topic

Click on a chapter thumbnail to see relevant publications, conference talks, and workshops by Google SREs.

2. The Production Environment at Google, from the Viewpoint of an SRE

3. Embracing Risk

4. Service Level Objectives

5. Eliminating Toil

6. Monitoring Distributed Systems

7. The Evolution of Automation at Google

8. Release Engineering

9. Simplicity

10. Practical Alerting

11. Being On-Call

12. Effective Troubleshooting

13. Emergency Response

14. Managing Incidents

15. Postmortem Culture: Learning from Failure

16. Tracking Outages

17. Testing for Reliability

18. Software Engineering in SRE

19. Load Balancing at the Frontend

20. Load Balancing in the Datacenter

21. Handling Overload

22. Addressing Cascading Failures

23. Managing Critical State: Distributed Consensus for Reliability

24. Distributed Periodic Scheduling with Cron

25. Data Processing Pipelines

26. Data Integrity: What You Read Is What You Wrote

27. Reliable Product Launches at Scale

28. Accelerating SREs to On-Call and Beyond

29. Dealing with Interrupts

30. Embedding an SRE to Recover from Operational Overload

31. Communication and Collaboration in SRE

32. The Evolving SRE Engagement Model

33. Lessons Learned from Other Industries