Site Reliability Engineering
Latest Resources
Practices & Processes
SRE Best Practices for Capacity Management
Article
Practices & Processes
Supplementary Materials
Article
Practices & Processes
SRE Classroom: Distributed PubSub
Article
Practices & Processes
A Brief Guide to Running ML Systems in Production
Article
Foundations and Principles
Complexities of Capacity Management for Distributed Services
Video
Practices & Processes
SLO Adoption and Usage in SRE
Book
Foundations and Principles
Complexities of Capacity Management for Distributed Services
Video
Building Secure and Reliable Systems
Book
How to Get Started with Site Reliability Engineering
Video
Capacity Planning
Article
(Un)Reliability Budgets: Finding Balance between Innovation and Reliability
Article
2018 Award Winner - The SRE Book
Video
Google's Production Environment Tech Talk
Video
SRE YouTube Playlist
Video
Error Budgets and Risks
Video
Keys to SRE
Video
PostOps: A Non-Surgical Tale of Software, Fragility, and Reliability
Video
Invent More, Toil Less
Article
The Calculus of Service Availability
Article
USENIX SREcon Conferences
Online Training
Practices and Processes
SRE Best Practices for Capacity Management
Article
SRE Classroom: Distributed PubSub
Article
A Brief Guide to Running ML Systems in Production
Article
SLO Adoption and Usage in SRE
Book
Multi-single-tenant architectures in Cloud
Article
Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program
Book
The Art of SLOs
Article
Creating a Production Launch Plan
Book
Case Studies in Infrastructure Change Management
Book
Taming Chaos: Preparing for Your Next Incident
Article
A Case Study in Community-Driven Software Adoption
Book
Engineering Reliable Mobile Applications: Strategies for Developing Resilient Native Mobile Applications
Article
Reduce Toil Through Better Alerting
Article
Cloud CRE Production Maturity Assessment
Article
Making “Push on Green” a Reality
Article
BeyondCorp: A New Approach to Enterprise Security
Article
Reliable Cron across the Planet
Article
Managing Incidents
Article
Weathering the Unexpected
Article
Being an On-Call Engineer: A Google SRE Perspective
Article
10 Years of Crashing Google
Video
Incident Analysis
Video
Bad Machinery: Managing Interrupts Under Load
Video
How Container Clusters Like Kubernetes Change Operations
Video
Distributed Consensus Algorithms for Extreme Reliability
Video
Continuous Pipelines at Google
Video
Postmortem Action Items: Plan the Work and Work the Plan
Video
Canary Analysis Service
Article
Management
SRE As A Team Sport
Article
The System Engineering Side of Site Reliability Engineering
Article
Hiring Site Reliability Engineers
Article
Deploying SRE Training Best Practices to Production: How We SRE'ed Our SRE Education Program
Video
Interrupt Reduction Projects
Article