What is Site Reliability Engineering (SRE)?
SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.
What is SRE?
Since 2004, SRE has evolved to become the industry-leading practice for service reliability.
Hear from key figures about the history of SRE and what’s next for the SRE community.
What we do as SRE
Our job is a combination not found elsewhere in the industry. Like traditional operations groups, we keep important, revenue-critical systems up and running despite hurricanes, bandwidth outages, and configuration errors.
How We SRE At Google
As SRE, we flip between the fine-grained detail of disk driver IO scheduling and the big picture of continental-level service capacity, across a range of systems and a user population measured in billions.
Hear from our SREs
Hear four veteran Googlers describe their experiences as SREs: how their backgrounds led them to their current roles, what their day-to-day work looks like, and how they've seen the core questions SRE tackles (stability vs. agility, operational work vs. software engineering, proactive vs. reactive work) play out.