I like to think of SRE as an engineering discipline dedicated to making a good service a magnificent one by elevating its resilience all while allowing it to scale, and reach its full potential!
Kristie N. Howard
SRE is not about technology. SREs enable change while maintaining sufficient reliability - but what is "reliability"? How much of it is "sufficient"? It depends. At its heart, SRE is about understanding what people (users, developers, customers) need from systems, and setting up processes and structures to better meet those needs.
SRE fuses the disciplines of software engineering and systems engineering with a deep understanding of how people and software interact at scale. By working across boundaries, Site Reliability Engineers drive transformational innovation in reliability and efficiency.
Site Reliability Engineering seeks to balance the risk of unavailability with the goals of rapid innovation and efficient service operations, so that users’ overall happiness—with features, service, and performance—is optimized.
Here’s what you do when someone breaks something or finds something very difficult to debug: You say thank you. Thank you for finding this edge case. Thank you for highlighting this overcomplicated part of our system. Thank you for pointing out this gap in our docs. And then you go make it so nobody can break it the same way again.
SREs engineer services, instead of binaries. This is a shift in perspective that exploits unusual skills and creativity. SREs are specialists in making changes safely.
John T. Reese
Fundamentally, it's what happens when you ask a software engineer to design an operations function.