Two previous O’Reilly books from Google — Site Reliability Engineering and The Site Reliability Workbook — demonstrated how and why a commitment to the entire service life cycle enables your organization to successfully build, deploy, monitor, and maintain software systems. In this detailed report, Google Cloud Reliability Advocate Steve McGhee and Google Cloud Solutions Architect James Brookbank dive deeper into the specific challenges engineers face when adopting SRE in their organization.
Despite SRE’s popularity, many enterprises have experienced a significant gap between initial enthusiasm for SRE and its often modest level of adoption. If you’re a product owner or have a stake in reliable services and need to know more about SRE adoption, this report will methodically guide you through the process.
- Get started by evaluating your existing environment and setting expectations
- Examine SRE’s approach to reliability—and learn why reliability is the most desired product feature
- Learn how to map SRE’s guiding principles, such as embracing risk, to your existing organization
- Develop a set of SRE practices for your team, based on what team members can do, what they know, and what tools they use
- Learn how to actively nurture success and keep SRE working in your organization