The Art of SLOs is a workshop developed by Google's Customer Reliability Engineering team. The goal of the workshop is to introduce participants to the way Google measures service reliability—in terms of Service Level Indicators (SLIs) and Service Level Objectives (SLOs)—and give them some hands-on experience with creating these measures in practice. These are important, fundamental concepts: it is far easier to have a meaningful conversation about the reliability of a service when you have an objective way of measuring that reliability.
In the theoretical part of the workshop, participants learn how setting a target that describes the desired reliability of their services can resolve the organizational tension that so often arises between development and operations teams. They are shown how SLOs and Error Budgets can be used to measure and manage the reliability of a service in a data-driven, objective and user-focused manner. The workshop takes a technical turn as participants are given a brief introduction to the qualities that make for good SLIs. Finally, the session wraps up with an application of the four-step process for developing SLIs to a simple interaction users have with the server-side infrastructure of a fictional mobile game.
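As a rough illustration of the relationship between an SLI, an SLO target and an error budget described above, here is a minimal sketch. It assumes the SLI is expressed as the proportion of valid events that were good, and that the error budget is the share of events the SLO target allows to be bad. The function names, the event counts and the 99.9% target are hypothetical and are not taken from the workshop materials.

```python
def sli(good_events: int, valid_events: int) -> float:
    """Return the SLI as the fraction of valid events that were good."""
    if valid_events == 0:
        # Assumption for this sketch: with no valid events, nothing failed.
        return 1.0
    return good_events / valid_events


def error_budget_remaining(good_events: int, valid_events: int,
                           slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been missed.
    """
    allowed_bad = (1.0 - slo_target) * valid_events
    actual_bad = valid_events - good_events
    if allowed_bad == 0:
        # A 100% target (or zero traffic) leaves no budget to spend.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical measurement window: 10,000 valid requests, 5 of them bad,
    # measured against a hypothetical 99.9% SLO target.
    print(f"SLI: {sli(9_995, 10_000):.4%}")
    print(f"Budget remaining: {error_budget_remaining(9_995, 10_000, 0.999):.1%}")
```

With these made-up numbers, the target allows roughly 10 bad requests in the window, so 5 bad requests use up about half of the budget.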
The practical part of the workshop asks participants to apply what they've learned to more complex interactions between the users of the game and its infrastructure. Each interaction has a particular twist that challenges them to think hard about what the user expects, and how to find a good proxy measure for how well the service meets those expectations. Finally, they're given example answers, which they can compare to their own progress and reasoning.
The workshop content is relatively technical and primarily aimed at development and operations engineers and their immediate management. However, you'll get the best results if you can include technically minded product people and business leaders as well. SLO targets need to be set with your users in mind, and error budgets can only resolve organizational tensions if the consequences for exceeding them have executive backing.
If you don't have a whole team to educate, you might be interested in our Measuring and Managing Reliability course on Coursera, which is a more thorough, self-paced dive into the world of SLIs, SLOs and error budgets.
The four documents above are released under the Creative Commons CC-BY-4.0 license for anyone to use and reuse, as long as Google is credited as the original author. If you want to suggest improvements, have any problems with the content, or just want to ask a question, please create a bug in our issue tracker component.