SRE Classroom: Distributed ImageServer

Introduction

SRE Classroom: Distributed ImageServer is a workshop developed by Google's Site Reliability Engineering group. The goals of this workshop are to (1) introduce participants to the principles of non-abstract large systems design (NALSD), and (2) provide hands-on experiences with applying these principles to the design and evaluation of these systems. We consider NALSD a concept fundamental to SRE, and understanding its principles provides a basis for having meaningful conversations about the design and operation of large software systems.

In the first, theoretical part of the workshop, participants learn foundational principles and concepts of large system design. Topics include correctness, reliability, performance, and different inter-system communication styles, among others. We introduce the problem requirements in detail and walk through the first parts of an example solution.

This is a presentation-style workshop: the speakers first present the problem, then walk through an example solution and any relevant system design concepts. However, we welcome participants to work through their own solutions before learning about the sample solution. Doing so gives participants an opportunity to apply the principles they have learned to develop an ImageServer system that meets certain performance and correctness requirements and Service Level Objectives (SLOs).

Target Audience

This workshop includes technical content, and its primary audience is software developers and site reliability engineers. We have also welcomed folks in various other roles, including product management and senior engineering management, to this workshop.

The workshop includes hands-on work well suited to groups of five, and it scales well from 1 to 20 groups, or as many as a hundred participants.

Workshop Materials

This presentation is the backbone of the workshop. It contains the training content that prepares participants for the practical exercises. We provide a pre-recorded video version of this workshop for reference. There are also detailed speaker notes for presenters that make it possible to deliver their own instance of the workshop with minimum preparation. We also provide a Presenter Guide with additional tips and guidance for leading the workshop.

The Participant Handout contains additional details about the exercise. The Latency Numbers Everyone Should Know handout contains reference numbers that are useful for back-of-the-envelope calculations. The NALSD Workbook contains reference material that is useful both during the workshop and more generally when applying the NALSD approach to solving system design problems. The Imageserver project code is an open-source repository with a simple imageserver web application using GCP services. The application allows users to upload photos in a few different formats, search photos by tags, and then download them. Use this as a reference system for exploring the behavior of microservices running inside a Kubernetes cluster.
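As a rough illustration of the back-of-the-envelope calculations that the handouts support, the sketch below estimates machine counts and storage from a handful of inputs. Everything in it is an assumption made for illustration: the function names, the parameters, and every number are hypothetical placeholders, not values from the workshop materials or the Latency Numbers handout.

```python
import math


def machines_for_bandwidth(peak_qps: int, bytes_per_request: int,
                           machine_bytes_per_sec: int) -> int:
    """Machines needed so that aggregate bandwidth covers peak traffic.

    All three inputs are hypothetical; take real values from the
    requirements of the system being designed.
    """
    required_bytes_per_sec = peak_qps * bytes_per_request
    # Round up: a fraction of a machine still means one more machine.
    return math.ceil(required_bytes_per_sec / machine_bytes_per_sec)


def total_storage_bytes(items: int, avg_item_bytes: int, replicas: int) -> int:
    """Raw storage needed to hold every item `replicas` times."""
    return items * avg_item_bytes * replicas


if __name__ == "__main__":
    # Placeholder inputs, chosen only to make the arithmetic easy to follow.
    n = machines_for_bandwidth(
        peak_qps=1_000,
        bytes_per_request=1_000_000,
        machine_bytes_per_sec=125_000_000,
    )
    print(f"machines for bandwidth: {n}")
    print(f"storage bytes: {total_storage_bytes(10, 5, 3)}")
```

An estimate like this considers only one resource dimension at a time. In practice you would repeat it for each candidate bottleneck (network, disk throughput, storage capacity, CPU) and provision for whichever yields the largest count.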

The Facilitator Guide contains tips and guidance for facilitators of the workshop. Facilitators should read it ahead of time to prepare for making the workshop an awesome experience for everyone involved. The breakout template can be used to set up breakout groups during the hands-on portion of the workshop. Either the facilitators or the presenter can do this preparation step; be sure to coordinate and make a game plan ahead of time!

Additional Resources

We aim to develop durable SRE Classroom materials for folks learning about NALSD. If you find this useful, tell us what you want to see in future exercises. Please use the issue tracker to send us your thoughts and suggestions. Alternatively, send us a tweet at @googlesre. Visit the SRE Classroom page to learn more about NALSD and SRE.

Licensing

The materials above are released under the Creative Commons CC-BY-4.0 license for anyone to use and reuse, as long as Google is credited as the original author. If you want to suggest improvements, have any problems with the content, or just want to ask a question, please create a bug in our issue tracker component.