A conversation with John Allspaw at SREcon Part 1

John Allspaw of Adaptive Capacity Labs talks with the Prodcast about the challenges of dynamic systems, the value of learning, and more — in two parts, live from SREcon!

[JAVI BELTRAN & JORDAN GREENBERG, "TELEBOT"]

SPEAKER 1: Welcome to season six of The Prodcast, Google's podcast about site reliability engineering and production software. This season, we met with SREs in person to hear what's on their minds, to explore the importance of psychological safety, and to learn what's coming next for SRE. And of course, the most important part is the friends we made along the way. Happy listening, and may all your incidents be novel.

STEVE MCGHEE: OK, we're here again. I'm Steve, still. This is The Prodcast, still, the podcast from Google about SRE and production software. You're John, aren't you?

JOHN ALLSPAW: Yep, my name is John Allspaw.

STEVE MCGHEE: Yep, cool. You've been at this conference the whole time, I think, so far, right? You spoke. You did a thing.

JOHN ALLSPAW: Yeah. Yeah, that's true. That's true.

STEVE MCGHEE: You did one thing which was like a workshop-y thingy, and it involved how learning works.

JOHN ALLSPAW: Yeah.

STEVE MCGHEE: Yeah, what was that all about? So I thought this was an SRE conference. What's going on there?

JOHN ALLSPAW: Yeah, yeah, yeah, so it may seem strange to be talking about learning.

STEVE MCGHEE: I thought we were doing computers here. Come on.

JOHN ALLSPAW: Exactly. Exactly. As it turns out, I think it's kind of important. And all of the reasons that make-- there's a whole bunch of stuff that companies, yours included, put together, like this sort of scaffolding that recognizes, at least implicitly, that there are people who are new to the company. There are people who have been around for the company, and in the company, have seen all kinds of different things, and they are really knowledgeable about these things. There's nothing unique about Google on that front because that's how people learn.

STEVE MCGHEE: All companies do that, right?

JOHN ALLSPAW: All companies do that. But what do they do? Think about it for a second. You start. If there's an expectation-- there's no expectation that, the first day you're working, you will be productive, and contributing, just like somebody who's been there for six years. So what's the difference between that and when they are recognized and being productive? It's-- out on a limb here. It's because they've learned things. And not all of what they're learning comes on the wiki and is in a formalized way.

STEVE MCGHEE: Yeah, it ain't all book learning.

JOHN ALLSPAW: In fact, not only is it ain't all book learning, but everybody, especially this community, recognizes that book learning-- not even book learning. Let's say wiki learning. Because I'm pretty sure there's no computer science course where, when you come out, you are going to be able to contribute to Google's code base.

STEVE MCGHEE: Not yet. Maybe. Who knows?

JOHN ALLSPAW: Yes, I'm not good at prediction, so I'm going to-- but we don't pay much attention. On the one hand, we spend a lot of time doing interviewing, interviewing shadowing, and constructing how we're going to interview, how many people will interview you, a new person, whatever. What is that all about? That's all about gauging, evaluating, having some understanding, some picture of what expertise this candidate has. So it's really important-- it seems like-- established, we should figure out how we keep it. How is it working?

The trick that we were talking about earlier today is that it doesn't work like computers. Computers don't have expertise. We can have analogs to it, but all analogies have flaws. And so it's about time. Nothing happens. There's no success. There's actually no failure without expertise.

So it's so ubiquitous, like gravity, that it's not even really thought about. And there are people in organizations that do think about it. They're just not in engineering parts of the org. And we're usually pretty full of ourselves-- and I mean all of us-- that we know how this works. We don't.

STEVE MCGHEE: So what's special about this-- why are you at this conference? Why aren't you at the CPA of America's conference? They also work for companies who need to learn things, I think.

JOHN ALLSPAW: And they party hard too. So there is something that's wild. Imagine I didn't have a background in software. I would still be attracted to practitioner conferences and communities like this, for at least one reason. There's many reasons. But the one I'm going to choose to talk about is that-- or to mention, is that-- we work and get things to do what we want, the technology to behave the way we intend it, without ever seeing it happen, in the same way that we develop a runner, right, or a weightlifter, or a boxer. We don't see code run.

We see representations that tell us something about the code running. But you can't go to the data center and say, if you squint, if you look--

STEVE MCGHEE: Show me the bits, John.

JOHN ALLSPAW: --you're going to see, there's the request. There it is. And there's a photo. Oh, you missed it. Yeah, that's not a thing. So we create these models. It has a lot more in common with NICU, N-I-C-U, Neonatal Intensive Care, because their patients can't talk to you-- very similar. And we're so good at it. And it's as if it's real and concrete. That's what attracts us.

STEVE MCGHEE: Yeah, I think some of the most interesting cases are when the graphs are lying to you, or the system is lying to you. I mean, that's what we say. It's not really lying to you, it's just that it always has been lying this whole time. Like, it's always been an interpretation game.

JOHN ALLSPAW: And somebody on the team, or somebody that you're working with, is like, something feels weird. They can't tell you what, but it is weird. There are some people who say, ah, I don't know about this. There are some people, when they say that, everyone listens because they've got a really good track record.

STEVE MCGHEE: Can you connect this to the concept of above the line and below the line, just in case people see this term out there in the world, like the idea of the line? What is that line?

JOHN ALLSPAW: Yeah, my colleagues and I put together this-- it's not really a model. It's just sort of a-- description of the world, no matter what company you're in. And there's below the line. There's this line of what we call the line of representation. And below the line is all of the technical stuff, the wires, all the code, all of that. And above the line, there are all of us.

And we only understand how things work below the line by building stories. Mental models aren't pictures, they're stories. And we build those, and we refine them over time. But we're so used to it that below the line feels concrete. But we never get to see it, really, in any way. But it is still good-- we're still good, shockingly, at understanding it.

STEVE MCGHEE: So is this not happening in the CPA's world? Or is it happening less? I mean, it seems like it would happen in medicine.

JOHN ALLSPAW: It is happening in medicine, absolutely. There's lots of kinds of differences and similarities. The world of accounting, which I don't know very much about, absolutely, there's certainty, uncertainty, and ambiguity. But I just can't imagine that it's as dynamic. I could be wrong. But if things are changing all the time, your ability to influence those things is finite. And by the time you could possibly get a map, it would be stale. So don't even try.

STEVE MCGHEE: That sounds familiar.

JOHN ALLSPAW: Whereas GAAP accounting guidelines, maybe they're not as static as-- actually, no, time zones are actually not very static, are they? But they're a bit more static.

STEVE MCGHEE: Yeah, money doesn't flow at gigahertz speeds, probably, I'm guessing.

JOHN ALLSPAW: It does not flow at gigahertz speeds.

STEVE MCGHEE: Well, on that, thank you very much, John. That was fun. That was a quick one. I hope you enjoy the rest of the conference.

JOHN ALLSPAW: I hope you enjoy the rest of the conference as well. Thank you.

STEVE MCGHEE: Sounds good.

[JAVI BELTRAN & JORDAN GREENBERG, "TELEBOT"]

SPEAKER 1: You've been listening to The Prodcast, Google's podcast on site reliability engineering and production software. Visit us on the web at sre.google, where you can find books, papers, workshops, videos, and more about SRE. This season is brought to you by hosts Jordan Greenberg, Steve McGhee, Florian Rathgeber, and Matt Siegler, with contributions from many SREs behind the scenes. The Prodcast is produced by Paul Guglielmino and Salim Verjee. The Prodcast theme is Telebot by Javi Beltran and Jordan Greenberg.