Narrator: You're listening to the Humans of DevOps podcast, a
podcast focused on advancing the humans of DevOps through skills,
knowledge, ideas and learning, or the SKIL framework.
Jason Baum: Hey everyone, it's Jason Baum, Director of Member
Experience at DevOps Institute, and this is the Humans of DevOps
podcast. Welcome back. Hope you had another great week this
week. I always hope you have a great week. So I hope this one
was even greater than the last one. Today we're gonna be
talking about incidents, mistakes, do-overs. We've talked
about blameless culture. It's one of the core principles of
DevOps. But in reality, is it as easy as just saying we have a
blameless culture? In preparing for today's episode, I'm
reminded of a quote by Phoebe Waller-Bridge, the creator of
Fleabag and showrunner of Killing Eve: "That's the very
reason they put rubbers on the ends of pencils, because people
make mistakes." I love that quote. If you don't know who she
is, Waller-Bridge has made a career of bringing to
life unconventional women who make a lot of mistakes. But what
her series have in common is that her flawed characters get a
chance at redemption, moving past mistakes, and offering them
an opportunity to prove something to themselves and come
out stronger and more confident on the other side. I feel like
this quote, the metaphor she made is a perfect setup for our
conversation today. Incidents are a great opportunity to
gather both context and skill. They take people out of their
day to day roles and force teams to solve unexpected and
challenging problems together. Joining me today to discuss this
topic is Lisa Karlin Curtis. Lisa is a product engineer at
incident.io. In fact, Lisa was employee number two at incident.io.
She started out as a consultant working at Accenture before
accidentally becoming a developer. I'd love to hear how
that happened. Lisa loves building stuff, but is also
interested in how people interact with each other in a
work environment, particularly in software engineering. Outside
of work, Lisa loves cooking and pretty much any competitive
sport. We definitely have that in common, but I guess she likes
the British ones. She says I really don't know that much
about them. So Lisa, welcome to the podcast. Thank you for
joining me.
Lisa Karlin Curtis: Hey, lovely to be here. I'm really looking
forward to it.
Jason Baum: Awesome. Are you ready to get human? Alright, let's
do it. So we're talking about incidents. How can incidents be
considered an opportunity to gather context and skill?
Lisa Karlin Curtis: So I kind of started thinking about this
when I was reflecting on how, yeah, I became a software
engineer accidentally, and I've accelerated quite quickly; I've
been very fortunate. And part of that is because a lot of the
stuff I did before I was an engineer was actually quite
useful. But also part of it was, I realized, I started doing this
thing where I was basically running towards the fire. So
stuff would go wrong, and I'd be like, oh, that looks kind of
interesting. And all of the times where I learned most, the
sort of step changes in my understanding or my context,
were around incidents. So it was like, something would go
wrong, and either I would learn about new stuff while we were
fixing that problem, or straight
afterwards, when I was like reflecting on it and talking to
people about it, I'd learned a bunch of stuff. And so I kind of
started thinking about this and talking to people about it. And
it turns out other people had had the same experience, what a
surprise. And so I think that there is something very
unusual about an incident, which is why it is an incident, right?
Like something, something happens that is unexpected that
you didn't know was going to happen. And then you have to
react to it. And that pushes people outside their comfort
zone. And it pushes you to do things and see things that you
wouldn't otherwise see. So I guess I can think of
three key areas where it's really useful. One is
about broadening your horizons, because you see
stuff that you wouldn't see in your day to day. One is about
teaching you how to build stuff that fails gracefully, and
in an observable way. I think that one of the
key differentiators between good software engineering and great
software engineering is what happens when the thing that
you didn't think could happen happens. So, like, step one, make
it work. Step two, make it work really fast. Step three, make it
so that when you get a negative number that you
weren't expecting, you explode really, really loudly, as opposed
to just taking the negative number and paying somebody a
negative amount of money, or whatever it might be. And
then sorry, just to finish off, the third is about building your
network. So you have a whole bunch of connections with
different people in your organization, you work with,
like your team. But in an incident, often you have
to work with lots and lots of people from across the
organization. And that builds bonds that are really important.
And I think really valuable both to you as an individual and to
the company.
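As a rough illustration of what Lisa means by exploding really loudly instead of quietly paying out a negative amount, here is a minimal Go sketch. The billing types and field names are invented for the example, not incident.io's actual code.

```go
package billing

import "fmt"

// Charge represents an amount (in pence) we intend to bill a customer.
type Charge struct {
	CustomerID  string
	AmountPence int64
}

// Validate refuses to proceed when the charge looks impossible, so the
// failure is loud and visible and a human gets pulled in, instead of money
// quietly moving in the wrong direction.
func (c Charge) Validate() error {
	if c.CustomerID == "" {
		return fmt.Errorf("charge has no customer ID")
	}
	if c.AmountPence < 0 {
		return fmt.Errorf("refusing to bill customer %s a negative amount (%d pence): this should never happen", c.CustomerID, c.AmountPence)
	}
	return nil
}
```

The point is not the specific check; it is that the case you didn't expect is rejected noisily rather than processed silently.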
Jason Baum: Yeah, absolutely. If you listen to this podcast,
you've heard me use parenting often as examples. Because I
think parenting is so applicable to what goes on in day to day
life outside of your home. And one of the things that I was
told when I was a new parent was plan for the unexpected, or plan
for the unplannable. And I think that's applicable here. With
incidents and mistakes, it's almost like you need to expect it,
it's going to happen; if you're building a program, it's going
to have a bug. It's what happens after it happens that really
matters, right? That's where everything happens.
Lisa Karlin Curtis: Yeah, I think that's the differentiator. In
terms of, if you're building a system, you will predict a
certain number of the possible things that are going to happen,
and you will make your system behave well in them. And that's all
great. And then you read a book that's like, oh, you should make
your system observable, and you go, okay, I will add some log
lines and I will add some metrics. And, like, it's really easy to
do that in a way that doesn't really add any value, and we've all
seen examples of that. We've all seen dashboards that don't
really mean anything. And the way to get from having read it in a
book to being actually able to do it valuably, I think, is just
to see it. It's very difficult to learn that in the abstract, but
as soon as you see someone trying to debug a problem, and you
see, you know, what are the breadcrumbs that they are following
in order to get from 'our API is slow' to 'this is the root
cause, I can now fix this problem', and if you see somebody
do that enough times, you start to be able to lay your own
breadcrumbs. Because you can kind of imagine, you can
empathize, you can put yourself into that person's shoes who's
trying to debug it and be like, Oh, maybe it'll be useful for me
to like, have a metric here. Because if this specific bit
starts to go weird, we want to know about it. And I think that
that's something, as you say, where it's all about
preparing for the unexpected. And a lot of that, actually,
counterintuitively perhaps, is not being able to handle
every case; it's being able to either be sure that what you're
doing is right, or get a human to help you out, and that
in code is, like, throwing an exception or panicking or
whatever you want to call it. And that's the most important
thing, particularly if you're building, you know, billing
software, or cars and planes and software that we trust
with our lives; then you need to be sure that if the software
sees anything that it doesn't expect, it defers to the human.
And that's the same in FinTech, which is my background,
and it's the same in lots of bits of software. And that's
something that you're not really taught, and if you
read it in a book, it doesn't really land. But once you see
it, you can really start to engage with it and sort of do it
yourself.
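A tiny sketch of that breadcrumbs idea: a structured log line and a counter at the point where behaviour can start to go weird, so whoever later debugs "our API is slow" has something concrete to follow. The function names, thresholds, and metric stand-in below are made up for illustration.

```go
package payments

import (
	"log/slog"
	"time"
)

var slowLookups int64 // stand-in for a real metrics counter

// lookupBalance wraps the datastore call and leaves a breadcrumb whenever it
// is unexpectedly slow, instead of failing (or crawling) silently.
func lookupBalance(customerID string) (int64, error) {
	start := time.Now()
	balance, err := queryBalance(customerID)
	elapsed := time.Since(start)

	if elapsed > 500*time.Millisecond {
		slowLookups++
		slog.Warn("balance lookup slow",
			"customer_id", customerID,
			"elapsed_ms", elapsed.Milliseconds())
	}
	return balance, err
}

// queryBalance stands in for the real datastore call.
func queryBalance(customerID string) (int64, error) {
	time.Sleep(10 * time.Millisecond) // pretend to do some work
	return 0, nil
}
```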
Jason Baum: So I mentioned blameless culture, and that's
really important in DevOps with incidents, or really with
blameless culture anywhere. But I feel like it is something that
is said, I've heard it said so much, that it almost becomes a
buzzword, and you really question the authenticity when someone
says, oh, we have a blameless culture. How do you actually have a
blameless culture? How do incidents lead to learning without the
person who made the mistake feeling like they're getting in the
way or, you know, feeling like they made a mistake and let people
down? I think that's inherent nature for all of us, right?
Lisa Karlin Curtis: Yeah, absolutely. I think a lot of people
have a lot of shame around making mistakes at work. And that is a
very human thing; humans are incredibly susceptible to shame. And
what that means is
that there is so much psychological pressure when you
make a mistake to try and cover it up. And that is like the
worst possible thing that you can do in a software engineering
environment. And we know that, and yet all of us still have
that, right. All of us still have that moment when we find a
mistake was made. And we're like, maybe I'll just fix it,
and it will be fine. And no one will ever know. And I think that
is such a, it's so hardwired into our brains, that you have
to work very, very hard to combat it. And so some of the
obvious things that you can do as an individual: try and be
very open about your mistakes, particularly if you're in a
leadership role, or if you have quite a lot of social capital
because you've been in that organization for a long time. That
means that people will kind of monkey see, monkey do, right? If
you do it, other people will copy you and will follow your
example. And then there's another part of it,
which is, I think, what you'd talk about as failing together. The
way that most technology fails is more complicated than one
person made one error; it's normally lots of people made lots of
decisions that have all coalesced into a bad thing.
There was a famous one at a company I used to work at, where a
junior engineer had written some code that was supposed to send
an email telling people who weren't paying that they should pay,
basically trying to prompt them and increase conversion. And the
logic was a bit wrong, and it was actually targeting all the
people who were paying. And they ran it in staging, and customer
support got inundated with requests. And the junior engineer is
sitting there being like, I put it in staging, I'm really
confused, this is very stressful. The team jumps in, they go to
support, support is then told about the mistake: don't worry,
your billing's fine. And they start to look back, and it turns
out that somebody had seeded staging with production data to
run some load tests, and they didn't anonymize the emails. And
so basically, they've run their code in production, but they
didn't know. And in something like
that, the junior engineer is really mortified, because they've
done this thing; they clicked a button and a
bunch of emails went out. And that's really bad. But I think
it's important to look at that as a group and be like, Well,
how could you possibly have known that, right? Why did we
put production data in staging? Why did we not anonymize it? Why
is staging set up so that it can send unlimited emails to
unlimited numbers of people? And there are a whole load of other
questions, right, and you can start to look at it as a
systemic problem. Or you can look at it with, like, the
Swiss cheese analogy: all the holes have to line up.
And I think that if you talk about things like that a lot,
then people get it, and people buy into it. And at that point,
it's much more comfortable to admit your mistake, because you
know that your team is going to gather around you and you know
that your team is going to take accountability. And so if you
succeed together and if you fail together, you can build
this blameless culture. But if you hang people out to dry, if
you mock people, if you're mean, that's just going to reinforce
the shame that that person is already worried about.
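The missing safeguard in that staging story is easy to sketch: if you ever seed staging from production data, scrub anything that could reach a real customer before it lands. A minimal, hypothetical Go example, assuming a simple User row:

```go
package seeding

import (
	"crypto/sha256"
	"fmt"
)

// User is a hypothetical row copied from production into staging.
type User struct {
	ID    string
	Email string
}

// AnonymizeForStaging replaces the email with a stable, clearly fake address,
// so load tests still see realistic-looking rows but can never email a real
// person, whatever buttons get pressed in staging.
func AnonymizeForStaging(u User) User {
	sum := sha256.Sum256([]byte(u.Email))
	u.Email = fmt.Sprintf("user-%x@example.invalid", sum[:6])
	return u
}
```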
Jason Baum: So if I'm hearing you, it's that proactive honesty. A
mistake is made, and you call it out for what it is: this
happened. How do we address it? What do we do? Coming together,
getting everybody to rally around it, without pointing the
finger?
Lisa Karlin Curtis: Yeah, I think that's exactly it. And then you
need to combine that with the idea that incidents shouldn't be a
big scary monster. I think there's a blog post on our blog about
how incidents are not a bad thing and you should be declaring
more incidents. There are lots of organizations who sort of
measure their success on the number of incidents, which I think
is a really perverse incentive. But I think that if incidents
become the norm, then mistakes become the norm. And if you're all
talking about incidents, and if the information about those
incidents is really accessible to people in your organization,
then you've made a mistake just like everybody else on the team,
who has made hundreds of mistakes. Whereas if that is all kept
hush hush within the team, and it's not broadcast, then all of a
sudden that's the first mistake anyone in the company has ever
made, as far as you're concerned. And that's a really terrifying
place to be.
Jason Baum: Yeah, and we all know that's not true. But yet we
feel it; it's just inherent human nature, right? To think that
your mistake is the worst mistake ever made, and oh my god,
they're gonna fire me, or they're gonna blacklist me, or
something bad is gonna happen, I'm never gonna work again, it's
the end of my life. And we just have this habit, I think, from
when we're kids through adulthood. I always hoped that feeling
would go away, and it never has. Why?
Lisa Karlin Curtis: I think it's partly the imposter syndrome
thing of, like, the more you progress, the more responsibility
you have, the more you can see all the things that you don't know
how to do. But I think there's also another part of it, which is
that I think we inherently strive for perfection. We want
ourselves to be perfect, because it's quite inconvenient that
we're not, because every single decision you make, you're like,
I think I'm right. As a software engineer, you spend your entire
day going, yeah, I think this is right, let's do it. And if
you've got a little voice in your head all the time going, what
if you're not, what if you're not, that becomes really stressful
and really tiring. And so what we do is we go, yeah, I'm right
most of the time, this is kind of fine. And then when something
goes wrong, that punctures that sense of confidence, and that's
really destructive. And then we get really stressed, because we
think that the only reason anybody has hired us is because we're
right all the time, and it isn't. But because that makes our day
to day easier, I think it's very easy to fall into the trap of
almost believing your own rubbish, that you are in fact perfect
and flawless.
Jason Baum: It's so funny, because it's probably, if not the most
common, one of the most common questions that comes up in an
interview process: name an incident that happened, for example,
where you made a mistake. How did you handle it? How did you
overcome it? Right? That's like one of the most common questions,
I think. If you haven't had it, I don't know, maybe you haven't
ever applied for a job before in your life, because I think it
must be the most common one. And yet, when you're hired to do a
job, it's almost like, yeah, we do strive for perfection, and
that question went out the window. It's like, now I can't make a
mistake.
Lisa Karlin Curtis: Yeah, it feels like one of those things that
there should be a better answer to as well, but I don't think
there is; I don't think there's a silver bullet. I think you talk
about it, you lead from the front, you try and catch it when it
does happen, and you hope that slowly but surely that feeling of
wanting to hide it, that feeling of shame, just becomes less and
less strong, and you develop this muscle of overriding it. But
you know, I'm talking about this and evangelizing it, and I still
absolutely have that instinct. The only thing that I've learned
is I have a muscle now where I can recognize it, and I can look
it in the face and be like, we're not doing that today, because
that's not useful. But it's definitely an active thing; it's not
a sort of default.
Jason Baum: Yeah, yeah. And based on what you're saying, we have
to look at each incident as isolated: it's an incident, we take
care of it, learn from it, and just move past it, if I'm
gathering that right.
Lisa Karlin Curtis: I think emotionally, absolutely. I think
there's another side to that, right, which is that as an
organization, I think incidents are a really important source of
data to understand where you should be putting your chips. So if
you have good reporting, if you can look at your incidents in
aggregate, if you have a good way of recording them and
categorizing them, then you can start to use that data and go,
oh, this bit of tech seems to cause us a lot of problems, or, you
know, this process seems to be really risky for us. And that will
help us decide where we're going to invest next. So I think from
that point of view, you don't want to kind of leave them behind,
though. And that's in direct conflict with what you want people
to do emotionally, which is to, you know, be there in the moment,
solve the problem, close it, and not worry about it. And I think
that's quite difficult to manage, because you're simultaneously
telling people to leave it behind, and also telling them to
constantly be thinking about them and, you know, sitting in a
quarterly review being like, what went wrong this quarter? What
do we want to invest in to help make our platform more reliable,
right?
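A toy sketch of what looking at incidents in aggregate can mean in practice: count them by category so the bit of tech or process that keeps hurting you becomes visible. The Incident type and category names here are invented; a real incident tool would export richer data.

```go
package insights

import (
	"fmt"
	"sort"
)

// Incident is a minimal stand-in for an exported incident record.
type Incident struct {
	Category string // e.g. "billing", "deploys", "third-party"
	Severity int    // 1 = minor ... 3 = major
}

// reportByCategory prints categories from most to least frequent, which is
// often enough to start a "where do we invest next" conversation.
func reportByCategory(incidents []Incident) {
	counts := map[string]int{}
	for _, inc := range incidents {
		counts[inc.Category]++
	}
	cats := make([]string, 0, len(counts))
	for c := range counts {
		cats = append(cats, c)
	}
	sort.Slice(cats, func(i, j int) bool { return counts[cats[i]] > counts[cats[j]] })
	for _, c := range cats {
		fmt.Printf("%-20s %d incidents\n", c, counts[c])
	}
}
```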
Jason Baum: You know what it makes me think of, and apologies
ahead of time: American football. We have a quarterback, and the
quarterback throws interceptions. It's a given thing. Everyone
knows, no matter who it is, Tom Brady, I mean, they're going to
throw interceptions. And yet they strive for perfection, because
what sport, what athlete doesn't, right? And then when they
happen, the one thing that I would say about that culture is they
get on the phone, or they go next to the offensive coordinator or
coach, and they look at what happened in that play. Here's what
happened, here's why he didn't see it, this is what happens. And
then they are supposed to forget it, forget it ever happened and
move on. Because how do you move on with the rest of the game if
all you're thinking about is the one big mistake you made? And
for me, when you're talking about kind of forgetting it, that
instantly popped into my head. So many things could be applied to
that, I think.
Lisa Karlin Curtis: Yeah, I think also when we talk about
incidents, there's nothing that's specific to engineering
about them, really. I think engineers talk about them a lot;
we have a language that we discuss them in. But there are
loads of examples of incidents that are not engineering. And I
think almost all the stuff that we discuss around incidents
being a chance to build your network with other people being
a chance to touch things that you don't normally interact
with, you know, being a chance to watch what your system does
when it fails. Like that feels very engineering. But actually,
if you're a customer success team, you know what happens when
your processes fall over? What happens when the person who's
doing all of the glue work has gone on holiday and all of a
sudden something bad happens? Like, you're still stress
testing, you're still finding the edges. It's just a slightly
different environment.
Ad: Are you looking to get DevOps certified? Demonstrate
your DevOps knowledge and advance your career with a
certification from DevOps Institute. Get certified in
DevOps Leader, SRE, or DevSecOps, just to name a few. Learn
anywhere, anytime. The choice is yours. Choose to get certified
through our vast partner network, self-study programs, or our new
SKILup eLearning videos. The exams are developed in
collaboration with industry thought leaders and subject
matter experts in the DevOps space. Learn more at
devopsinstitute.com/certifications.
Lisa Karlin Curtis: I think what I find interesting about that is
that you start off like, oh, when things go wrong, it's bad. And
we've had a number of incidents where I would say the net impact
on our company has been positive. Because somebody reports it, we
see it, we've got some really quite good observability, so often
we can find it pretty quickly, fix it, and turn it around in,
say, half an hour. And the customer ends that interaction
actually feeling better about us than when they started, which is
probably quite counterintuitive, because really, if we just
hadn't broken it in the first place, maybe that would have been
better for them. But we've joked internally about maybe we should
deliberately come up with bugs, because of how much great
feedback we get from people when we fix things.
Jason Baum: Right? I feel like that's the evolution of any good
product, right, the feedback. So, you know, in thinking about
letting it go and thinking about these improvements, I feel like
there must be obstacles, though, besides ourselves and our own
internal turmoil that we put ourselves through when we make a
mistake or when an incident happens. There are deadlines, and you
need to hit them, you need to meet them, and when you miss them,
that's a big deal. So how does that play in when incidents
happen? How does that, I guess, impact that feeling that we're
already feeling, the shame that you talked about, and then we
have this deadline looming over our heads?
Lisa Karlin Curtis: I think that's really interesting. I think
it's very, very difficult, because you have a trade-off,
generally. So normally there's a triangle of, like, speed, and
quality, and cost on the other corner of the triangle, the idea
being that you have to trade off something, number of people,
maybe. Maybe we should just scrap all of that, I'm gonna start
again, that's fine. I think there's a trade-off here. So when
something goes wrong, the first thing is: I need to fix the thing
that broke. And that takes you however long it takes you, and
basically nothing else matters. Generally, there is a sort of
type of failure mode where you're just trying to bring your
system back up, or resolve the bug, or stop anything getting any
worse. And that's a really easy decision, because it's there,
it's on fire, we've got to fix it. And then you get to a sort of
second
stage of an incident, which maybe is like follow ups. Or
maybe you're still kind of in the incident mode where nothing
is on fire anymore. But there's a lot of things that you could
do that would make it less likely to happen or resolve it
in a neater way. And that's where you need your strong
engineers to come in and make those trade offs. And it's like,
what is the value of this piece of work? How long is it going to
take us? How much in the wrong direction is it from what we
thought we were doing? And can we afford to punt it? Can we
afford to delay it? And you get to this point where you've got a
deadline and a set of problems, and you need to make a decision
about which is basically more important. And that is a straight
shoot-out trade-off, often, because it's like, we have four
people in our team, we have two weeks, what shall we do? And the
answer to that is not to work 60-hour weeks, because in all
likelihood you won't get anything more done, in my experience.
And so instead it has to be, right, which of these
which of these is more of a risk to us? What happens if we miss
the deadline? And that's a decision that needs to get escalated
to someone who has the authority to make that call and the
information. That means that you have to make that information
really available to them in terms of, you know, what is the work
that we could do to mitigate it? What would it be mitigating?
Versus how far behind are we on the deadline? What does it mean
if we don't hit the deadline? And that is one of those things
where I think lots of people have this, oh, we'll find a creative
solution. And sometimes there is a creative solution, and
somebody's overlooked something, and actually it's all gonna be
fine. And sometimes there isn't.
There isn't enough time, and you have to pick something. And I
think identifying that is really important and being really
honest, from a kind of motivation and human point of
view. I think the times where I've seen that go badly is when
people either kind of try and have their cake and eat it and sort
of say, oh, I know you said you can't do these two things, but
what if I told you you could. And then there's another problem,
where there is a trade-off and nobody makes the decision, and you
just end up in a situation where the team is all kind of looking
at each other being like, do we make the decision now, because we
don't think it's our choice, but I guess no one's telling us what
to do. And then somebody shouts at them afterwards because they
made the wrong decision.
Jason Baum: I think that leads us right to my next question,
which is: what can leaders do to make this mindset part of the
culture of the organization and encourage it across all teams?
Lisa Karlin Curtis: So I think the first thing is to look out
for anybody playing the hero. In all organizations that I've ever
seen, there is a group of people who take on more than their fair
share of the burden of dealing with incidents, and we could call
them heroes. And that is really
good until it's really bad. And it's good, because they're
probably very good at dealing with incidents, because they've
had a lot of practice. And they're often the people who've
been at the company for a long time. But it's bad because it
means that nobody else is learning how to do it. And so
all of those benefits that we talked about right at the start, no
one else is getting them. So they're kind of gatekeeping the
skill needed to debug these issues. And that's very problematic,
because it stunts other people's growth. And when that person
burns out, or when that person goes on holiday, or when that
person leaves the company, you're suddenly in a really bad
situation. So you end up with these really bad key-man
dependencies. And as a leader, I think it's really important to
identify those patterns. If you're using tooling, you can look at
who's answering, who's getting paged, who's taking your on-call
load; you can look at your incidents, who's leading them. You
know, have you got somebody who's leading 50% of your incidents?
That's probably not a good sign. And you can use that data to
find those people and then chat to them, like, why are you doing
that? And probably the answer is, well, because I think it's
useful. And that's, like, great, but now we're gonna have a
conversation about why we need to spread this load out across the
team. And it's a combination of protecting you and your mental
health, frankly, but also it's about spreading the knowledge and
spreading the experience. So I think that kind of pattern of
having those
superheroes is really damaging. And it restricts your ability to
scale your incident response. And then, as a leader, the other
thing you can do is encourage people to show their working. So if
you want people to learn from incidents, you need to make that
information available to them, and then you need to make it
accessible. By available, I mean, like, have your conversations
in a public Slack channel. Ideally, use some incident tooling so
that you can curate those conversations and build a timeline that
somebody can interact with; write a post-mortem, share the
post-mortem. And then by accessible, I mean try and make it
really easy for people to get that information. So have it in a
knowledge base that people can look at to find something that
they're interested in. It's like, step one, write the thing; but
if you just write a post-mortem that goes into a drawer, no one's
really won at that point. So push it out to people, and make it
clear to people that reading those materials is part of their job
and considered a very good use of their time. And that's a
difficult balance, because sometimes you need people to ship
stuff. But I think it's important to talk about this explicitly,
and talk about the fact that if you look at what other people did
in their incidents, you can build better software: you're gonna
have fewer incidents, or your incidents are going to be less
severe and easier to debug. And that then means that you'll be
able to sort of teach the next generation, who teach the next
generation, and you get this great positive feedback loop, if
everybody's talking about it and learning from each other, as
opposed to the negative feedback loop, where people are keeping
it very secret, where people are gatekeeping it, and where not
everyone is getting involved.
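One way to make that hero pattern visible, as a rough sketch: compute each person's share of incident leads from whatever your tooling exports, and flag anyone over a threshold. The record shape and threshold are assumptions for illustration, not any particular product's API.

```go
package insights

import "fmt"

// IncidentRecord is a minimal stand-in for an exported incident summary.
type IncidentRecord struct {
	ID   string
	Lead string // who led the response
}

// flagHeroes prints anyone who has led at least `threshold` (e.g. 0.5) of all
// incidents, as a prompt to spread the load and the learning around.
func flagHeroes(incidents []IncidentRecord, threshold float64) {
	if len(incidents) == 0 {
		return
	}
	counts := map[string]int{}
	for _, inc := range incidents {
		counts[inc.Lead]++
	}
	for lead, n := range counts {
		share := float64(n) / float64(len(incidents))
		if share >= threshold {
			fmt.Printf("%s has led %.0f%% of incidents; time to share that work\n", lead, share*100)
		}
	}
}
```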
Jason Baum: I feel like the word of the day is transparency. I
feel like that's pretty much what you're saying. Not to put a
word in your mouth, because I don't think you've said it
specifically, but what I'm hearing is transparency, transparency,
and transparency. As someone who works with the engineering team,
or, you know, I'm just relaying information from people to people
on a leadership team, for example, half the time with a deadline
it's because the leadership doesn't necessarily understand it.
You know, they don't necessarily know what the specific issue is,
what the specific reason is why a deadline isn't being hit. And
that's where I feel like the culture part can sometimes go awry,
right? We allow it to happen when there isn't transparency.
Lisa Karlin Curtis: Yeah, I think that when you lack
transparency, it gives humans a lot more remit to try and get
what they want through bad ways, basically. If you don't have
transparency, you can lie, and you can put forward an argument
that suits whatever you think your goal is. In an ideal world,
everybody in your organization has exactly the same goal and
they're all pulling in the same direction. But in reality, people
often view their goals as being slightly in conflict with other
people's goals, because they're trying to get more resources for
their team, because they think their problem is the most
important thing. And if you don't have transparency, it's very
difficult to hold people accountable to those things. Whereas if
you do, and if people are very honest and open, then the
organization can make the right choice for the organization. And
if you think about a sort of individual-first versus
organization-first type culture, ideally, as an organization, you
should be putting your chips on the thing that is most important,
not on the thing that has the best argument. And the way that you
don't make that mistake is to use transparency, and to be open
and honest, and to generate that culture, the culture of it being
okay to make mistakes, and also of it being okay to say, I don't
think this is so important, and knowing that that's not gonna
impact your career. And I think that's one of the reasons why
people don't do that, because there is this view that to get
promoted, or to get that job that you really want, to get
influence, you need to be the most important person, you need to
be doing the most important thing. And none of us are always
doing the most important thing. This week at work, I'm not doing
the most important thing, that is very clear, and that's kind of
fun. But it also means that if somebody else needs an extra pair
of hands, I'm going to jump on their thing; I'm not going to just
carry on with mine, because I want to look good. And that's the
sort of team-first thinking that I think you need to try and get
into your cultural DNA as an organization.
Jason Baum: I love that. I would love to hear, in a team update
with the company, this week I'm not working on the most important
thing. You know, I don't think we ever hear that, because
everyone does want to be the most important, I think. So, you
know, you said accountability, and I hadn't planned on asking
this question, but it does now trigger something in me. Incidents
are okay, and they are learning experiences; we just spent the
past 30 minutes talking about it. But how does accountability
play into this? Not that incidents are not okay, but when do we
need to hold accountability? Because I think there needs to be an
element of accountability. How does that play in, especially in a
blameless culture?
Lisa Karlin Curtis: I think it's a really difficult balance, and
I think I'd come back to the stuff I was saying about failing
together. So I think that, as a team, you take accountability for
what happens in your team. There are obviously occasions where
somebody goes rogue and does something that the team thinks was a
terrible idea; that's sort of an HR issue, frankly, and I think
it's very separate. But generally, you as a team have made some
choices, and you're now looking at the consequences of those
choices. I think that the way to hold teams accountable around
incidents is the same way that you hold teams accountable for any
kind of delivery. An incident is normally that something that
team maintains has broken in some way. And so that team has a
kind of agreement or a contract with the rest of the org that
they will have this service and it will do this stuff, and
sometimes it won't, because incidents happen and mistakes happen.
I think you make people accountable by making them transparent,
and by making them expose the trade-offs that they're making. So
as an example, if you're in a team which is under loads and loads
of pressure on time, and they're like, you've got to ship this
thing as quickly as you can, as quickly as you can, if you as the
senior engineer or
the tech leader are looking at them saying, cool, we'll do
that, but there's risk, and these are the things that might go
wrong if we do that; are we comfortable with that risk? Then
you're accountable. Because if something goes wrong, it's either,
yeah, I said these things could go wrong, and we talked about it,
and we decided we were okay with the risk; or something
completely different has gone wrong that is maybe significantly
more severe than the things that we thought might, and, right,
let's talk about why we thought this was safe, and why that
wasn't in our risk assessment. And so you're holding people
accountable for the trade-offs that they're making. You're not
holding people accountable for an individual thing that went
wrong. It's not, why did this happen? It's, why did you think
that we should take this risk, what pressure was put on you? And
let's look at it at a system level, as opposed to some person
pressed a button on a backfill and it made the database really
sad, and now we're going to run around screaming saying that they
should be fired. And I think the
other thing about accountability is it's about time. So I think
incidents are generally a lagging indicator, as opposed to
a leading indicator, if that's terminology people are familiar
with. A leading indicator basically means you find out pretty
quickly whether a choice you're making is good or bad, and a
lagging indicator is something where a choice that you make has
impact sometime in the future. And because incidents are a
lagging indicator, often the people handling the incidents are
not the people who made those trade-offs. And it's really
important to recognize when that is true, and to understand what
the root causes are, to have that kind of discussion, whether you
go down the five whys route or some other route, but to have a
really meaningful discussion about what were the choices we could
have made to avoid this, and why didn't we make them. Was it
because we had loads of pressure on delivery? Was it because no
one was thinking about it, and we just didn't think it was a
risk? And then that's the problem that you have to solve, and
those are the things that you can hold people accountable for, I
think.
Jason Baum: Awesome. Thank you for answering that one. I don't
know, it just came when you said accountability, it just popped
into my head, because I feel like all we've ever heard for years
was accountability, accountability, who's accountable for this,
who's accountable for that, all the accountability stuff that
people say. And yeah, it's hard to be blameless when that's what
the buzzword was before blameless. So now we're at the point of
the podcast where I like to ask sort of a fun question of you
personally, because this is the Humans of DevOps, and we're all
about the humans. So if there was one thing that you could be
remembered for, what would that be?
Lisa Karlin Curtis: I think I would like to be remembered as
someone who made systems work better for people.
Jason Baum: Great. I think that's, that's certainly
applicable for today's conversation. Well, thank you,
Lisa, so much for joining me today. It was an absolute
pleasure.
Lisa Karlin Curtis: Thanks so much. I really enjoyed it.
Jason Baum: And thank you for listening to this episode of the
Humans of DevOps podcast. I'm going to end this episode the same
way I always do, encouraging you to become a member of DevOps
Institute to get access to even more great resources just like
this one. Until next time, stay safe, stay healthy, and most of
all, stay human, live long and prosper.
Narrator: Thanks for listening to this episode of the Humans of
DevOps podcast. Don't forget to join our global community to get
access to even more great resources like this. Until next time,
remember, you are part of something bigger than yourself. You
belong.