Eveline Oehrlich: Hello, welcome. This is my first Humans
of DevOps podcast and I am excited to have the boss lady
with me today, Jayne Groll. Hello, Jayne, how are you?
Jayne Groll: Hello, Eveline, how are you? And thank you so much
for taking on this new role as host and facilitator for the
Humans of DevOps Podcast. I'm really excited to be here with
you today.
Eveline Oehrlich: And I'm excited to take it on because,
as I already shared with you, there are tons of things we can
talk about. But today, we're actually here to talk about a
most exciting topic, which is Site Reliability Engineering. We
did a recent report where, you know, we had tons of
responses. So, first question: site reliability, why is that
such a big topic today?
Jayne Groll: So, you know, you and I both come from IT ops. And
so I think, and, of course, share your perspective on
that. But it's very gratifying to see Site Reliability
Engineering really giving IT ops kind of a new lens, or a
new perspective, or even a new role, right, an official role
that organizations are hiring for and a set of practices that
organizations can embrace. I think, look, we talk a lot
about shift left in DevOps; it really shifts IT ops so far
left that, you know, according to Google's definition, SRE is
what happens when you take a software engineer to help design
and, you know, be part of IT operations. Right. So it's a
really exciting set of practices. Is there a lot that's new
in SRE? Maybe, maybe not. I mean, I always think of it as,
you know, kind of a new, modern approach to IT service
management. But I think there's just an underlying human
excitement. And as I said, you and I have been watching it, you know,
trending for a while with the upskilling report. And I think
it's extra exciting today that SRE has crossed the chasm,
right? I mean, we saw that in the report, that it crossed the
chasm. And now enterprises are starting to embrace it. What
about your excitement? As I said, we've
known each other a long time, we've kind of watched this rise
together.
Eveline Oehrlich: Yeah. So as you said, I've been in IT ops
also. I started out my career there in '94. And on my first day on
the job I was called a programmer by somebody in the business line.
Of course, at that time, programmer was not
a great term. But I have to say it
was, at that time already, a fun job. I had to do and could do a
lot of things. So maybe both of us already were SREs without
even knowing it then. But again, exciting for what reasons?
What are some of the things we found, if you want to
dive into why it is exciting? What's changing? Why is
everybody talking about it? And why did we do the research?
Jayne Groll: Well, I think the big change is looking at
operations as an engineering discipline. And so when we look
at software engineering, we look at DevOps engineering, right, we
look at a lot of the things that kind of have this
engineering mentality. IT operations was always on the
sidelines, right? You know, you and I stood with the help desk
and everyone else, you know, with our hands up against the
fence waiting for something to be thrown over. But we really
weren't part of the solution. We were there to fix the problems.
And so I think that SRE kind of respun that, right, respun that
in a way, starting with Google, but then with new practices grafting
on top of it. So that led us to think about this report. Because,
you know, you and I have been doing this as part of
DevOps Institute's upskilling report for years. And one of the
things we've seen is this, you know, this trend with SRE, and
then given our experience with IT operations, and then you
know, kind of this new excitement about this role of
site reliability engineer, it made sense for you and me to say,
Okay, let's take an agnostic approach. And let's go back to
the community, like we do with the upskilling report, and say,
Tell us what's happening in the real world with this set of
practices. Is it the same? Is it, you know, the same as other
frameworks? What are organizations doing? How do you
feel about the role, and what kind of benefits or
activities are you executing? And so I was excited because I
came to you and said, Hey, can we do this? And you went, sure.
And then we were even fortunate to get, you know, Sumo Logic and
StackState and Sedai to help underwrite it. So, yes, it's a
passion project, for sure, I think, for you and for me, right?
Eveline Oehrlich: Yeah. So at first, you know, going
into the research and crafting the survey questions, I
was really focusing more on the practices and the automation and
of course, the adoption and the different team topologies. And
then when we talked it over with the sponsors, we realized there
was also a change in behavior. And we wanted to tickle out that
potential opportunity for people to get excited about something
different. Maybe you can talk a little bit about that from a
career perspective, because both of us have been, you know,
coaches, we've been leaders, we've been in IT ops, you've
done ITSM. I've been an analyst. I've had lots of conversations
with people who are just kind of like, okay, yeah, this is my
job. I go in in the morning and then come home in the evening.
But it felt to me that there was something more. Tell us
a little bit. I know you see it in the research, and I
know you like that tickle as well. What's
your thought around that?
Jayne Groll: So the first thing that struck me about SRE, and I
had the opportunity to go to a very early SREcon, right,
where there was still discussion, and it was still
very much something that was very Google specific. So
technology companies were looking at, you know, how do we
replicate what was in the books. But there's an underlying
excitement or a dynamic that is baked into SRE, because
it looks at some things that maybe in the past have been a
little bit heavier or more disconnected, like, change
management is a great example of that, you know, for years and
years and years, organizations have struggled with how do we
manage changes? How do we control changes? Same thing with
service level objectives, right? What is the service level? Is it
how we, you know, fix something when it's broken? Or is it
actually maintaining a level of reliability. And then you add on
top of that, the fact that in the early days of DevOps, there
was a lot of talk about no ops. Now, the intent of no ops was
more automation, which is also baked into SRE. But you know, as
an IT ops person, as a systems administrator, as perhaps a role
that wasn't considered part of the cool kids club, right, no
ops kind of spoke to no job, no respect. And now you spin
around, and now you've got SRE, which is very much a cool
job. But also, there's a very human element that's baked
into SRE as well. It is IT service management. You know, I
talk about this all the time: it is about incident management,
change management, service level management, you know,
observability, event management. And so it is kind of a cool
version of ITSM, more lightweight than perhaps some of
the other practices. But some of the core principles that were
brought forth in SRE are trying to make tomorrow better than
today. So in a pure world, an SRE is supposed to spend about
50% of their time, you know, reacting to, you know,
whatever's happening, and 50% of their time reducing toil, right,
so looking at technical debt, looking at automation, looking
at some of the things that have just been dragging IT for a
very, very long time. SRE says, Hey, take some time, learn,
right, shadow, share. And let's look at ways to reduce that
manual, redundant work that nobody likes to do, right, when
there's automation that will do it for you. Right, but you've
got to be able to have that kind of innovation and
proactiveness. So there are a lot of principles baked
into SRE, outside of just how do we manage your
infrastructure or how do you handle level two, that are very
human. And I think that may be something that's contributing.
And we saw in the report that people said they were more
excited about their roles, that they were being paid better, that
they were more engaged in, you know, the practices. I mean,
you're more familiar with the responses than I am. What
did you see? Did you feel an excitement in
the data?
Eveline Oehrlich: Yeah, a couple of things, when you were
talking about service management and the excitement.
One is the topic of collaboration.
Everybody always talks about, you know, in DevOps we need to be
better collaborators, and collaboration and knowledge
management. But what we found in the report is, first, knowledge
management was one of those things where SREs are actually
paid to capture their knowledge and shift it left so that others
upstream or even downstream can actually leverage it
to improve. So your job is becoming somebody who,
at a proactive level, while you're reacting,
inserts your knowledge, so the next time somebody knows a
better way of doing things. That is
fantastic. So while we're still reacting to some things, because
we can't fix it all, we can be proactive. I think that
was most important to me, being that collaborator. And if
somebody asks me, what do my
skills need to be to be an SRE? Well, you have to be, like
you said, an engineer, you have to know data modeling, process
modeling, you need to be in analytics, you need to know the
software development lifecycle, you have to have some knowledge
about all these tech things. But you also need to be able to
understand where you can actually allow the toil to be
reduced. Where can you actually help to make the biggest
difference to developers, to testers, to customers, to
product line owners, to all these internal and external customers?
And to me, as an extrovert, which I am, and unfortunately some
people think I may be too extroverted, that's the beauty of this job:
you get to talk to people, you get to collaborate, you get to
work on multiple projects while you actually fix things, and you
actually can see an outcome. And that was reflected in being
valued as a team member. Because as I remember, back in my time in
IT operations, the only time I think I was really valued was
when I carried the pager for the factory and I could actually
drive in and, in, you know, whatever, 10 minutes, I was
at the data center. So that excitement, I think, is that
collaboration gives excitement and allows you to become much
more of a valuable member than you were maybe in the past
in an IT ops job.
Jayne Groll: You know, and it's funny you say that, because I
think one of the areas of SRE that really shows how
important collaboration is, is incident management. So in your
experience, in my experience, right, an incident occurs,
everybody goes, not my fault, not my fault, right? You spend half
your time trying to figure out what changed, because it hasn't
been captured or documented anywhere. And it becomes this,
like, hot potato of a ticket in an ITSM system being passed
around according to some escalation path, and you have 15
minutes to get it and two hours to fix it or whatever. And it
becomes a follow-the-pointing-finger. Right. So like, you
know, it's very non-collaborative. In SRE, there's a
couple of things, I think, that really showcase that
collaboration. First of all, there's an incident command
system, right, where it isn't just pass around a ticket, but
there's a collaborative opportunity to figure out what
changed, because that's usually the root cause of an incident:
what changed, how do we fix it, who can we get involved? And
instead of playing, you know, pass the ticket around, it
becomes a collaborative event. The other side of it that I thought
was really interesting is, if there's a breach, if the service
level objectives are breached because of an incident, all
innovative work stops, right, all new work stops until they
figure out what caused the breach. Now, for a business, that
can be very disruptive. So who wants to breach? Nobody, right?
One of the disappointing parts, I think, that we found in the
survey was that service level objectives were not necessarily
the first thing that organizations did. We saw that
in ITIL as well, where service level management came later.
Hopefully, we can flip the model on that. But it shows a
collaborative spirit: instead of just pointing out it
wasn't me, or it was you, and we follow that pointing finger,
there is an effort to work together and to solve problems
together. And hopefully that, first of all, transfers
knowledge, but also helps to reduce toil, because in order to
reduce toil or technical debt, you first have to accept that
it's there. Yeah. And so, you know, I think there are some
pieces of that. What do you think about that SLO piece?
Why do you think that's a struggle?
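[Editor's note] The rule described above, where new work stops once a service level objective is breached, is often expressed as an error budget check. The following is a minimal sketch under stated assumptions, not something taken from the report or the speakers: the function names, the request-based SLO, and the example numbers are all hypothetical.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    A negative result means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def feature_work_allowed(slo_target: float, total: int, failed: int) -> bool:
    """Hypothetical policy: new work continues only while budget remains."""
    return error_budget_remaining(slo_target, total, failed) > 0.0


# Example with made-up numbers: a 99.9% SLO over one million requests
# tolerates roughly 1,000 failures.
print(feature_work_allowed(0.999, 1_000_000, 500))    # within budget
print(feature_work_allowed(0.999, 1_000_000, 1_500))  # budget overspent
```

In this sketch the freeze is a simple yes/no decision; a real policy would usually be agreed with the business and cover a defined time window.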
Eveline Oehrlich: Yeah, I think the challenge of IT has
always been really defining what it is they are delivering from
an outcome perspective, and the conversation between those who
are creating it, those who are fixing it, those who are
operating it, and those who are actually consuming it, be it at the
employee level, with systems of record or systems of engagement,
or at the customer level. Those conversations are very, very
difficult to have, and then to put that into some kind of a
perspective. I remember one conversation at my job at
Forrester, where a gentleman was asking me to help him move from
four nines, right, 99.99% availability, to five nines. And
so of course, as an analyst, my question was, why would you want
to do that? And he had no answer. He really didn't have an
answer. He just
thought that was the right thing to do. So we actually calculated
for him what it would cost to actually do that. And he was
like, Oh my gosh, I don't think my business will fund that. So,
right, the conversation of having a service level
objective, which is tied to an SLA and is defined through
SLIs, becomes difficult. And as we are in this
digital world, where customer experience and employee
experience is like the utmost goal of everybody, how is it
defined? What's the benchmark? How do I capture that? And I
think that's the challenge. There's plenty more work to
do to understand how and what, and I think it's really a matter
of bringing people together across the value stream, looking
at it from the outside in: what is it we want to do, versus what we
in IT think we should be doing?
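[Editor's note] To make the "nines" conversation concrete, here is a small sketch of how much downtime per year an availability target permits. It is only arithmetic for illustration: the function name and the choice of a 365-day year are assumptions, and it says nothing about the cost figures calculated in the anecdote.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


# Compare four nines with five nines.
for target in (99.99, 99.999):
    print(f"{target}% allows about {allowed_downtime_minutes(target):.2f} min/year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why the question "why would you want to do that?" matters before anyone funds the engineering work.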
Jayne Groll: Absolutely. And you know, the other thing about SRE
that I find particularly fascinating is, while there's a
series of books that were published by Google, that isn't
necessarily the, you know, the body of knowledge or the
prescriptive guidance. It isn't religious, right, and we can
say some of these frameworks became religious. And so now
we've grafted other things into this umbrella of SRE, like
observability or chaos engineering, right? Things that
would have existed outside of it, and could have been part of
framework wars, right. But now, you know, this whole concept of
IT operations under site reliability, or systems
reliability, engineering now grafts on other practices. So what
are you seeing in that direction? I think, because
again, as a human, that gives me new opportunities to collaborate,
gives me new things to learn. Observability, in
particular, I think, has just really risen as a practice that
is considered part of SRE.
Eveline Oehrlich: Yeah. So if I
think of it from people, process, technology: from a people side,
I think for individuals who are really looking for a new career,
this is a great way to get into engineering. If you are an
operations professional, get, you know, get yourself some
upskilling and move. That's one thing. From a leadership
perspective, I think having the perspective to look at those
people you have in your team and give them the chance to maybe
shift to a role like that is another thing. And then maybe
from an organizational perspective, that's still around
people, maybe we need to think about how we organize our teams
a little bit differently. From a process perspective, all these
things we mentioned, you know, knowledge management, incident
management, configuration management, right, the evil
CMDB, yes or no, or change management, all these
things will become more natural. And then from a technology
perspective, everybody wants to go to the cloud, we know we have
to go to the cloud, organizations are shifting, but there are
opportunities as well to become kind of a stack person around
the cloud. Now, one thing I want to make sure everybody knows
is that SRE is not just for companies who are in the cloud
or going to the cloud. We see a lot of organizations who have
very mixed and hybrid environments who are applying
SRE, even on the mainframe. There isn't really a perfect
technology stack for adopting SRE. SRE is more about how and what
you do rather than about a specific technology. So let me
ask you a question, Jayne. What about next year? What's your
prediction for next year? And I hope we will do the research
again. And we look forward to, of course, doing it in the same
cadence. We usually do our survey around the November,
September timeframe, sorry, in May, and then
we publish around July. So any predictions for next year?
Jayne Groll: Yeah, I think SRE is going to continue to be
adopted by organizations. I think that, you know, if we look
at kind of the evolution of practices, starting with agile,
which is, what, 20 years ago, now over 20 years ago, and, you
know, kind of the scaling of agile, and then DevOps, which is
11, 12 years ago, and looking at how do we shift left, and how do
we, you know, adopt more automation and the human
collaboration, it feels like SRE is that third piece of
the puzzle. And I think that organizations will start to see that,
and there'll be more case studies that are released, and of
course our report, which we will do every year. And I think
there'll be a migration because of the digital-first world, which
we're moving into, and we got pushed into maybe a little
faster than we thought we would. You're right, you don't have to
be in the cloud to adopt SRE practices, right. It's just
about good practices in IT operations. But I think also it's
gonna break some of the barriers that we've seen with, say, IT
service management. There's some really
great guidance in IT service management, but there's always
been a disconnect, say, between those that are pre-production
and those that are post-production. And SRE seems to
break down those walls. I think that, with the job, we're gonna
see more and more people really start to move into those roles,
regardless of what their history was, whether they came from
ops or they came from dev, right. So I see this trending
faster, because the definition of IT operations is no longer no
ops. We called it "new ops" five years ago. And I think SRE is
really new ops, and it will build very, very rapidly,
probably more rapidly than agile or DevOps. What do you think?
What do you see next year?
Eveline Oehrlich: Yeah, I agree. Absolutely agree. And I'm a
little bit disappointed in us in the analyst community at the
time, that we did not come up with that term, Site Reliability
Engineering, and we talked about new ops. But yes, I think it
gives a completely new career path and allows for an engineering
path for young folks to get into. The millennials of today, as we
know, they're very different. I have two at home. They don't have
IT careers, but I do see how they click and how they work. So
for them, I think that's going to be a great opportunity. All
right. Well, Jayne, thank you.
This was fantastic. Appreciate your time. We have a report. For
anybody listening in, if you have a fantastic case study on SRE,
reach out. If you need some ideas on how to upskill, reach out. You
know where we are. The report can be found on the
DevOps Institute website. If you have questions, comments,
we are happy to entertain them. Great, thank you. This was a
very easy first podcast for me. Thank you for making it so easy to
transition into this role. Have a great day and thank you,
everybody.
Jayne Groll: Thanks, Eveline. Thanks, everyone.