SRE SRE: Site Reliability Engineering

The document discusses site reliability engineering (SRE) as a paradigm shift in service management. SRE aims to improve reliability, accept failure as normal, reduce tedious operational tasks, implement monitoring and metrics, set service level indicators and objectives with error budgets, and focus on automation to continuously improve systems over time.

Uploaded by

Pallab Sarkar

SRE

Site Reliability Engineering

Paradigm Shift in Service Management

[Figure: DevOps SRE Architecture, showing Dev and Ops]

Pallab Sarkar
Why SRE?
Does all of this sound familiar?

Things in production fail, fail again… and again.

Firefighting over the root cause and over responsibility when things break in production and impact customers.

Too many production issues, many of them repetitive; things break too often, and the Support / Ops team is overloaded.

Everyone is busy, but things don't get any better over time. There is no single owner for reliability, availability, or the allowed error rate.

Too many escalations and handovers before the right person gets engaged and starts fixing the problem.

Silo culture: defined processes and team boundaries delay cross-team resolution.

No time to fix Ops issues permanently: Dev is always busy rolling out new features, and they want fast delivery of features.

SRE Principles & Objectives
SRE is a model in which software engineers are engaged to run operations and write code / use tools to measure, monitor, improve, and automate operational tasks.

Improve Reliability
SREs are always focused on reliability. They try to pinpoint and mitigate points of failure in the system wherever possible.
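To see why removing points of failure matters, the sketch below computes end-to-end availability under the usual simplifying assumption that components fail independently: availabilities multiply along a serial request path, while a redundant tier is down only if every replica is down. The component layout and the availability figures are hypothetical.

```python
from math import prod


def serial_availability(avails: list[float]) -> float:
    """All components on the path must be up, so availabilities multiply."""
    return prod(avails)


def parallel_availability(avails: list[float]) -> float:
    """A redundant tier is down only if every replica is down (independent failures assumed)."""
    return 1 - prod(1 - a for a in avails)


# Hypothetical path: load balancer -> app server -> database.
single_app = serial_availability([0.999, 0.99, 0.999])

# Same path with two independent app replicas instead of one.
redundant_app = serial_availability([0.999, parallel_availability([0.99, 0.99]), 0.999])

print(f"single app: {single_app:.2%}, redundant app: {redundant_app:.2%}")
```

With these made-up figures, a second app replica lifts the path from roughly 98.80% to roughly 99.79%. Real replicas often share failure modes (same deploy, same zone), so the independent-failure estimate is optimistic.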

Accept Failure
Accept failure as normal: always be ready with a possible remediation, and learn from failures.

Reduce Toil
Toil is tedious, manual, repetitive, boring work. SREs spend around 50% of their time improving the systems they manage.
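One way to make the toil target measurable is to log toil hours and engineering hours and compare the toil share with a cap. This is a minimal sketch that assumes the ~50% figure above is used as a ceiling on toil; the `WorkLog` type, `TOIL_CAP`, and the hours are hypothetical.

```python
from dataclasses import dataclass

TOIL_CAP = 0.5  # assumption: the ~50% split from the slide, used as a ceiling on toil


@dataclass
class WorkLog:
    toil_hours: float         # manual, repetitive, automatable operational work
    engineering_hours: float  # project work that improves the system

    @property
    def toil_fraction(self) -> float:
        total = self.toil_hours + self.engineering_hours
        return self.toil_hours / total if total > 0 else 0.0


def over_toil_cap(log: WorkLog, cap: float = TOIL_CAP) -> bool:
    return log.toil_fraction > cap


week = WorkLog(toil_hours=26, engineering_hours=14)  # hypothetical team week
print(f"toil share {week.toil_fraction:.0%}, over cap: {over_toil_cap(week)}")
```

A team that stays over the cap week after week would prioritize automating its largest sources of toil.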

Measure & Monitor

Various tools are used for monitoring and alerting. Always keep the customer first in mind: alerting should primarily ensure minimal or zero customer impact.
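A customer-first alerting rule looks at symptoms users actually experience (failed requests, slow responses) rather than internal causes such as CPU usage. The following is a minimal sketch; the `WindowStats` fields and both thresholds are hypothetical and would normally be derived from the service's SLOs.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    total_requests: int
    failed_requests: int   # e.g. requests that returned an error to the customer
    p99_latency_ms: float  # 99th-percentile response time in the window


# Hypothetical thresholds; in practice derive them from the service's SLOs.
MAX_ERROR_RATIO = 0.01
MAX_P99_LATENCY_MS = 500.0


def customer_impact_alerts(w: WindowStats) -> list[str]:
    """Return alert reasons based on customer-visible symptoms only."""
    alerts = []
    if w.total_requests > 0:
        ratio = w.failed_requests / w.total_requests
        if ratio > MAX_ERROR_RATIO:
            alerts.append(f"error ratio {ratio:.2%} > {MAX_ERROR_RATIO:.2%}")
    if w.p99_latency_ms > MAX_P99_LATENCY_MS:
        alerts.append(f"p99 latency {w.p99_latency_ms:.0f} ms > {MAX_P99_LATENCY_MS:.0f} ms")
    return alerts


print(customer_impact_alerts(WindowStats(total_requests=10_000, failed_requests=250, p99_latency_ms=320.0)))
```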

Set SLI, SLO, Error Budget

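Define quantitative measures (SLIs) of availability, latency, and error rate, and set an expected target (SLO) for each. The error budget is the amount of unreliability the SLO allows, for example 0.1% of requests under a 99.9% availability SLO. Any SLO breach should have a consequence.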
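The SLI / SLO / error budget arithmetic can be sketched as follows for a request-based availability SLI (good requests divided by total requests). The 30-day request counts, the 99.9% target, and the `release_policy` rule (freeze feature releases once the budget is spent) are illustrative assumptions, not a prescribed policy.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    sli: float              # measured fraction of good requests
    error_budget: float     # number of bad requests the SLO allows in the window
    budget_consumed: float  # fraction of the error budget already spent
    slo_breached: bool


def evaluate_slo(good: int, total: int, slo_target: float) -> SLOReport:
    if total <= 0:
        raise ValueError("no requests in the window")
    sli = good / total
    error_budget = (1 - slo_target) * total
    bad = total - good
    if error_budget > 0:
        consumed = bad / error_budget
    else:  # a 100% target leaves no budget at all
        consumed = 0.0 if bad == 0 else float("inf")
    return SLOReport(sli, error_budget, consumed, sli < slo_target)


def release_policy(report: SLOReport) -> str:
    """Hypothetical consequence of exhausting the error budget."""
    return "freeze feature releases" if report.budget_consumed >= 1.0 else "releases allowed"


# Hypothetical 30-day window: 1,000,000 requests against a 99.9% availability SLO.
report = evaluate_slo(good=999_400, total=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%}, budget used={report.budget_consumed:.0%}: {release_policy(report)}")
```

Here 600 failed requests against a budget of 1,000 means about 60% of the budget is spent, so releases continue; at 1,500 failures the SLO is breached and the hypothetical policy freezes releases.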

Automation / Make tomorrow better than today

Automating infrastructure and Ops activities is important for consistency, time savings, and faster or automatic repair. SREs always set aside time to make tomorrow better than today: automating, implementing self-service, and fixing toil.
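Automated repair is usually bounded: try a known fix a few times, then hand over to a human if it does not work. The sketch below assumes hypothetical `check_health` and `restart` callables supplied by the caller; `FakeService` only simulates a service so the example runs on its own.

```python
from typing import Callable


def auto_remediate(
    check_health: Callable[[], bool],
    restart: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Try a bounded number of automated restarts, then escalate to a human."""
    if check_health():
        return "healthy"
    for attempt in range(1, max_attempts + 1):
        restart()
        if check_health():
            return f"recovered after {attempt} restart(s)"
    return "escalate to on-call"


class FakeService:
    """Hypothetical stand-in that becomes healthy after a given number of restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


svc = FakeService(restarts_needed=2)
print(auto_remediate(svc.healthy, svc.restart))
```

Capping the attempts keeps automation from masking a real outage: anything the known fix cannot resolve still reaches a person.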
