SRE Principles

The document outlines principles for automation, emphasizing that all processes should be automated or replaced if not feasible. It describes a culture of ephemeral servers and engineers, continuous integration and deployment, and the importance of monitoring, alerting, and incident response. Security and financial consciousness are integral, with a preference for externally managed cloud services and collaboration between SREs and software engineers.

Uploaded by

tenequm


Automation
Everything should be completely automated.
• If an existing process cannot be automated, it will be replaced.
• If a proposed process cannot be automated, it will be rejected.
• The SRE’s job is to automate themselves out of a job. In practice this means constantly automating menial tasks and moving on to solve more interesting problems.

Ephemerality
Servers are ephemeral. They can and will go away at any time.
• Servers live in auto-scaling groups that self-heal.
• Servers have health checks that assert the health of their process(es).
• Servers boot from images that are fully equipped and operational.
• Configuration management does not run against existing servers. It is used only to create images.
• Application servers are stateless.
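
The health-check bullet above can be sketched roughly as follows. This is an illustration only, not taken from the slides: `assert_health` and the probe names are hypothetical. It assumes the auto-scaling group polls an HTTP-style status and replaces any server that does not answer 200.

```python
from typing import Callable, Dict, Tuple

# Hypothetical probe type: returns True when the process it covers is healthy.
Probe = Callable[[], bool]


def assert_health(probes: Dict[str, Probe]) -> Tuple[int, Dict[str, bool]]:
    """Run every probe and return (status_code, per-probe results).

    200 means all probes passed; 503 signals that the server should be
    replaced. A probe that raises an exception counts as unhealthy.
    """
    results: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

Because the servers are stateless and boot from complete images, a 503 here can simply lead to replacement rather than repair.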

Engineers are ephemeral. They can and will go away at any time.
• Engineering workloads are shared. There are no individual silos.
• Engineering practices are documented. Documentation is up to date.
• All engineers have access to all codebases.

Continuous Integration
All code changes are made via pull requests, verified, and approved.
• All code is functionally tested, unit tested, and linted.
• Linters are extremely opinionated. Engineers should feel empowered to propose changes to the rules in isolated discussions and pull requests.
• Unit tests and linters run on every pull request, preventing merges when the build fails.
• Functional tests run on every deploy, preventing (or rolling back) deploys when the build fails.
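
One way to picture the merge gate described above is a small function that lists what still blocks a pull request. The `PullRequest` type, its fields, and the default of one required approval are assumptions made for this sketch; a real CI system would expose its own equivalents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical snapshot of a pull request's state."""
    approvals: int
    unit_tests_passed: bool
    lint_passed: bool


def merge_blockers(pr: PullRequest, required_approvals: int = 1) -> List[str]:
    """Return the reasons a pull request may not merge (empty list = mergeable)."""
    blockers: List[str] = []
    if pr.approvals < required_approvals:
        blockers.append("needs approval")
    if not pr.unit_tests_passed:
        blockers.append("unit tests failed")
    if not pr.lint_passed:
        blockers.append("lint failed")
    return blockers
```

Returning the reasons, rather than a bare boolean, keeps the gate strict while still telling the engineer what to fix.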

Continuous Deployment
Deploys are easy, fast, safe, and frequent.
• Changes are deployed on every merge.
• Deploys do not require any human interaction or approval.
• Deploy time matters and engineers should strive to make it faster.
• Deploys can be started manually with a single button. As many engineers as possible should have access to the button.
• Rollbacks happen automatically when a failed deploy is automatically detected.
• Rollbacks are held to all the same standards as deploys.
• The master branch is the only branch that gets deployed. All git branching is for the benefit of the engineer prior to merging the changes into master.
• It is easy to tell which commit is deployed.
• There is no such thing as a code freeze.
• Features are released by feature flags. Flipping a flag does not require a deploy. A “flip freeze” is acceptable.
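
The automatic-rollback rule above might look like this in its simplest form. The function name, the `release` callback, and the `functional_tests` callback are all hypothetical stand-ins for whatever the deploy tooling provides; the sketch assumes a failed deploy is detected by the functional tests that run on every deploy.

```python
from typing import Callable


def deploy_with_rollback(
    new_commit: str,
    current_commit: str,
    release: Callable[[str], None],
    functional_tests: Callable[[], bool],
) -> str:
    """Release new_commit; if the functional tests fail, re-release current_commit.

    Returns the commit that is live afterwards. The rollback goes through
    the same `release` function as the deploy, so it follows the same path.
    """
    release(new_commit)
    if functional_tests():
        return new_commit
    release(current_commit)  # automatic rollback, no human approval involved
    return current_commit
```

For example, `deploy_with_rollback("b2c3", "a1b2", history.append, lambda: False)` with `history = []` leaves `history` holding both releases in order and reports `"a1b2"` as the live commit.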

Software Engineering
SREs operate as software engineers, not system administrators.
• Everything is managed in code. Any change to a system is a code change.
• Code is written to be read by other engineers. It is self-documenting.
• All processes are automated with software.
• CI/CD principles apply to all SRE code.
• The entire engineering team has access to all SRE code.

Monitoring
All systems are monitored for critical metrics.
• Metrics are easily available and consumable in a single interface.
• Critical metrics are displayed on dashboards for each system.
• The system that does the monitoring is monitored by a separate system.
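
"Monitoring the monitor" is often done with a heartbeat: the primary monitoring system checks in regularly, and a separate watchdog raises the alarm when the check-ins stop. The class below is a minimal sketch of that idea under those assumptions; `HeartbeatWatchdog` and its method names are hypothetical, and the slides do not prescribe this mechanism.

```python
import time
from typing import Optional


class HeartbeatWatchdog:
    """Runs outside the primary monitoring system and watches its heartbeat."""

    def __init__(self, max_silence_seconds: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self.last_beat: Optional[float] = None

    def beat(self, now: Optional[float] = None) -> None:
        """Called by the primary monitoring system on every successful cycle."""
        self.last_beat = time.monotonic() if now is None else now

    def monitor_is_down(self, now: Optional[float] = None) -> bool:
        """True when no heartbeat has ever arrived or the last one is too old."""
        now = time.monotonic() if now is None else now
        if self.last_beat is None:
            return True
        return now - self.last_beat > self.max_silence_seconds
```

The optional `now` argument exists only so the behaviour can be checked without waiting for real time to pass.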

Alerting
When self-healing fails, engineers are intelligently notified.
• Alerts summarize the problem succinctly and include suggested actions.
• Engineers are only paged off-hours for production. Other environments may alert engineers during business hours.
• After resolving the alert as quickly as possible, the next step (during business hours) is to ensure the same alert never fires again.
• Excessive alerting is unacceptable. It is addressed immediately.
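
The paging rule above can be expressed as a small routing function. The business hours (Monday to Friday, 09:00 to 17:00), the environment name `"production"`, and the three outcomes are assumptions chosen for this sketch, not values from the slides.

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)   # assumed business hours, adjust per team
BUSINESS_END = time(17, 0)


def is_business_hours(at: datetime) -> bool:
    """Monday to Friday, between BUSINESS_START and BUSINESS_END."""
    return at.weekday() < 5 and BUSINESS_START <= at.time() < BUSINESS_END


def route_alert(environment: str, at: datetime) -> str:
    """Return 'page', 'notify', or 'hold' for an alert raised at `at`."""
    if environment == "production":
        return "page"      # production may page at any hour
    if is_business_hours(at):
        return "notify"    # other environments alert during the day
    return "hold"          # non-production never wakes anyone up
```

A held alert would be delivered at the start of the next business day; that queueing step is left out here.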

Incident Response
On-call engineers (both SREs and SEs) feel empowered to respond in a timely manner.
• SEs are on-call for the systems they create and own.
• SREs are on-call for low-level systems and to assist developers.
• All escalation policies have backups or fallbacks.
• All escalation policies have rotations. No engineer is on-call for a system full time.
• Escalating is acceptable if needed. Escalation generates a follow-up task to understand why the on-call engineer could not solve the problem.
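
The rotation and fallback requirements above could be modelled along these lines. `EscalationPolicy`, the weekly round-robin, and the engineer names in the usage note are hypothetical; real paging tools have their own schedule formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationPolicy:
    """Hypothetical policy: a weekly rotation plus a fallback contact."""
    rotation: List[str]   # engineers who take turns as primary
    fallback: str         # paged when the primary does not acknowledge

    def __post_init__(self) -> None:
        # Enforces "no engineer is on-call for a system full time".
        if len(set(self.rotation)) < 2:
            raise ValueError("rotation needs at least two engineers")

    def primary(self, week_number: int) -> str:
        """Engineer on call for the given week (round-robin)."""
        return self.rotation[week_number % len(self.rotation)]

    def page_order(self, week_number: int) -> List[str]:
        """Who gets paged, in order, for an incident in the given week."""
        return [self.primary(week_number), self.fallback]
```

With `EscalationPolicy(rotation=["ana", "bo", "cy"], fallback="sre-secondary")`, week 2 pages `"cy"` first and then the fallback. Rejecting a one-person rotation at construction time makes the rule hard to bypass by accident.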

Postmortems
All user-facing incidents require a postmortem.
• Postmortems are blameless.
• The process for a postmortem is easy to conduct and has very little overhead. A few sentences are sometimes sufficient. A meeting is not always required.
• Postmortems are conducted reasonably soon after the incident is resolved.
• A repository of postmortems is easily accessible.

Security
Security is automated and baked into everything.
• Security checks run as part of CI/CD.
• Intrusion detection systems are in place.
• Identity and access management is used to gate all actions.
• As few infrastructure components as possible are publicly accessible, ideally zero.
• Client applications only use public APIs.
• Engineers are trusted but verified.
• Credentials are not stored in plain text, especially not in code.
• Credentials can be easily rotated.
• Access is revoked in a single place, which propagates to all systems.

Offload security to managed services.
• Servers receive requests through managed load balancers.
• All data stores receive requests from inside the network only.
• Static content is delivered through a CDN. Buckets are private.
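
To illustrate keeping credentials out of code, the sketch below reads a secret at runtime and refuses to start without it. It assumes a secret manager or deploy system injects the value into the process environment; `load_credential` and the variable name `DB_PASSWORD` are hypothetical.

```python
import os
from typing import Mapping


def load_credential(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a credential injected at runtime instead of hard-coding it.

    Raises instead of falling back to a default, so a missing secret is
    caught at startup rather than silently replaced by a value in code.
    """
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"credential {name!r} is not set")
    return value
```

Because the value is looked up when the process runs, rotating it means updating the secret store and restarting, with no code change.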

Finance
SREs are financially conscious in all aspects of their work.
• Cost measurements include engineering time and effort.
• Tooling is used to monitor all engineering costs an SRE can affect.

Cloud Architecture
An externally managed cloud is the default place to run services. Running services by any other means requires justification.
• Multi-region is appropriate when the cost of downtime has been properly measured against the cost of running in multiple regions.
• Multi-cloud (for redundancy) is almost never worth the effort and loss of features.
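
Measuring downtime against cost can start as simple arithmetic. The sketch below is one possible framing, not a method from the slides: it converts availability into expected hours of downtime per year and compares the downtime avoided with the extra yearly spend. All function names are hypothetical, and any figures passed in are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def expected_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def multi_region_pays_off(
    single_availability: float,
    multi_availability: float,
    downtime_cost_per_hour: float,
    extra_yearly_cost: float,
) -> bool:
    """True when the downtime avoided is worth more than the added spend.

    extra_yearly_cost should include engineering time, not just the bill.
    """
    saved = (
        expected_downtime_cost(single_availability, downtime_cost_per_hour)
        - expected_downtime_cost(multi_availability, downtime_cost_per_hour)
    )
    return saved > extra_yearly_cost
```

As a made-up example, moving from 99.9% to 99.99% availability avoids about 7.9 hours of downtime a year. At 10,000 per hour that is roughly 78,840 saved, so a second region costing 50,000 a year would pay off and one costing 100,000 would not.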

On-premises solutions are appropriate when:
• A modern cloud front-end is in place (OpenStack, etc.).
• IT, capacity planning, and system administration are all top-notch.
• The increased overhead is drastically cost-effective when engineering time is considered, and is projected to remain this way for the foreseeable future.
• SREs are not expected to physically interact with the data center.

Containerized orchestration is appropriate when:
• Services are shown to successfully run in containers.
• Services are in a healthy state and sufficiently modularized.
• The increased overhead is deemed acceptable.
• The company is willing to invest heavily in tooling.

Serverless solutions are appropriate when:
• Tooling and automation are used to manage serverless functions.
• Service owners are willing to accept the limitations of serverless.

Supporting Services
The default option for supporting services (logging, monitoring, alerting, etc.) is externally managed and hosted. Running these services internally requires justification.
• SREs are constantly evaluating supporting service options, new and old. The ability to consolidate is a factor.
• Supporting services are secure, cost-effective, and useful to engineers.

People
SREs and SEs are on the same team. They are all engineers.
• SREs are not blockers and allow access to as many systems as possible.
• SEs own their services and do not “throw code over the wall.”
• SREs are willing and able to contribute to and debug application code.
• SREs use and contribute to open source, if possible.
• SEs and SREs work together to plan new services and architectures.
• SREs strive to make the lives of all engineers better through automation.
