Cloud Native, SaaS and cloud

What is an SRE or Site Reliability Engineer?

Back to overview

saas cloudnative devops sre

Quite some time ago Google coined the the term SRE or Site Reliability Engineer as part of the function Site Reliability Engineering. This role is now popping up more and more and is even replacing the widespread use of "DevOps Engineer".

The definition according to Wikipedia:

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems.

It is about a half/half split between:

  1. Operations: supporting (internal) customers, fixing issues, doing manual changes, etc
  2. Automation and development: preventing recurring problems and improving operational quality through automation

Typically the work revolves around technologies like public cloud (like AWS or Azure), CI/CD (like Github and Concourse), monitoring (like Grafana and Prometheus), orchestration tooling (like Terraform) and is mostly in close cooperation with the development teams.

You could also consider SRE as an prescribed method on implementing DevOps.

Related insights