How to be a well-grounded Site Reliability Engineer ⚒️
A short summary of the skills needed to be a well-grounded Site Reliability Engineer
Many people ask what SRE is and what you need to learn to be a good Site Reliability Engineer.
Site reliability engineering (SRE) is a software engineering approach to IT operations. SRE teams use software as a tool to manage systems, solve problems, and automate operations tasks. (Red Hat)
In this post, I will share some of the fundamental & crucial skills required to be a well-grounded Site Reliability Engineer.
Skills & Knowledge Required:
1. Coding skills - having a problem-solving mindset to solve coding problems. This can be in any language.
2. Computer networking - you need a good understanding of the OSI model, TCP/IP, the different networking protocols and how they interlink. You should also be familiar with common Linux networking tools like ssh, ping, curl, dig, netstat etc.
3. Linux fundamentals - a good understanding of basic Linux commands and the internals of the Linux operating system. This will be useful when troubleshooting Linux problems.
4. System design fundamentals - one of my favourite topics. A good understanding of how to design highly available and scalable systems. You should be aware of fundamental concepts like proxies (forward & reverse), load balancing, DNS, CDNs, the client/web server model, caching, message queues, event-driven architecture and more. I go through all of these on my blog, CoderCo.
5. Technologies - AWS/Azure/GCP, Git, Linux, Kubernetes, Chef/Puppet/Ansible, Terraform, Docker, GitHub Actions, Golang/Python etc.
6. SRE processes - you should have a basic understanding & awareness of good monitoring/alerting practices, SLOs/SLAs/SLIs, error budgets, release engineering (staging/canary), on-call, postmortems, capacity planning etc.
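To make the error budget idea from point 6 a little more concrete, here is a minimal Python sketch of the arithmetic behind it. It assumes an availability-style SLO expressed as a percentage. The function names (`error_budget_minutes`, `budget_remaining`) and all the numbers are hypothetical and only for illustration; real teams usually pull these figures from their monitoring system.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO allows over a window."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be between 0 and 100 (exclusive)")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be between 0 and 100 (exclusive)")
    # The budget is the number of requests that are allowed to fail.
    allowed_failures = total_requests * (1 - slo_percent / 100)
    if allowed_failures == 0:
        raise ValueError("total_requests must be greater than zero")
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Illustrative numbers only: a 99.9% SLO over a 30-day window.
    minutes = error_budget_minutes(99.9, window_days=30)
    print(f"Allowed downtime: about {minutes:.1f} minutes")

    # Illustrative numbers only: 250 failures out of 1,000,000 requests.
    remaining = budget_remaining(99.9, total_requests=1_000_000, failed_requests=250)
    print(f"Error budget remaining: about {remaining:.0%}")
```

When the remaining budget gets close to zero (or goes negative), that is usually the signal to slow down releases and spend time on reliability work instead.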
I write about software engineering, DevOps, SRE and system design on my blog here at CoderCo.
#sre #devops #softwareengineering #aws #azure #kubernetes #architecture