Site Reliability Engineer (SRE) Job Description

Our company is looking for a Site Reliability Engineer (SRE) to join our team.

Responsibilities

You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform;
You stay current on technical trends in order to suggest innovative tools and approaches to interesting problems;
You share your expertise with the entire Engineering organization;
You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules.

Requirements

A passion for problem solving and strong analytical skills;
Comfort with Linux/Unix command line;
Exceptional and demonstrable web development experience;
Intermediate-level knowledge of at least one of Python, Ruby, Java, C++, C#, or Go;
Experience with relational and NoSQL databases;
3+ years of AWS administration experience;
Experience automating releases and working with continuous integration/delivery systems and related tools (e.g. Jenkins, CircleCI, Travis CI, Buildkite);
Excellent knowledge of a scripting language such as Ruby, Python, or Go;
Experience with Docker in a production environment, including container orchestration (e.g. Nomad, Mesos, Kubernetes);
Experience with AWS-based, cloud-native infrastructure and managed services, such as Amazon Redshift, EC2, S3 and other storage options, VPCs, and IAM;
Experience working on cloud-based infrastructure (e.g. AWS, GCP, Azure);
Experience with infrastructure as code (Terraform or CloudFormation);
You are empathetic: you take others’ opinions into account and communicate your thoughts clearly to reach technical solutions quickly;
Knowledge of configuration management systems such as Ansible, Chef, or Puppet;
You consider it important to understand and appreciate your customers, and enjoy seeing your work improve the work of others.