Site Reliability Engineer (SRE) Job Description

Our company is looking for a Site Reliability Engineer (SRE) to join our team.

Responsibilities:

  • You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform;
  • You stay current on technical trends in order to suggest innovative tools and approaches to interesting problems;
  • You share your expertise with the entire Engineering organization;
  • You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules.

Requirements:

  • A passion for problem solving with strong analytical capabilities;
  • Comfort with Linux/Unix command line;
  • Exceptional and demonstrable web development experience;
  • Intermediate-level knowledge of at least one of Python, Ruby, Java, C++, C#, or Go;
  • Experience with relational and NoSQL databases;
  • 3+ years of AWS administration;
  • Experience automating releases and working with continuous integration/delivery systems and related tools (e.g. Jenkins, CircleCI, Travis CI, Buildkite);
  • Excellent knowledge of a scripting language such as Ruby, Python, or Go;
  • Experience with Docker in a production environment, including container orchestration (e.g. Nomad, Mesos, Kubernetes);
  • Experience with AWS-based, cloud-native infrastructure and managed services, such as Redshift, EC2, S3 and other storage options, VPCs, and IAM;
  • Experience working on cloud-based infrastructure (e.g. AWS, GCP, Azure);
  • Experience with infrastructure as code (Terraform or CloudFormation);
  • You are empathetic: You take others’ opinions into account and clearly communicate your thoughts to reach technical solutions quickly;
  • Knowledge of configuration management systems like Ansible, Chef or Puppet;
  • You consider it important to understand and appreciate your customers, and enjoy seeing your work improve the work of others.