Site Reliability Engineer Job Description Template
Our company is looking for a Site Reliability Engineer to join our team.
Responsibilities:
- Work within a highly skilled team of engineers to deliver revolutionary improvements to the cloud and scale them;
- Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability;
- Apply working knowledge of cloud platforms (AWS/Azure);
- Practice sustainable incident response and blameless postmortems;
- Take action to get our HA production environments to “just work” without manual intervention or midnight alerts;
- Create and maintain operational documentation and runbooks;
- Understand complex technical details and build test methodologies for them;
- Provide defect tracking and broken link tracking;
- Cultivate full-team participation in building high-quality, thoughtful software;
- Add, tune, and maintain alert configurations and documentation as needed;
- Help define Dave’s best practices and teach your teammates how to use them moving forward;
- Monitor, maintain, and help scale services that are integrated into S&P’s platform;
- Assist with the implementation and integration of AWS services into the CMT cloud service infrastructure to enhance scalability and robustness;
- Engage with other Engineering and Product teams to improve the reliability, performance, availability, and security of the Coupa Cloud;
- Evangelize the adoption of best practices in relation to performance and reliability.
Requirements:
- CI/CD Pipeline Fundamentals;
- Bachelor’s Degree in Computer Science, Computer Engineering or a closely related field;
- An understanding of and passion for testing, architecture, and observability;
- 2+ years of experience with UNIX/Linux operating systems;
- Experience with source control tooling, such as TFS or Git, in a team environment;
- Demonstrated proficiency in at least one of the following programming languages: Python, Java, Golang;
- Experience working with large scale production deployments of thousands of servers;
- Linux, no matter the flavor;
- Experience designing, debugging, and running fault-tolerant, large-scale distributed systems;
- AWS CloudFormation;
- Master’s degree in computer science or related degree;
- Experience assisting in the development and management of Infrastructure as Code (IaC) processes;
- Bachelor’s degree in Computer Science or related discipline;
- Knowledge of cloud and virtualization technology;
- Knowledge of web technologies (e.g., JBoss, Tomcat, Apache Server, WebSphere).