Site Reliability Engineer Job Description

Site Reliability Engineer Job Description Template

Our company is looking for a Site Reliability Engineer to join our team.

Work within a highly skilled team of engineers to deliver revolutionary improvements to the cloud and scale them;
Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability;
Knowledge of cloud platforms (AWS/Azure);
Practice sustainable incident response and blameless postmortems;
Take action to get our HA production environments to “just work” without manual intervention or midnight alerts;
Create and maintain operational documentation and runbooks;
Understand complex technical details and build test methodologies for them;
Provide defect tracking and broken link tracking;
You will cultivate full-team participation in high quality, thoughtful software;
You will add, tune and maintain alert configurations and documentation as needed;
Help define Dave’s best practices and teach your teammates how to use them moving forward;
You will monitor, maintain and help scale services that are integrated into S&P’s platform;
Assist with the implementation and integration of AWS services into the CMT cloud service infrastructure to enhance scalability and robustness;
Engage with other Engineering and Product teams to improve reliability, performance, availability and security of the Coupa Cloud;
Evangelize the adoption of best practices in relation to performance and reliability.

Understanding of CI/CD pipeline fundamentals;
Bachelor’s Degree in Computer Science, Computer Engineering or a closely related field;
An understanding of and passion for testing, architecture, and observability;
2+ years of experience with UNIX/Linux operating systems;
Experience with source control tooling, such as TFS or Git, in a team environment;
Demonstrated proficiency in at least one of the following programming languages: Python, Java, Golang;
Experience working with large scale production deployments of thousands of servers;
Experience with Linux, no matter the flavor;
Experience designing, debugging and running fault tolerant large-scale distributed systems;
Experience with AWS CloudFormation;
Master’s degree in Computer Science or a related field;
Assist in the development and management of the Infrastructure as Code (IaC) processes;
Bachelor’s degree in Computer Science or related discipline;
Knowledge of cloud and virtualization technology;
Knowledge of web technologies (e.g., JBoss, Tomcat, Apache Server, WebSphere).