Senior Site Reliability Engineer Job Description

Senior Site Reliability Engineer Job Description Template

Our company is looking for a Senior Site Reliability Engineer to join our team.

Implement automation tools and frameworks;
Continuously refine monitoring processes, configurations, and thresholds;
Practice sustainable incident response and blameless postmortems;
You’ll be based in our SF office or work remotely from within the United States;
Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC;
Build tools that help Operations teams quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications;
You will monitor, maintain and help scale services that are integrated into S&P’s platform;
Develop playbooks and tools to streamline processes and shorten problem resolution time;
Automate all the things;
Monitor and optimize application performance within the deployment architecture;
Write code that improves scalability, performance, maintainability and security;
You will add, tune and maintain alert configurations and documentation as needed;
Ability to operate in a high-pressure environment and troubleshoot complex issues quickly while successfully handling multiple priorities;
You will cultivate full-team participation in building high-quality, thoughtful software;
Learn or increase your expertise in coding – we use Python.

Scripting languages like Ruby, Groovy, Bash, PowerShell or Python;
Object-oriented software development in Java, Scala, etc.;
NoSQL (e.g., Couchbase, Cassandra);
Programming expertise in either Python or Ruby, with demonstrated knowledge of software engineering best-practice development (e.g., linting, testing);
Experience programming with Python/Java, and/or the ability and interest to learn, is required;
Experience with infrastructure such as GCP, AWS and MySQL;
Knowledge of best practices and IT operations in an always-up, always-available service;
Good experience with SQL and with Postgres or similar RDBMS;
5+ years of experience working in operations;
You possess expertise in scalable testing, automation, continuous integration frameworks and best practices;
BS Degree in Computer Science, Electrical & Computer Engineering or Mathematics or equivalent experience;
Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning;
Strong background in Linux/Unix Administration;
5+ years of experience with Windows and/or Linux operating systems internals and administration (e.g., filesystems, inodes, system calls);
Experience with algorithms, data structures, complexity analysis and software design.