Senior Site Reliability Engineer Job Description Template
Our company is looking for a Senior Site Reliability Engineer to join our team.
Responsibilities:
- Implement automation tools and frameworks;
- Continuously refine monitoring processes, configurations, and thresholds;
- Practice sustainable incident response and blameless postmortems;
- Work from our SF office or remotely from within the United States;
- Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC;
- Build tools that help Operations teams quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications;
- You will monitor, maintain and help scale services that are integrated into S&P’s platform;
- Develop playbooks and tools to streamline processes and shorten problem resolution time;
- Automate all the things;
- Monitor and optimize application performance within the deployment architecture;
- Write code that improves scalability, performance, maintainability and security;
- You will add, tune and maintain alert configurations and documentation as needed;
- Operate in a high-pressure environment and troubleshoot complex issues quickly while handling multiple priorities;
- You will cultivate full-team participation in building high-quality, thoughtful software;
- Learn or increase your expertise in coding – we use Python.
Requirements:
- Scripting languages like Ruby, Groovy, Bash, PowerShell or Python;
- Object-oriented software development in Java, Scala, etc.;
- NoSQL (e.g., Couchbase, Cassandra);
- Programming expertise in either Python or Ruby, with demonstrated knowledge of software engineering best-practice development (e.g., linting, testing);
- Experience programming with Python/Java, and/or the ability and interest to learn, is required;
- Experience with infrastructure such as GCP, AWS and MySQL;
- Knowledge of best practices and IT operations in an always-up, always-available service;
- Solid experience with SQL and with Postgres or a similar RDBMS;
- 5+ years of experience working in operations;
- You possess expertise in scalable testing, automation, continuous integration frameworks and best practices;
- BS degree in Computer Science, Electrical & Computer Engineering or Mathematics, or equivalent experience;
- Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning;
- Strong background in Linux/Unix Administration;
- 5+ years of experience with Windows and/or Linux operating systems internals and administration (e.g., filesystems, inodes, system calls);
- Experience with algorithms, data structures, complexity analysis and software design.