Principal Site Reliability Engineer (Intelligent Automation)

$162.6k - $302k

Genentech

The Position

A healthier future. It’s what drives us to innovate. To continuously advance science and ensure everyone has access to the healthcare they need today and for generations to come. Creating a world where we all have more time with the people we love. That’s what makes us Roche.

Advances in AI, data and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organizations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximizing these opportunities.

The Computational Sciences Center of Excellence (CS CoE) is a strategic, unified group whose goal is to harness the transformative power of data and Artificial Intelligence (AI) to assist our scientists in both pRED and gRED to deliver more innovative and life-changing medicines for patients worldwide. Within the CS CoE organization, the Data and Digital Catalyst (DDC) organization leads the modernization of our computational and data ecosystems by integrating digital technologies across Research and Early Development to empower stakeholders, advance data-driven science and accelerate decision-making. The Solutions team within the DDC organization develops modernized and interconnected computational and data ecosystems.

As a Site Reliability Engineer in the Solutions Engineering capability, you will work closely with our engineering colleagues to play a pivotal role in designing, implementing, and maintaining scalable, resilient, and supportable cloud-based platform solutions. The focus will be on enabling research applications, Machine Learning (ML) workloads, and HPC environments through automation, efficient resource management, and Infrastructure as Code (IaC) tooling.

As a member of the DDC team you will help mature the scalable platforms that help unlock the potential of our diverse scientific data, accelerating the discovery and development of life-changing treatments for patients.

The Opportunity

  • Architect and implement IaC solutions using tools like Terraform, Spacelift, or CloudFormation to provision and manage cloud infrastructure for ML and HPC workloads.
  • Automate the deployment of scalable ML pipelines, HPC clusters, and supporting services across global regions.
  • Architect resilient and highly available solutions for ML and HPC workloads using cloud-native practices such as auto-scaling, load balancing, and failover mechanisms.
  • Implement disaster recovery (DR) and business continuity plans for critical systems to ensure global operational integrity.
  • Conduct chaos engineering experiments to validate system reliability and identify potential weaknesses.
  • Develop automation scripts and workflows to streamline infrastructure management, deployment, and scaling for ML and HPC use cases.
  • Implement robust monitoring, logging, and alerting frameworks using tools like Prometheus, Grafana, Datadog, or ELK Stack to provide deep insights into system health and performance.
  • Knowledge of AIOps incident management, processes and tooling.
  • Provide technical leadership to a team of engineers, fostering a culture of collaboration, innovation, and continuous improvement.
  • Partner with cross-functional teams to align infrastructure solutions with business objectives and ML/HPC workload requirements.
  • Mentor and train junior engineers in IaC practices, ML, and HPC infrastructure design.
  • Monitor and optimize cloud infrastructure usage and costs for ML and HPC workloads.
  • Ensure compliance with organizational security, governance, and regulatory policies in all IaC and cloud implementations.

Who You Are

  • Bachelor’s or Master’s degree in Computer Science or similar technical field, or equivalent experience, and 7+ years of experience in software engineering or Site Reliability Engineering (SRE).
  • Proven expertise in supporting and deploying IaC solutions in cloud environments (AWS, Azure, or GCP) for ML and HPC workloads.
  • Background in MLOps pipelines, including model versioning, CI/CD for ML, and feature store integration, as well as experience with managed ML services (e.g., AWS SageMaker, Google AI Platform, or Azure ML).
  • Deep understanding of cloud-native architectures, including autoscaling, serverless, and multi-region deployments.

Technical Skills:

  • Advanced proficiency with IaC tools: Terraform, Pulumi, or CloudFormation.
  • Expert in scripting and automation: Python, Bash, or Go.
  • Strong understanding of GPU-accelerated computing (e.g., NVIDIA CUDA, TensorFlow) and HPC workload scaling.
  • Knowledge of distributed systems, storage solutions, and data pipelines.
  • Familiar with monitoring and observability tools: Prometheus, Grafana, Datadog, or similar.

Soft Skills:

  • Strong problem-solving skills, with a methodical approach to troubleshooting.
  • Excellent communication, leadership, and mentoring abilities.
  • Ability to work collaboratively across teams in a fast-paced, dynamic environment.

Preferred Qualifications

  • Certifications in cloud platforms (e.g., AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Azure Solutions Architect).
  • Experience with distributed ML frameworks and data engineering pipelines (e.g., Horovod, TensorFlow Distributed, Apache Airflow, Apache Spark).
  • Experience with compliance frameworks (e.g., GDPR, SOC 2, ISO 27001).

Onsite presence, on our South San Francisco campus, is expected for at least 3 days a week. Relocation benefits are not available for this job posting.

The expected salary range for this position based on the primary location of California is $162,600 - $302,000.
Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. A discretionary annual bonus may be available based on individual and Company performance. This position also qualifies for the benefits detailed at the link provided below.

Benefits

Genentech is an equal opportunity employer. It is our policy and practice to employ, promote, and otherwise treat any and all employees and applicants on the basis of merit, qualifications, and competence. The company's policy prohibits unlawful discrimination, including but not limited to, discrimination on the basis of Protected Veteran status, individuals with disabilities status, and consistent with all federal, state, or local laws.

If you have a disability and need an accommodation in relation to the online application process, please contact us by completing this form: Accommodations for Applicants.

Vacancy posted 23 hours ago