Principal Site Reliability Engineer (Intelligent Automation)

$162.6k - $302k

F. Hoffmann-La Roche AG

Role Overview

Within the CS CoE, the Solutions team develops interconnected computational and data ecosystems. As a Site Reliability Engineer in the Solutions Engineering capability, you will design, implement, and maintain scalable, resilient, and supportable cloud‑based platform solutions that enable research applications, machine‑learning (ML) workloads, and high‑performance computing (HPC) environments.

Responsibilities

Infrastructure as Code (IaC) Design and Implementation
  • Architect and implement IaC solutions using Terraform, Spacelift, or CloudFormation to provision and manage cloud infrastructure for ML and HPC workloads.
  • Automate the deployment of scalable ML pipelines, HPC clusters, and supporting services across global regions.

Global Availability and Resiliency
  • Build resilient, highly available solutions for ML and HPC workloads leveraging cloud‑native practices such as auto‑scaling, load balancing, and failover mechanisms.
  • Implement disaster recovery (DR) and business continuity plans for critical systems to ensure global operational integrity.
  • Conduct chaos engineering experiments to validate system reliability and identify potential weaknesses.

Automation and Observability
  • Develop automation scripts and workflows to streamline infrastructure management, deployment, and scaling for ML and HPC use cases.
  • Implement robust monitoring, logging, and alerting frameworks using Prometheus, Grafana, Datadog, or ELK Stack to provide deep insights into system health and performance.
  • Apply AIOps incident management processes and tooling.

Collaboration and Leadership
  • Provide technical leadership to a team of engineers, fostering a culture of collaboration, innovation, and continuous improvement.
  • Partner with cross‑functional teams to align infrastructure solutions with business objectives and ML/HPC workload requirements.
  • Mentor and train junior engineers in IaC practices, ML, and HPC infrastructure design.

Cost Optimization and Governance
  • Monitor and optimize cloud infrastructure usage and costs for ML and HPC workloads.
  • Ensure compliance with organizational security, governance, and regulatory policies in all IaC and cloud implementations.

Qualifications
  • Bachelor’s or Master’s degree in Computer Science or a related technical field, or equivalent experience.
  • 7+ years of experience in software engineering or Site Reliability Engineering (SRE).
  • Proven expertise in supporting and deploying IaC solutions in cloud environments (AWS, Azure, or GCP) for ML and HPC workloads.
  • Background in MLOps pipelines, including model versioning, CI/CD for ML, and feature store integration, with experience using managed ML services such as AWS SageMaker, Google AI Platform, or Azure ML.
  • Deep understanding of cloud‑native architectures, including autoscaling, serverless, and multi‑region deployments.

Technical Skills
  • Advanced proficiency with IaC tools: Terraform, Pulumi, or CloudFormation.
  • Expertise in scripting and automation: Python, Bash, or Go.
  • Strong understanding of GPU‑accelerated computing (e.g., NVIDIA CUDA, TensorFlow) and HPC workload scaling.
  • Knowledge of distributed systems, storage solutions, and data pipelines.
  • Familiarity with monitoring and observability tools: Prometheus, Grafana, Datadog, or similar.

Soft Skills
  • Strong problem‑solving skills with a methodical approach to troubleshooting.
  • Excellent communication, leadership, and mentoring abilities.
  • Ability to work collaboratively across teams in a fast‑paced, dynamic environment.

Preferred Qualifications
  • Certifications in cloud platforms (e.g., AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Azure Solutions Architect).
  • Experience with distributed ML frameworks and data engineering pipelines such as Horovod, TensorFlow Distributed, Apache Airflow, and Apache Spark.
  • Experience with compliance frameworks (e.g., GDPR, SOC2, ISO27001).

Location & Work

Onsite presence in South San Francisco is expected for at least 3 days a week. Relocation benefits are not available for this job posting.

Salary

Based on the primary location of California: $162,600–$302,000. Actual pay will be determined by experience, qualifications, geographic location, and other job‑related factors permitted by law.

Benefits

This position qualifies for benefits detailed at the link provided below.

EEO Statement

Genentech is an equal opportunity employer. We employ, promote, and otherwise treat all employees and applicants on the basis of merit, qualifications, and competence. The company's policy prohibits unlawful discrimination, including but not limited to discrimination on the basis of protected veteran status and disability status, consistent with all federal, state, or local laws. If you have a disability and need an accommodation in the online application process, please contact us by completing the Accommodations for Applicants form.

Vacancy posted 5 days ago
