Senior Machine Learning Engineer, DevOps/SRE
$148.75k - $361k
Roku
Teamwork makes the stream work.

Roku is changing how the world watches TV

Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.

From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.

About the team

The Advertising Performance group focuses on performance for all participants in the Advertising ecosystem - Advertisers, Publishers, and Roku. The systems and solutions span multiple disciplines and technologies to perform real-time multi-objective optimization across distributed systems at large scale and with low latency. We use Machine Learning, Reinforcement Learning, AI, Control and Optimization Systems, and Auction Dynamics to solve a large set of complex problems. At the core of this is our Machine Learning, Experimentation, and Inference Platform that powers the entire landscape, which we continuously evolve over time.

About the role

We are seeking a talented and experienced Senior Software Engineer, MLOps/DevOps, to join the Advertising Performance team and play a critical role in supporting and scaling our Machine Learning infrastructure. The ideal candidate has a strong background in DevOps/SRE practices, cloud infrastructure management, and MLOps tooling, with a passion for building platforms that accelerate ML experimentation and deployment at internet scale.
You will partner closely with ML Scientists and Engineers to streamline the end-to-end ML lifecycle across training, evaluation, deployment, and monitoring, on top of a modern, cloud-native stack running on GCP and AWS using Kubernetes, Apache Airflow, Spark, Ray, MLflow, Chronon, etc.

For California Only - The estimated annual salary for this position is between $148,750 and $361,000. Compensation packages are based on factors unique to each candidate, including but not limited to skill set, certifications, and specific geographical location. This role is eligible for health insurance, equity awards, life insurance, disability benefits, parental leave, wellness benefits, and paid time off.

What you’ll be doing

- Lead the design and operation of scalable, production-grade cloud infrastructure for ML workloads across AWS and GCP, including GPU/TPU-based training and inference environments
- Architect and improve CI/CD systems for ML models and platform services to enable fast, reliable, and safe production releases
- Own and evolve low-latency infrastructure for real-time model inference, including KV stores and vector databases
- Define and enforce observability standards for ML systems, including model performance monitoring, drift detection, capacity planning, and pipeline health metrics
- Participate in the on-call rotation, leading incident response and root-cause analysis for critical ML training and serving infrastructure
- Partner with data scientists and ML engineers to improve platform usability, accelerate model iteration, and implement strong MLOps and SRE best practices
- Champion operational excellence across ML infrastructure through automation, resilience engineering, disaster recovery planning, and continuous improvement

We’re excited if you have

- BS or MS in Computer Science, Engineering, or a related quantitative field
- 8+ years of experience in DevOps, SRE, or ML infrastructure, including 4+ years supporting large-scale ML or AI systems
- Strong programming skills in Python, Scala, and/or Java for platform automation and tooling
- Deep experience with Kubernetes and container orchestration on GCP (GKE) and/or AWS (EKS)
- Expertise with NoSQL or low-latency data stores such as Aerospike or similar technologies
- Hands-on experience with data and orchestration technologies such as Apache Spark, Apache Flink, Apache Airflow, and Kafka
- Experience building and maintaining CI/CD systems using tools such as Jenkins or GitLab Runner
- Familiarity with feature engineering platforms such as Chronon and model lifecycle tools such as MLflow
- Strong infrastructure-as-code experience with Terraform or similar tooling
- Experience with observability platforms such as Prometheus, Grafana, and Datadog
- Excellent communication and cross-functional collaboration skills
- Experience in the Advertising domain is a plus
Our Hybrid Work Approach

Roku fosters an inclusive and collaborative environment where teams work in the office Monday through Thursday. Fridays are flexible for remote work, except for employees whose roles are required to be in the office five days a week or employees who are in offices with a five-day in-office policy.

Benefits

Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Employees are supported in taking time off, in accordance with local leave policies and other personal needs to support their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.

Accommodations

Roku welcomes applicants of all backgrounds and provides reasonable accommodations and adjustments in accordance with applicable law. If you require reasonable accommodation at any point in the hiring process, please direct your inquiries to (email address viewable on click.appcast.io).

The Roku Culture

Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a fewer number of very talented folks can do more for less cost than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast and accomplish extraordinary things through collaboration and trust.
In short, at Roku you'll be part of a company that's changing how the world watches TV. We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002.

To learn more about Roku, our global footprint, and how we've grown, visit [link not included].

By providing your information, you acknowledge that you want Roku to contact you about job roles, that you have read Roku's Applicant Privacy Notice, and understand that Roku will use your information as described in that notice. If you do not wish to receive any communications from Roku regarding this role or similar roles in the future, you may unsubscribe at any time by emailing (email address viewable on click.appcast.io).
