Software Engineer - Data Infra Reliability
Luma AI
$170k - $360k
As our models scale to "omni" capabilities, our data infrastructure must be unbreakable. We are looking for a Data Reliability Engineer who brings a Site Reliability Engineering (SRE) mindset to the world of massive-scale data. You will be responsible for the resilience, automation, and scalability of the petabyte-scale pipelines that feed our research. This is not just about keeping the lights on; it's about treating infrastructure as code and building self-healing data systems that allow our researchers to train on massive datasets without interruption. Whether you are a junior engineer with a passion for automation or a seasoned SRE veteran, you will play a critical role in hardening the backbone of Luma's intelligence.
Automate Everything: Apply Infrastructure-as-Code (IaC) principles using Terraform to provision, manage, and scale our data infrastructure.
Harden Data Pipelines: Build reliability and fault tolerance into our core data ingestion and processing workflows, ensuring high availability for research jobs.
Scale Kubernetes & Ray: Operate and optimize large-scale Kubernetes clusters and Ray deployments to handle bursty, high-throughput workloads.
Define Reliability: Establish Service Level Objectives (SLOs) and observability standards (Prometheus/Grafana) for our data platforms.
Debug & Heal: Serve as the first line of defense for complex infrastructure failures, diagnosing root causes in distributed storage and compute systems.
Deep SRE/DevOps proficiency: You live and breathe Linux, networking, and automation.
Infrastructure-as-Code Native: You have extensive experience with Terraform, Ansible, or similar tools to manage complex cloud environments (AWS/GCP).
Kubernetes Expert: You have managed Kubernetes in production and understand its internals, not just how to deploy containers.
Python Proficiency: You can write high-quality Python code for automation, tooling, and infrastructure management.
Data-Minded: You understand the specific challenges of stateful data systems and high-throughput storage (S3/Object Store).
Experience managing GPU clusters or AI/ML workloads.
Background in both Software Engineering and Operations (DevOps).
Experience with high-performance networking (InfiniBand/RDMA).
The base pay range for this role is $170,000 – $360,000 per year.
Luma's mission is to build unified general intelligence that can generate, understand, and operate in the physical world. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

