Software Engineer - Data Infra Reliability
$170k - $360k
Luma AI
As our models scale to "omni" capabilities, our data infrastructure must be unbreakable. We are looking for a Data Reliability Engineer who brings a Site Reliability Engineering (SRE) mindset to the world of massive-scale data. You will be responsible for the resilience, automation, and scalability of the petabyte-scale pipelines that feed our research. This is not just about keeping the lights on; it's about treating infrastructure as code and building self-healing data systems that allow our researchers to train on massive datasets without interruption. Whether you are a junior engineer with a passion for automation or a seasoned SRE veteran, you will play a critical role in hardening the backbone of Luma's intelligence.
What You'll Do
- Automate Everything: Apply Infrastructure-as-Code (IaC) principles using Terraform to provision, manage, and scale our data infrastructure.
- Harden Data Pipelines: Build reliability and fault tolerance into our core data ingestion and processing workflows, ensuring high availability for research jobs.
- Scale Kubernetes & Ray: Operate and optimize large-scale Kubernetes clusters and Ray deployments to handle bursty, high-throughput workloads.
- Define Reliability: Establish Service Level Objectives (SLOs) and observability standards (Prometheus/Grafana) for our data platforms.
- Debug & Heal: Serve as the first line of defense for complex infrastructure failures, diagnosing root causes in distributed storage and compute systems.
Who You Are
- Deep SRE/DevOps Proficiency: You live and breathe Linux, networking, and automation.
- Infrastructure-as-Code Native: You have extensive experience with Terraform, Ansible, or similar tools to manage complex cloud environments (AWS/GCP).
- Kubernetes Expert: You have managed Kubernetes in production and understand its internals, not just how to deploy containers.
- Python Proficiency: You can write high-quality Python code for automation, tooling, and infrastructure management.
- Data-Minded: You understand the specific challenges of stateful data systems and high-throughput storage (S3/Object Store).
What Sets You Apart (Bonus Points)
- Experience managing GPU clusters or AI/ML workloads.
- Background in both Software Engineering and Operations (DevOps).
- Experience with high-performance networking (InfiniBand/RDMA).
Compensation
The base pay range for this role is $170,000 – $360,000 per year.
Luma's mission is to build unified general intelligence that can generate, understand, and operate in the physical world. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.


