Software Engineer, ML & Data Infra
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
About the Role
The ML and Data Infrastructure team is responsible for building the foundational infrastructure that powers frontier AI models and truth-seeking agents—from petabyte-scale data acquisition and multimodal crawling, to web-scale search/retrieval systems, reliable high-throughput inference serving, low-level GPU/kernel optimizations, compiler/runtime innovations, and high-speed interconnect fabrics for massive clusters. In this role, you will collaborate across pre-training, multimodal, reasoning, and product teams in a fast-paced, meritocratic environment where you will tackle ambiguous, high-stakes problems with first-principles thinking and rigorous execution.
Responsibilities
- Design, build, and operate petabyte-to-exabyte scale distributed systems for data acquisition, web crawling, preprocessing, filtering/classification, and multimodal pipelines (CPU/GPU workloads).
- Architect high-performance search/retrieval engines (vector/hybrid/semantic) at trillion-document scale, integrating with LLMs/agents for truth-seeking, low-hallucination reasoning, and real-time knowledge access.
- Develop reliable inference serving infrastructure: load balancing, autoscaling, KV cache, batching, fault-tolerance, monitoring (Prometheus/Grafana), CI/CD (Buildkite/ArgoCD), and benchmarking for 100% uptime and optimal tail latency.
- Optimize low-level performance: CUDA kernels (GEMM, attention), Triton/CUTLASS extensions, quantization/distillation/speculative decoding, GPU memory hierarchy, and model-hardware co-design for next-gen architectures.
- Innovate on compilers/runtimes (JAX/XLA/MLIR, custom features for Hopper/Blackwell), distributed profiling/debugging tools, and interconnect fabrics (copper/optical, 1.6T+, SerDes/photonics, topology simulation, vendor roadmaps).
- Manage complex workloads across clouds/clusters: orchestration (Kubernetes), data bookkeeping/verifiability, high-speed interconnect validation, failure analysis, and telemetry/automation for production reliability.
Required Qualifications
- Strong systems engineering skills with proven impact on large-scale distributed infrastructure (data processing, search, inference, or cluster networking).
- Proficiency in Python and at least one compiled language (Rust, C++, Go, Java); experience building bespoke libraries, optimizing performance, and debugging complex systems.
- Hands-on experience with at least one key area: petabyte-scale data pipelines/crawling (Spark/Ray/Kubernetes), web-scale search/retrieval (vector DBs, ranking, RAG), inference optimization (SGLang, kernels, batching), compiler features (JAX/XLA), or high-speed interconnects (optical/copper, SerDes, signal integrity).
- Deep understanding of distributed systems challenges: high throughput (ops/sec), latency/throughput tradeoffs, fault-tolerance, monitoring, and scaling production systems to billions of users or 100k+ GPUs.
- Passion for AI infrastructure: keeping up with SOTA techniques, first-principles problem-solving, meticulous organization/bookkeeping, and delivering rigorous, high-quality results.
Preferred Qualifications
- Experience with multimodal data (images/video/audio), epistemics/truth-seeking in retrieval, or agentic systems (long-horizon reasoning, feedback loops).
- Low-level optimizations: CUDA kernel development (Tensor cores, attention), GPU profiling (Nsight), low-precision numerics, or interconnect pathfinding (LPO/LRO/CPO, photonics).
- Production expertise in inference reliability (0% error target), CI/CD for ML, or cluster networking (topology, vendor collaboration, failure root-cause).
- Track record owning end-to-end projects in hyperscale environments, with strong debugging, vendor management, or open-source contributions (e.g., SGLang).
Annual Salary Range
$180,000 - $440,000 USD
Benefits
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer.


