Machine Learning - Infrastructure
Causal Labs
Infrastructure Engineer
Our mission is general causal intelligence: AI that can predict the future and identify the optimal actions to change that future.
To achieve this breakthrough, we are building a Large Physics foundation Model (LPM), because domains governed by physics have inherent cause-and-effect relationships, unlike visual or textual data.
Weather is the ideal training ground for an LPM. It is the most well-observed physical system, offering rapid, objective ground truth feedback from sensory observations and data at a scale that dwarfs what is used to train today's LLMs.
Causal Labs is a team of researchers and engineers from self-driving, drug discovery, and robotics (including Google DeepMind, Cruise, Waymo, Insitro, and Nabla Bio) who believe general causal intelligence will be the most important technical breakthrough for civilization.
We look for infrastructure engineers who are excited to tackle unsolved problems.
Our training and inference challenges demand deep expertise in setting up distributed training clusters and optimizing performance for large models. If you have experience building large-scale ML infrastructure in related fields such as language and vision models, robotics, or biology, join us on this mission.
Responsibilities
- Design, deploy, and maintain large distributed ML training and inference clusters
- Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
- Research and test various training approaches including parallelization techniques and numerical precision trade-offs across different model scales
- Analyze, profile and debug low-level GPU operations to optimize performance
- Stay up-to-date on research to bring new ideas to work
What We're Looking For
- A relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains
- Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
- Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
- Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
- Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
- Background working on distributed task management systems and scalable model serving & deployment architectures
- Understanding of monitoring, logging, observability, and version control best practices for ML systems
You don't have to meet every single requirement above.

