Machine Learning Infrastructure Engineer

TRM

Build a Safer World.

TRM Labs provides AI-powered intelligence solutions that help public and private sector agencies investigate and disrupt crime. TRM's platforms enable investigators to trace illicit activity, build cases, and construct operating pictures of threat networks. Leading agencies and businesses worldwide rely on TRM to make the world safer and more secure.

TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM's blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide who rely on TRM to enable a safer, more secure world for all.

At TRM, we're on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM's AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning - building the foundation that enables high-throughput, production-grade ML workloads.

The impact you'll have here:

Design and operate GPU cluster infrastructure. Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.
Optimize high-throughput inference. Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.
Enable distributed inference strategies. Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.
Implement model optimization and compilation workflows. Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.
Schedule heterogeneous workloads. Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.
Build observability into ML infrastructure. Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability.
Partner across engineering teams. Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.

What we're looking for:

Bachelor's degree (or equivalent) in Computer Science or related field.
5+ years of experience building and operating distributed systems or infrastructure in production environments.
Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
CUDA familiarity and experience debugging GPU-related issues is a plus.
Adaptable. Goals can change fast. You anticipate and react quickly.
Autonomous. You own what you work on. You move fast and get things done.
Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.

Life at TRM

We are building a safer world. That promise shows up in how we work every day.

TRM moves quickly. We are a high velocity, high ownership team that expects clarity, follow-through, and impact. People who thrive here are energized by hard problems, experimentation, and continuous feedback. If something takes months elsewhere, it will ship here in days.

Our work sits at the intersection of AI, national security, and fighting crime. The problems are complex, the stakes are real, and the environment evolves quickly. The pace and intensity of the work reflect the importance of the mission. As a result, the way we operate requires a high level of ownership, adaptability, collaboration, and creative problem-solving.

At TRM, you should expect:

Priorities and targets to change quickly as we experiment and iterate
Work that often requires operating with a high degree of ambiguity
A high level of personal ownership and accountability
Close collaboration across teams and functions
Frequent, high-touch communication
Creative problem solving and out-of-the-box thinking
A pace that rewards urgency, adaptability, and outcomes

This environment is energizing for people who enjoy building, solving hard problems, and making progress in situations that are not always fully defined. It also requires comfort navigating ambiguity, adjusting course as new information emerges, and maintaining focus and positivity in a fast-moving and intense environment.

We also recognize that this style of operating is not for everyone. If you are primarily optimizing for predictability or a consistently balanced workload, we encourage you to use the interview process to pressure test whether this environment is truly the right fit. We want teammates who thrive here, not just survive here.

At the same time, many people find this work deeply rewarding. If you are excited by meaningful problems, motivated by ambitious goals, and energized by working alongside mission-driven colleagues, there is a good chance you will find TRM to be an exceptional place to grow and contribute. Learn more: Interviewing at TRM: How We Hire and What Success Looks Like

AI Fluency at TRM

AI fluency is a baseline expectation at TRM.

We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks.

At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:

Accelerate repeatable workflows
Structure and solve problems
Improve output quality
Increase speed and leverage

You will be evaluated on applied AI fluency during the interview process.

Leadership Principles

We hire and grow against three leadership principles. They're the standards for how we operate, treat each other, and make decisions.

Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment - test, ship, measure, and iterate quickly.
Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end-to-end, and invest in getting better every day.
Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one-team mindset - giving and receiving feedback to make the team stronger.

Join our Mission

At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you're excited by TRM's mission but don't check every box, we encourage you to apply - we hire for slope, judgment, and the will to learn fast.

TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.

Privacy Policy and Additional Information

By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy.

Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.

To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

The use of AI tools of any kind (including but not limited to notetakers, interview assistants, and real-time coaching tools such as Otter.ai, Fireflies, Fathom, Cluely, or similar) during TRM interviews is not permitted without prior approval from TRM. TRM uses its own internal tools for note-taking to ensure a consistent and confidential experience for all candidates.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.

Recruitment agencies

TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.

Learn More: Company Values | Interviewing | FAQs

Vacancy posted 1 day ago
