Machine Learning Infrastructure Engineer

TRM Labs

Build a Safer World.

TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide who rely on TRM to enable a safer, more secure world for all.

At TRM, we’re on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning, building the foundation that enables high-throughput, production-grade ML workloads.

The impact you’ll have here:

• Design and operate GPU cluster infrastructure. Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.
• Optimize high-throughput inference. Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.
• Enable distributed inference strategies. Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.
• Implement model optimization and compilation workflows. Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.
• Schedule heterogeneous workloads. Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.
• Build observability into ML infrastructure. Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability.
• Partner across engineering teams. Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.

What we’re looking for:

• Bachelor’s degree (or equivalent) in Computer Science or related field.
• 5+ years of experience building and operating distributed systems or infrastructure in production environments.
• Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
• Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
• Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
• Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
• Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
• Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
• Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
• CUDA familiarity and experience debugging GPU-related issues is a plus.
• Adaptable. Goals can change fast. You anticipate and react quickly.
• Autonomous. You own what you work on. You move fast and get things done.
• Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
• Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.

Life at TRM

We are building a safer world. That promise shows up in how we work every day. TRM moves quickly. We are a high velocity, high ownership team that expects clarity, follow-through, and impact. People who thrive here are energized by hard problems, experimentation, and continuous feedback. If something takes months elsewhere, it will ship here in days.

Our work sits at the intersection of AI, national security, and fighting financial crime. The problems are complex, the stakes are real, and the environment evolves quickly. The pace and intensity of the work reflect the importance of the mission. As a result, the way we operate requires a high level of ownership, adaptability, collaboration, and creative problem-solving.
At TRM, you should expect:

• Priorities and targets to change quickly as we experiment and iterate
• Work that often requires operating with a high degree of ambiguity
• A high level of personal ownership and accountability
• Close collaboration across teams and functions
• Frequent, high-touch communication
• Creative problem solving and out-of-the-box thinking
• A pace that rewards urgency, adaptability, and outcomes

This environment is energizing for people who enjoy building, solving hard problems, and making progress in situations that are not always fully defined. It also requires comfort navigating ambiguity, adjusting course as new information emerges, and maintaining focus and positivity in a fast-moving and intense environment.

We also recognize that this style of operating is not for everyone. If you are primarily optimizing for predictability or a consistently balanced workload, we encourage you to use the interview process to pressure test whether this environment is truly the right fit. We want teammates who thrive here, not just survive here.

At the same time, many people find this work deeply rewarding. If you are excited by meaningful problems, motivated by ambitious goals, and energized by working alongside mission-driven colleagues, there is a good chance you will find TRM to be an exceptional place to grow and contribute.

Learn more: TRM Interviewing at TRM: How We Hire and What Success Looks Like

AI Fluency at TRM

AI fluency is a baseline expectation at TRM. We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks. At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:

• Accelerate repeatable workflows
• Structure and solve problems
• Improve output quality
• Increase speed and leverage

You will be evaluated on applied AI fluency during the interview process.
Leadership Principles

We hire and grow against three leadership principles. They’re the standards for how we operate, treat each other, and make decisions.

• Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment: test, ship, measure, and iterate quickly.
• Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end-to-end, and invest in getting better every day.
• Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one-team mindset, giving and receiving feedback to make the team stronger.

The impact you will have

This work has real stakes. Depending on your role at TRM, your week might look like:

• Driving critical investigations that can’t wait for typical business hours.
• Shipping products in days when others would schedule quarters.
• Partnering with teams across time zones to deliver insights while the story is still unfolding.
• Building new solutions from first principles when the playbook doesn’t yet exist.
• Protecting victims and customers by tracing illicit activity and disrupting criminal networks.

Join our Mission

At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you’re excited by TRM’s mission but don’t check every box, we encourage you to apply. We hire for slope, judgment, and the will to learn fast.

TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.
Privacy Policy and Additional Information

By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy. Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.

To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.

Recruitment agencies

TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.

Learn More: Company Values | Interviewing | FAQs

Vacancy posted 10 hours ago
