Machine Learning Infrastructure Engineer
TRM
Build a Safer World.
TRM Labs provides AI-powered intelligence and blockchain analytics solutions that help public and private sector organizations detect, investigate, and disrupt crime, including crypto-related fraud and financial crime. These organizations include law enforcement and national security agencies, financial institutions, and cryptocurrency businesses. TRM's blockchain intelligence and AI platforms enable investigators to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threat networks. Leading agencies and businesses worldwide rely on TRM to make the world safer and more secure.
At TRM, we're on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.
As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM's AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning, building the foundation that enables high-throughput, production-grade ML workloads.
The impact you'll have here:
- Design and operate GPU cluster infrastructure. Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.
- Optimize high-throughput inference. Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.
- Enable distributed inference strategies. Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.
- Implement model optimization and compilation workflows. Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.
- Schedule heterogeneous workloads. Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.
- Build observability into ML infrastructure. Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability.
- Partner across engineering teams. Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.
What we're looking for:
- Bachelor's degree (or equivalent) in Computer Science or related field.
- 5+ years of experience building and operating distributed systems or infrastructure in production environments.
- Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
- Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
- Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
- Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
- Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
- Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
- Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
- CUDA familiarity and experience debugging GPU-related issues is a plus.
- Adaptable. Goals can change fast. You anticipate and react quickly.
- Autonomous. You own what you work on. You move fast and get things done.
- Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
- Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.
- Priorities and targets to change quickly as we experiment and iterate
- Work that often requires operating with a high degree of ambiguity
- A high level of personal ownership and accountability
- Close collaboration across teams and functions
- Frequent, high-touch communication
- Creative problem solving and out-of-the-box thinking
- A pace that rewards urgency, adaptability, and outcomes
- Accelerate repeatable workflows
- Structure and solve problems
- Improve output quality
- Increase speed and leverage
Our values:
- Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment - test, ship, measure, and iterate quickly.
- Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end to end, and invest in getting better every day.
- Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one-team mindset, giving and receiving feedback to make the team stronger.
Join our Mission
At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you're excited by TRM's mission but don't check every box, we encourage you to apply: we hire for slope, judgment, and the will to learn fast.
TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.
Privacy Policy and Additional Information
By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy. Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.
To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
The use of AI tools of any kind (including but not limited to notetakers, interview assistants, and real-time coaching tools such as Otter.ai, Fireflies, Fathom, Cluey, or similar) during TRM interviews is not permitted without prior approval from TRM. TRM uses its own internal tools for note-taking to ensure a consistent and confidential experience for all candidates.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.
Recruitment agencies
TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.
Learn more: Company Values | Interviewing | FAQs