Machine Learning Infrastructure Engineer

TRM Labs

Build a Safer World.

TRM Labs provides blockchain analytics and AI solutions to help law enforcement, national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide to enable a safer, more secure world for all.

At TRM, we’re on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning, building the foundation for high-throughput, production-grade ML workloads.

The Impact You’ll Have Here

Design and operate GPU cluster infrastructure. Build and manage GPU-backed environments in the cloud, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.

Optimize high-throughput inference. Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.

Enable distributed inference strategies. Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.

Implement model optimization and compilation workflows. Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.

Schedule heterogeneous workloads. Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.

Build observability into ML infrastructure. Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use that data to continuously improve performance and reliability.

Partner across engineering teams. Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.

What We’re Looking For

Bachelor’s degree (or equivalent) in Computer Science or a related field.
5+ years of experience building and operating distributed systems or infrastructure in production environments.
Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or Hugging Face Optimum.
Experience optimizing GPU load and memory efficiency and resolving performance bottlenecks in production systems.
Familiarity with distributed inference strategies, including model parallelism and tensor parallelism.
Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
CUDA familiarity and experience debugging GPU-related issues is a plus.
Adaptable. Goals can change fast. You anticipate and react quickly.
Autonomous. You own what you work on. You move fast and get things done.
Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
Collaborative. You work effectively in a cross-functional team and with people at all levels of an organization.

Life at TRM

We are building a safer world. That promise shows up in how we work every day.

TRM runs fast. Really fast. We’re a high-velocity, high-ownership team that expects clarity, follow-through, and impact. People who thrive here are energized by hard problems, experimentation, and direct feedback. If something takes months elsewhere, it often ships here in days.

That pace isn’t for everyone. If you are optimizing primarily for consistent work-life balance, use the interview process to pressure-test fit. We want teammates who thrive here, not just survive here.

AI Fluency at TRM

AI fluency is a baseline expectation at TRM. We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks.

At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:
Accelerate repeatable workflows
Structure and solve problems
Improve output quality
Increase speed and leverage

You will be evaluated on applied AI fluency during the interview process.

Leadership Principles

Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment: test, ship, measure, and iterate quickly.
Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end-to-end, and invest in getting better every day.
Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one-team mindset, giving and receiving feedback to make the team stronger.
The Impact You Will Have

Driving critical investigations that can’t wait for typical business hours.
Shipping products in days when others would schedule quarters.
Partnering with teams across time zones to deliver insights while the story is still unfolding.
Building new solutions from first principles when the playbook doesn’t yet exist.
Protecting victims and customers by tracing illicit activity and disrupting criminal networks.

Join Our Mission

At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people.

If you’re excited by TRM’s mission but don’t check every box, we encourage you to apply: we hire for slope, judgment, and the will to learn fast.

TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.

Privacy Policy and Additional Information

By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy. Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.

To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.
Recruitment Agencies

TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the Machine Learning Infrastructure Engineer in California, MO vacancy

GPU-Backed ML Infrastructure Engineer
A leading blockchain analytics firm is seeking a Senior Software Engineer for ML Infrastructure to collaborate with diverse teams in designing and operating GPU-backed infrastructure for AI systems. This role involves optimizing inference systems and implementing model...
Suggested
TRM Labs
California, MO
14 days ago
Machine Learning Engineer
...global presence. Responsibilities: Design, develop, and implement machine learning models to predict competitive bidding landscape, conversion... ...as modeling in the context of privacy restrictions. Work with engineering and operations teams to build machine learning models,...
Suggested
Local area
Remote work
TechBrains
California, MO
4 days ago
Machine Learning Engineer
$235k - $275k
...Liftoff has a diverse, global presence. About The Revenue Engine Team The Revenue Engine team works to understand the fundamental... ...of demand and the effects of competition. The team of machine learning engineers, software engineers, and data analysts develops theories...
Suggested
Full time
Remote work
Liftoff Inc
California, MO
4 days ago
Senior Machine Learning Engineer - Model Evaluations, Public Sector
$208k - $300k
...Machine Learning Engineer - Model Evaluations, Public Sector The Public Sector ML team at Scale deploys advanced AI systems—including LLMs... ...Build evaluation frameworks for LLM agents, including infrastructure for scenario‑based and environment‑based testing. Conduct...
Suggested
Full time
Scale AI
California, MO
4 days ago
Sr. Machine Learning Engineer
...About the Role: We are seeking an experienced Senior ML Inference Engineer to join our team, focusing on optimizing and deploying our... ...edge devices with NVIDIA hardware, and ensuring our inference infrastructure meets FDA and SOC2 compliance requirements. This role offers...
Suggested
Remote work
Worldwide
PICTOR LABS INC
California, MO
3 days ago
Machine Learning Engineer - Distributed ML Systems
...Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no... ...self-sustaining economics. We’re looking for Senior/Staff engineers with 5+ years of experience in distributed systems and ML large...
Remote work
Visa sponsorship
Pluralis Research
California, MO
4 days ago
ML Infrastructure and Deployment Engineer San Francisco, CA
...impact that matters. Responsibilities Develop state-of-the-art machine learning models for AI applications. Own the ML lifecycle from... ...Science or related field. Experience with MLOps and cloud infrastructure. Knowledge of containerization and orchestration (Docker, Kubernetes...
Hubnest Inc
California, MO
2 days ago
Staff Machine Learning Engineer, AI Generation Engine
...cybersecurity, physics, mathematics, medicine, engineering, and other specialties. The company... ...is seeking a highly accomplished Machine Learning Engineer to take ownership of the end... ...to): Data Acquisition and Curation, Infrastructure, Pre-Training, Evaluations, and Fine-...
Seasonal work
Flexible hours
SandboxAQ
California, MO
2 days ago
Machine Learning Product Engineer
Job Description Role: Machine Learning Product Engineer. Location: San Jose, CA. Contract/FTE. Responsibilities: Understand current quality problems and how they impact various user groups, define success metrics, and prioritize solutions based on user...
Contract work
6AM City, LLC
California, MO
4 days ago
AI/ML - Machine Learning Research Engineer, Machine Translation
$181.1k - $318.4k
AI/ML - Machine Learning Research Engineer, Machine Translation San Francisco Bay Area, California, United States Machine Learning and AI We are seeking an experienced machine learning engineer to help bring the next generation of core machine translation (MT) technology...
Relocation
Apple Inc.
California, MO
4 days ago
Remote Staff ML Infra Engineer - Lead Scalable AI Platform
...A leading technology company, Thumbtack, is hiring a Staff ML Infrastructure Engineer to drive the architectural vision for their machine learning infrastructure. This role requires 8+ years in engineering and a strong focus on distributed systems. You'll architect solutions...
Remote work
WORK180 USA
California, MO
4 days ago
Remote ML Engineer - Ad Tech Economics & Optimization
A leading performance marketing platform seeks a Machine Learning Engineer to build statistical models and production systems that balance advertiser performance. The ideal candidate will have a PhD in a relevant field and industry experience applying machine learning to...
Full time
Remote work
Liftoff Inc
California, MO
3 days ago
Senior ML Platform Engineer - Deployments & Tools
6AM City, LLC is seeking an experienced engineer to join the Machine Learning Foundations team. This role focuses on building self-service tooling for model lifecycle management while working across the entire machine learning lifecycle. Candidates should have extensive...
6AM City, LLC
California, MO
2 days ago
ML Data Scientist & Engineer for Web Data Platform
A data-focused technology company is seeking an individual to finetune small language models and enhance existing data quality. The role involves data scrubbing, normalization, and pushing solutions into production environments. Ideal candidates should have Python experience...
Sumble Inc
California, MO
3 days ago
Public Sector ML Engineer: Model Evaluation & Testing
A leading AI solutions provider is looking for a Machine Learning Engineer focused on model evaluations in the public sector. This role involves designing and implementing evaluation pipelines for advanced AI systems, ensuring they function reliably in critical environments...
Scale AI
California, MO
4 days ago
Founding ML Engineer — Equity & AI-Pioneering Work
A high-growth AI startup in Austin is seeking a Founding Engineer to lead the development of their machine learning-powered document extraction engine. You'll work directly with the co-founders, owning backend architecture decisions and expanding core functionalities....
Pear VC
California, MO
2 days ago
ML Engineer Intern — Real-World AI Projects
$40 - $60 per hour
A leading generative AI startup is looking for Machine Learning interns to join their innovative Text Content Generation team. This hybrid role is based in California, requiring three days in the office and offers competitive pay ranging from $40 to $60 per hour. Interns...
Hourly pay
Internship
Work at office
SupportFinity™
California, MO
4 days ago
Senior ML Engineer, AI Generation Engine
A leading AI solutions company is seeking a Machine Learning Engineer for its AI Generation Engine (SAIGE) team. This role requires ownership of the entire ML lifecycle, focusing on designing and rapidly building AI-first products. Key skills include strong Python expertise...
SandboxAQ
California, MO
2 days ago
ML Engineer: Vision‑Language Models for Motion
A cutting-edge AI company in Austin is seeking a Machine Learning Engineer who excels in foundation-model research and production engineering. The role involves training Vision-Language Models to enhance understanding of complex video motion and developing robust APIs for...
Pear VC
California, MO
2 days ago
ML Engineer Intern
$40 - $60 per hour
...ll Make an Impact We are looking for passionate and talented Machine Learning interns to join our Text Content Generation team. As an... ...closely with designers, product managers, marketing teams, and engineers to bring innovative ideas to life. Investigate, prototype, and...
Hourly pay
Summer work
Internship
Work at office
Flexible hours
3 days per week
SupportFinity™
California, MO
4 days ago
Remote Senior ML Inference Engineer — GPU & Edge Deployment
PICTOR LABS INC is seeking a Senior ML Inference Engineer based in the United States to optimize and deploy production virtual staining models. This role demands deep expertise in ML inference optimization, proficiency in Python, and experience with PyTorch and NVIDIA...
Remote job
PICTOR LABS INC
California, MO
19 hours ago
Senior ML Engineer, Distributed Training & P2P Systems
A mission-driven technology company in California is seeking experienced Senior/Staff Engineers proficient in building distributed ML systems. Applicants should possess strong experience in optimizing large-scale training under low-bandwidth conditions, with expertise in...
Remote work
Pluralis Research
California, MO
4 days ago
ML Systems Engineer: Productionize Models & Pipelines
$108.91k - $112.17k
A technology firm specializing in advanced analytics is seeking a Software Engineer focusing on transforming research prototypes into reliable software. The ideal candidate will have over 5 years of experience in software engineering, proficient in Python and Rust, and...
Remote work
AIMdyn, Inc
California, MO
4 days ago
Staff ML Engineer: RealWorld Task Environments & Evaluation
...experiments to improve model behaviors across various domains. Candidates should have 1-4 years of experience in software engineering, machine learning, or applied research, with a strong inclination towards creative problem solving and improving model performance...
Jigsaw
California, MO
1 day ago
MLOps Engineer - Scalable ML Pipelines & CI/CD
...Job Overview We are seeking an experienced MLOps Engineer to design, build, and maintain scalable machine learning operations pipelines that support the full... ...teams to build and optimize data pipelines and ML infrastructure. Support engineering teams in provisioning scalable...
Long term contract
Local area
Codinix Consulting Services
California, MO
1 day ago
Applied ML Engineer
Job Description: We are seeking a versatile and pragmatic Applied ML Engineer to contribute across a broad range of machine learning and perception tasks that power our edge‑intelligent maritime systems. This role requires someone comfortable wearing many hats—from working...
Remote work
Flexible hours
Shift work
Quartermaster
California, MO
4 days ago
Remote Staff ML Engineer - Messaging & Conversational AI
Airbnb, Inc. is seeking a Staff Software Engineer for their Communication Products team to drive the technical strategy for integrating machine learning into their messaging products. The ideal candidate will possess over 9 years of experience, a relevant degree, and a...
Remote job
airbnb, Inc.
California, MO
1 day ago
Senior Data & ML Engineering Leader (Remote)
...A prominent footwear and apparel company is seeking a Sr. Manager for Data & ML Engineering to lead their modern data platform initiatives. This role involves building reliable data pipelines on AWS, utilizing dbt for transformations, and mentoring engineering teams. Candidates...
Remote work
Deckers Brands
California, MO
4 days ago
Senior Audio ML Data Engineer - Hybrid
$170.5k - $228.6k
A leading entertainment company is looking for an experienced Data Engineer to optimize data pipelines for AI/ML research in Nicasio, CA. This hybrid role involves designing scalable data processing systems and collaborating with AI/ML researchers. Candidates should have...
The Walt Disney Company (Germany) GmbH
California, MO
2 days ago
ML Product Engineer: Feed Quality & Personalization
6AM City, LLC is seeking a Machine Learning Product Engineer to improve feed quality in their products. The candidate should have over 5 years of product management experience and strong AI/ML knowledge. Responsibilities include understanding and resolving quality issues...
6AM City, LLC
California, MO
4 days ago