Senior or Staff ML Systems Engineer, LLMs

$200k - $275k

Full-time

TRM Labs

United States

Build a Safer World.
TRM Labs provides AI-powered intelligence solutions that help public and private sector agencies investigate and disrupt crime. TRM's platforms enable investigators to trace illicit activity, build cases, and construct operating pictures of threat networks. Leading agencies and businesses worldwide rely on TRM to make the world safer and more secure.
The AI Engineering Team is chartered with enabling next-generation AI applications, with a special focus on Large Language Models (LLMs) and agentic systems. Our mission is to build robust pipelines, high-performance infrastructure, and operational tooling that allow AI systems to be deployed with speed, safety, and scale.
We manage petabyte-scale pipelines, serve models with millisecond-level latency, and provide the observability and governance needed to make AI production-ready. We’re also deeply involved in evaluating and integrating cutting-edge tools in the LLM and agent space — including open-source stacks, vector databases, evaluation frameworks, and orchestration tools that unlock TRM’s ability to innovate faster than the market.
As a Senior or Staff ML Systems Engineer – LLM, you’ll be at the core of building and scaling the technical infrastructure for AI/ML systems. You will:
Build reusable CI/CD workflows for model training, evaluation, and deployment, integrating tools such as Langfuse, GitHub Actions, and experiment tracking.
Automate model versioning, approval workflows, and compliance checks across environments.
Build out a modular and scalable AI infrastructure stack — including vector databases, feature stores, model registries, and observability tooling.
Partner with engineering and data science to embed AI models and agents into real-time applications and workflows.
Continuously evaluate and integrate state-of-the-art AI tools (e.g. LangChain, LlamaIndex, vLLM, MLflow, BentoML).
Drive AI reliability and governance, enabling experimentation while ensuring compliance, security, and uptime.
Build and enhance AI/ML model performance.
Ensure data accuracy, consistency, and reliability, leading to better model training and inference.
Deploy infrastructure to support offline and online evaluation of LLMs and agents — including regression testing, cost monitoring, and human-in-the-loop workflows.
Enable researchers to iterate quickly by providing sandboxes, dashboards, and reproducible environments.

What We’re Looking For

Write high-quality, maintainable software — primarily in Python, but we value engineering ability over language familiarity.
Have a strong background in scalable infrastructure, including:
- Containerization and orchestration (e.g. Docker, Kubernetes)
- Infrastructure-as-code and deployment (e.g. Terraform, CI/CD pipelines)
- Monitoring and logging frameworks (e.g. Datadog, Prometheus, OpenTelemetry)
Understand and implement MLOps best practices, including:
- Model versioning and rollback strategies
- Automated evaluation and drift detection
- Scalable model and agent serving infrastructure (e.g. vLLM, Triton, BentoML)
Deploy and maintain LLM and agentic workflows in production, including:
- Monitoring cost, latency, and performance
- Capturing traces for analysis and debugging
- Optimizing prompt/response flows with real-time data access
Demonstrate strong ownership and pragmatism, balancing infrastructure elegance with iterative delivery and measurable impact.

Learn about TRM Speed in this position:

Rapid Issue Resolution. TRM Engineers identify and resolve critical onsite issues in minutes to hours, not weeks. We create virtual war rooms, implement fixes, and share lessons with both customer stakeholders and internal teams within 48 hours.
Navigating Bureaucracy. We anticipate and address procedural hurdles, build trust with key stakeholders, and find alternative pathways to approvals. This keeps projects moving even in complex environments.
Efficient Knowledge Transfer. Engineers document and share updates in real time, ensuring the entire team—onsite and remote—has full visibility into plans, blockers, and resolutions. Knowledge sharing sessions and clear documentation reduce friction and accelerate delivery.

About TRM's Engineering Levels:

Engineer: Responsible for helping to define project milestones and executing small decisions independently with the appropriate tradeoffs between simplicity, readability, and performance. Provides mentorship to junior engineers, and enhances operational excellence through tech debt reduction and knowledge sharing.

Senior Engineer: Successfully designs and documents system improvements and features for an OKR/project from the ground up. Consistently delivers efficient and reusable systems, optimizes team throughput with appropriate tradeoffs, mentors team members, and enhances cross-team collaboration through documentation and knowledge sharing.

Staff Engineer: Drives scoping and execution of one or more OKRs/projects that impact multiple teams. Partners with stakeholders to set the team vision and technical roadmaps for one or more products. Is a role model and mentor to the entire engineering organization. Ensures system health and quality with operational reviews, testing strategies, and monitoring rigor.

The following represents the expected range of compensation for this role:

Individual pay is determined by skills, qualifications, experience, and location. The compensation details listed in this posting reflect the US base salary only.
The estimated base salary range for this role is $200,000 - $275,000.
Additionally, this role may be eligible to participate in TRM’s equity plan.
Please note – we factor in the different costs for geographies outside the United States.

Life at TRM

We are building a safer world. That promise shows up in how we work every day.

TRM moves quickly. We are a high-velocity, high-ownership team that expects clarity, follow-through, and impact. People who thrive here are energized by hard problems, experimentation, and continuous feedback. If something takes months elsewhere, it will ship here in days.

Our work sits at the intersection of AI, national security, and fighting crime. The problems are complex, the stakes are real, and the environment evolves quickly. The pace and intensity of the work reflect the importance of the mission. As a result, the way we operate requires a high level of ownership, adaptability, collaboration, and creative problem-solving.

At TRM, you should expect:

Priorities and targets to change quickly as we experiment and iterate
Work that often requires operating with a high degree of ambiguity
A high level of personal ownership and accountability
Close collaboration across teams and functions
Frequent, high-touch communication
Creative problem solving and out-of-the-box thinking
A pace that rewards urgency, adaptability, and outcomes

This environment is energizing for people who enjoy building, solving hard problems, and making progress in situations that are not always fully defined. It also requires comfort navigating ambiguity, adjusting course as new information emerges, and maintaining focus and positivity in a fast-moving and intense environment.

We also recognize that this style of operating is not for everyone. If you are primarily optimizing for predictability or a consistently balanced workload, we encourage you to use the interview process to pressure test whether this environment is truly the right fit. We want teammates who thrive here, not just survive here.

At the same time, many people find this work deeply rewarding. If you are excited by meaningful problems, motivated by ambitious goals, and energized by working alongside mission-driven colleagues, there is a good chance you will find TRM to be an exceptional place to grow and contribute. Learn more: Interviewing at TRM: How We Hire and What Success Looks Like

AI Fluency at TRM

AI fluency is a baseline expectation at TRM.

We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks.

At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:

Accelerate repeatable workflows
Structure and solve problems
Improve output quality
Increase speed and leverage

You will be evaluated on applied AI fluency during the interview process.

Leadership Principles

We hire and grow against three leadership principles. They’re the standards for how we operate, treat each other, and make decisions.

Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment – test, ship, measure, and iterate quickly.
Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end‑to‑end, and invest in getting better every day.
Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one‑team mindset — giving and receiving feedback to make the team stronger.

Join our Mission

At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you’re excited by TRM’s mission but don’t check every box, we encourage you to apply — we hire for slope, judgment, and the will to learn fast.

TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.

Privacy Policy and Additional Information

By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy.

Our typical hiring cycles for specialized roles span 24 to 36 months. Accordingly, we retain your personal information for up to 36 months to evaluate your application and to consider you for current and future employment opportunities, unless you request earlier deletion or a different retention period is required or permitted by law.

To notify TRM Labs that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

The use of AI tools of any kind (including but not limited to notetakers, interview assistants, and real-time coaching tools such as Otter.ai, Fireflies, Fathom, Cluey, or similar) during TRM interviews is not permitted without prior approval from TRM. TRM uses its own internal tools for note-taking to ensure a consistent and confidential experience for all candidates.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this form.

Recruitment agencies

TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.

