Member of Technical Staff - ML Systems & Inference

Gimlet Labs

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable.

This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.

What you will work on

Design and optimize end-to-end inference pipelines from request ingestion through execution and response
Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load
Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness
Manage KV cache allocation, placement, reuse, and eviction across models and requests
Optimize prefill and decode paths, including attention mechanisms and memory usage
Profile and debug inference performance issues across model, runtime, and system boundaries
Work closely with compilers, kernels, networking, and distributed systems to deliver end-to-end performance improvements

You may be a good fit if you have

Strong software engineering fundamentals
Experience building or operating ML inference or model serving systems
Comfort reasoning about performance, memory usage, and system behavior under load

Strong candidates may also have

Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
Deep understanding of modern model architectures and attention mechanisms
Experience with batching, scheduling, and concurrency control in inference systems
Familiarity with KV cache management and memory placement strategies
Experience profiling and tuning latency- and throughput-critical systems
Software development experience in Python and C++

What Makes Gimlet Different

At Gimlet, you will work on infrastructure problems that span the full stack of modern AI systems. Our team operates across datacenters, networking, distributed systems, compilers, runtimes, orchestration, and performance engineering to build the foundation for the next generation of AI infrastructure.

As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.

We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.

Vacancy posted 1 day ago

