Member of Technical Staff - ML Systems & Inference

Gimlet Labs

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable.

This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.

What you will work on
  • Design and optimize end-to-end inference pipelines from request ingestion through execution and response
  • Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load
  • Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness
  • Manage KV cache allocation, placement, reuse, and eviction across models and requests
  • Optimize prefill and decode paths, including attention mechanisms and memory usage
  • Profile and debug inference performance issues across model, runtime, and system boundaries
  • Work closely with compilers, kernels, networking, and distributed systems to deliver end-to-end performance improvements

You may be a good fit if you have
  • Strong software engineering fundamentals
  • Experience building or operating ML inference or model serving systems
  • Comfort reasoning about performance, memory usage, and system behavior under load

Strong candidates may also have
  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
  • Deep understanding of modern model architectures and attention mechanisms
  • Experience with batching, scheduling, and concurrency control in inference systems
  • Familiarity with KV cache management and memory placement strategies
  • Experience profiling and tuning latency- and throughput-critical systems
  • Software development experience in Python and C++

What Makes Gimlet Different

At Gimlet, you will work on infrastructure problems that span the full stack of modern AI systems. Our team operates across datacenters, networking, distributed systems, compilers, runtimes, orchestration, and performance engineering to build the foundation for the next generation of AI infrastructure.

As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.

We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.
Vacancy posted 1 day ago