Member of Technical Staff - ML Systems & Inference
Gimlet Labs
About Us
Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them. The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together. Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization. We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.
About the role
Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable. This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.
What you will work on
- Design and optimize end-to-end inference pipelines from request ingestion through execution and response
- Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load
- Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness
- Manage KV cache allocation, placement, reuse, and eviction across models and requests
- Optimize prefill and decode paths, including attention mechanisms and memory usage
- Profile and debug inference performance issues across model, runtime, and system boundaries
- Work closely with compilers, kernels, networking, and distributed systems to deliver end-to-end performance improvements
What we are looking for
- Strong software engineering fundamentals
- Experience building or operating ML inference or model serving systems
- Comfort reasoning about performance, memory usage, and system behavior under load
- Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems
- Deep understanding of modern model architectures and attention mechanisms
- Experience with batching, scheduling, and concurrency control in inference systems
- Familiarity with KV cache management and memory placement strategies
- Experience profiling and tuning latency- and throughput-critical systems
- Software development experience in Python and C++
Vacancy posted 1 day ago

