Member of Technical Staff - Infrastructure

Gimlet Labs

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them. The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization. We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About this Role

We are looking for an Infrastructure Platform Engineer to design, build, and operate the cluster infrastructure behind Gimlet’s heterogeneous inference cloud. Unlike traditional cloud platforms built around a single hardware ecosystem, Gimlet's infrastructure spans multiple accelerator vendors and architectures. Infrastructure engineers play a key role in bringing new hardware platforms online, building the operational abstractions that make heterogeneous infrastructure manageable at scale, and ensuring new silicon can serve production workloads reliably from day one.

This role is highly hands‑on. You will work across bare metal, Linux, Kubernetes or other cluster schedulers, high‑speed networking, observability, provisioning, and incident response. You will partner closely with distributed systems, runtime, compiler, and hardware teams to ensure Gimlet’s infrastructure can support demanding AI workloads at production scale.

What you will work on
  • Design, deploy, and operate large‑scale CPU, GPU, and accelerator clusters powering production AI inference.
  • Build automation for provisioning, configuration, upgrades, validation, and lifecycle management.
  • Design and scale provisioning systems for heterogeneous bare‑metal infrastructure across multiple datacenters and hardware vendors.
  • Operate cluster scheduling, resource allocation, isolation, quotas, and utilization systems.
  • Debug complex production issues across Linux, networking, storage, drivers, firmware, and orchestration layers.
  • Build and operate high‑performance networking infrastructure, including RDMA‑enabled environments and accelerator interconnects.
  • Build observability for cluster health, capacity, performance, failures, and workload behavior.
  • Improve reliability, availability, and recovery across multi‑node production systems.
  • Work with distributed systems and runtime teams to support low‑latency, high‑throughput inference workloads.
  • Evaluate and integrate new hardware platforms, accelerators, networking technologies, and datacenter designs.
  • Create runbooks, operational standards, and incident response practices as the fleet scales.

You may be a good fit if you have
  • Experience in infrastructure, cluster engineering, platform engineering, SRE, HPC, or distributed systems.
  • Deep Linux systems experience, including debugging performance, networking, storage, processes, and kernel‑level issues.
  • Experience operating Kubernetes, Slurm, Nomad, or similar orchestration and scheduling systems.
  • Strong automation skills using tools such as Terraform, Ansible, Helm, Python, Go, or equivalent.
  • Experience with GPU or accelerator infrastructure, including drivers, firmware, CUDA/ROCm stacks, or hardware validation.
  • Familiarity with high‑performance networking such as InfiniBand, RoCE, high‑speed Ethernet, or datacenter fabrics.
  • Strong operational judgment: you know how to build systems that are observable, recoverable, and boring in production.
  • Comfort working in a fast‑moving startup environment with high ownership and ambiguity.

Strong candidates may also have
  • Experience building or operating AI inference, training, HPC, or neocloud infrastructure.
  • Experience with bare‑metal provisioning, PXE/iPXE, image pipelines, BIOS/firmware management, or rack bring‑up.
  • Experience with multi‑tenant cluster isolation, quota systems, fair scheduling, or usage accounting.
  • Experience debugging distributed workload performance across compute, memory, network, and storage bottlenecks.
  • Experience building observability platforms using technologies such as Prometheus, OpenTelemetry, Grafana, or similar tooling.
  • Familiarity with heterogeneous hardware environments across NVIDIA, AMD, Intel, ARM, or emerging accelerators.

Vacancy posted 21 hours ago
Similar jobs that could be interesting for you
Based on the Member of Technical Staff - Infrastructure in San Francisco, CA vacancy
  • Member of Technical Staff, Infrastructure Join us and help shape the future of AI by architecting next-generation knowledge systems. Join us and help shape the future of AI by defining the narrative around document understanding. About the Role The Infra team at LlamaIndex... 
    Suggested
    Work at office

    LlamaIndex, Inc.

    San Francisco, CA
    1 day ago
  •  ...enterprises that integrate LLMs into their products. The team is 5 people with a research and product focus. As a Member of Technical Staff on our infrastructure team, you'll own the cloud systems that serve our compression API end-to-end. You'd get to build global low-... 
    Suggested
    Visa sponsorship

    The Token Company

    San Francisco, CA
    1 day ago
  •  ...Radical Numerics was founded to develop both the power to design and the responsibility to defend. About the Role As a Member of Technical Staff, Infrastructure & Training Systems at Radical Numerics, you will design and build the systems that make large-scale model training... 
    Suggested
    Local area

    Radical Numerics Inc.

    San Francisco, CA
    4 days ago
  • Member of Technical Staff - Infrastructure Security We're partnering with a frontier AI research company that is building next-generation open-weight foundation models with the mission of making advanced AI broadly accessible. Their team includes researchers, engineers... 
    Suggested

    Xcede

    San Francisco, CA
    1 day ago
  •  ...DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond. Role Overview Reflection.AI is looking for a Member of Technical Staff - Infrastructure Security to secure our geographically diverse multi-cloud Kubernetes and cloud environments. In this role, you’ll... 
    Suggested
    Relocation package

    Reflection

    San Francisco, CA
    3 days ago
  •  ...full ownership of NeoSigma's platform infrastructure — lead architectural decisions and design...  ...enterprise customers Own the technical relationship with enterprise customers...  ...career-defining impact As a founding member, you’ll help define the technical foundation... 

    NeoSigma

    San Francisco, CA
    3 days ago
  •  ...Plato is an applied research lab building the foundational infrastructure to train specialized AI agents. We turn real-world data...  ..., and iteration feel like one seamless system. As a Member of Technical Staff, Infrastructure / DevOps, you will own the systems that make... 

    Plato

    San Francisco, CA
    2 days ago
  • $180k

    Member of Technical Staff - RL Infrastructure About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization... 
    Temporary work

    xAI

    San Francisco, CA
    1 day ago
  •  ...Kubernetes, Terraform, Istio, Pulumi, and AWS Cloud, to automate our infrastructure and deliver reliable applications efficiently. By using...  ...and Prometheus. Strong communication, analytical, and technical leadership skills. Preferred Skills Experience working in a... 
    Work at office
    Local area
    Monday to Thursday

    Envoy

    San Francisco, CA
    2 days ago
  •  .... Successful candidates typically come from staff or principal-level roles and are recognized for establishing technical direction, leading large-scale initiatives,...  ...teams use to right‑size space and budgets. This infrastructure already powers 16,000 workplaces and 9,000+... 
    Work at office
    Local area
    Monday to Thursday

    Envoy Inc.

    San Francisco, CA
    5 days ago
  • About Mandolin Nearly every disease will become treatable in our lifetimes. Mandolin is laying the clinical and financial infrastructure to get groundbreaking treatments to patients faster, powered by AI agents. Mandolin partners closely with the largest healthcare institutions... 
    Local area

    Mandolin

    San Francisco, CA
    5 days ago
  • $150k - $300k

    About Us Sieve is the only AI research lab exclusively focused on video data. We combine exabyte-scale video infrastructure, novel video understanding techniques, and dozens of data sources to develop datasets that push the frontier of video modeling. Video makes up 80... 

    Sieve

    San Francisco, CA
    5 days ago
  •  ...agent for enterprise computer automation. Our developer platform writes, tests, and maintains automation code on fully‑managed infrastructure - cutting dev time by 90%. We’re starting with healthcare, where legacy systems make reliable automation a genuinely hard problem... 
    Immediate start
    Remote work

    CloudCruise

    San Francisco, CA
    1 day ago
  • $150k - $300k

    Building Open Superintelligence Infrastructure Prime Intellect is building the open superintelligence stack - from frontier agentic models...  ...Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms customer requirements into production‑ready... 

    Prime Intellect

    San Francisco, CA
    1 day ago
  •  ...Hands‑on experience building or significantly enhancing distributed compute platforms, orchestration systems, or high‑performance infrastructure at scale Ability to thrive in a fast‑paced, meritocratic environment with full ownership, high standards, and a focus on... 

    xAI

    San Francisco, CA
    7 days ago
  • $150k - $265k

     ...human again. Mission We're building the platform for the future of voice technology. Our market edge is extensible, reliable infrastructure designed for the full complexity of voice interactions. 18 months, 150k developers, adding 1000 every day. Give it a try here... 
    Full time
    Shift work

    Vapi

    San Francisco, CA
    5 days ago
  •  ...Horowitz, GIC, Goldman Sachs, KKR, Visa, and others. Technical Skills Develop and maintain infrastructure that powers digital asset custody, trading, staking,...  ...to solve problems, and assist or teach other team members when possible. You may be a fit for this role if you... 
    Worldwide

    Crypto Pro Network

    San Francisco, CA
    1 day ago
  • $150k - $250k

     ...people take ownership, grow together, and share both the challenges and the wins. What you'll do Build the supercomputing infrastructure that runs our agents. Our agents tackle long-horizon, high-performance workloads, and you'll design the cloud compute,... 
    Work at office
    Remote work
    Flexible hours

    Asari AI

    San Francisco, CA
    12 days ago
  •  ...building the foundational software and infrastructure that everything else depends on. This is...  ...growth. What We Look For Senior to staff-level experience in software engineering...  ...source projects or other publicly visible technical work. Comfort owning ambiguous, high-... 

    Dimensional Inc.

    San Francisco, CA
    2 days ago
  •  ...users create characters, worlds, stories, and relationships with AI, and making that feel fast, reliable, and alive takes serious infrastructure. We are looking for an engineer who wants to help own that whole stack. We run more of our own than most companies our size.... 

    janitorAI

    San Francisco, CA
    1 day ago
  •  ...You'll Pioneer You’ll create the data systems that make frontier research and the largest training runs possible. It's building infrastructure at a scale where billion-image datasets are normal and where video processing pipelines need to run across thousands of GPUs.... 
    Worldwide

    Black Forest Labs

    San Francisco, CA
    3 days ago
  • $200k - $350k

     ...role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus...  ...Edison Scientific, engineering at the senior level is about technical ownership and leverage- understanding how complex systems interact... 
    Work at office

    Edison Scientific

    San Francisco, CA
    3 days ago
  • $160k - $240k

     ...Full-time San Francisco · In person $160k - $240k + Equity Member of Technical Staff, Product About the Role You will build the Reific...  ...building full-stack features across UI, API, database, and infrastructure. Good taste for dense, operational software people use repeatedly... 
    Full time

    Reific

    San Francisco, CA
    14 hours ago
  • $227.5k - $401k

     ...motivated individuals who tackle unique technical challenges at scale and solve them...  ...financial technology sector. As a Member of Technical Staff, you will operate with a high degree...  ...Experience in AI‑enabled fintech or infrastructure companies. Familiarity with classical... 
    Work at office
    Immediate start
    Relocation
    Flexible hours

    Adyen

    San Francisco, CA
    1 day ago
  •  ...Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full time Apply for Member of Technical Staff As a founding member of the engineering...  ...lies in empowering teams to focus on innovation, not on infrastructure. We aim to simplify the AI development stack, allowing... 
    Full time
    Part time
    Work at office
    Work from home
    Flexible hours
    2 days per week

    Pixeltable, Inc.

    San Francisco, CA
    2 days ago
  • $150k - $300k

     ...scale. The two key areas are: Building the infrastructure to serve LLMs efficiently at scale....  ...systems into our RL training stack. Core Technical Responsibilities LLM Serving Multi‑...  ...in open development and encourage team members to contribute to the broader AI community... 
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime Intellect

    San Francisco, CA
    1 day ago
  •  ...longer whether quantum and AI converge; it is who builds the infrastructure to make that convergence reliable, scalable, and useful....  ...ours at the frontier of science. Role Overview As a Member of Technical Staff you will shape Conductor's core offerings: AI software that... 

    Conductor Quantum

    San Francisco, CA
    2 days ago
  •  ...uses Shapes every single day, and everyone talks to users. Member of Technical Staff is the title we use for engineers who own hard problems...  ...scale High-performance Python backends at scale Realtime infrastructure (WebRTC, WebSockets, streaming) for chat, voice, or video... 

    Shapes

    San Francisco, CA
    2 days ago
  • $200k

     ...Join to apply for the Member of Technical Staff role at Listen Labs . TL;DR: We are seeing strong market demand and an aggressive 6...  ...product and must make decisions across the LLM pipeline, infrastructure, backend, and UX. You have a high bar for quality: In... 
    Flexible hours

    Listen Labs

    San Francisco, CA
    2 days ago
  •  ...environments for Fortune 500 companies. About the Role As a Member of Technical Staff, you will be part of the team responsible for the work...  ...entire stack. The MTS works vigorously on the underlying infrastructure, core features, agent configurations, and user experience... 
    Work experience placement
    H1b
    Work at office
    Visa sponsorship

    Ersilia

    San Francisco, CA
    4 days ago
