Staff Technical Lead for Inference & ML Performance

Fal

San Francisco

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.

As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.

Why This Role Matters

You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.

What You'll Do

Day-to-day responsibilities, each with what success looks like:

Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions.
What success looks like: fal's inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency.

Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations.
What success looks like: You regularly ship code that significantly improves model serving performance.

Collaborate closely with research & applied ML teams. Influence model inference strategies and deployment techniques.
What success looks like: Inference innovations move rapidly and seamlessly from research to production deployment.

Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies.
What success looks like: Performance bottlenecks are quickly identified and eliminated, dramatically enhancing inference speed and scalability.

Mentor and scale your team. Coach and expand your team of performance-focused engineers.
What success looks like: Your team independently innovates, proactively solves complex performance challenges, and consistently levels up its skills.

You Might Be A Fit If You
  • Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
  • Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton to CUTLASS kernels, you've navigated and optimized them all.
  • Know inference inside-out. Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving and profiling.
  • Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
  • Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.

Nice-to-Haves
  • Experience building inference engines specifically for diffusion and generative media models
  • Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
  • Leadership experience in scaling technical teams

What You'll Get

One of the highest-impact roles at one of the fastest-growing companies (revenue is growing 40% MoM, our run rate is 60x+ what it was last year, and we raised our Series A, B, and C within the last 12 months), with a world-changing vision: hyperscaling human creativity.

Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.

Vacancy posted 4 days ago