Staff Technical Lead for Inference & ML Performance

Fal

San Francisco

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.

As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.

Why This Role Matters

You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.

What You'll Do

Day-to-day responsibilities, each paired with what success looks like:
  • Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions.
    What success looks like: fal's inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency.
  • Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations.
    What success looks like: You regularly ship code that significantly improves model serving performance.
  • Collaborate closely with research & applied ML teams. Influence model inference strategies and deployment techniques.
    What success looks like: Inference innovations move rapidly and seamlessly from research to production deployment.
  • Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies.
    What success looks like: Performance bottlenecks are quickly identified and eliminated, dramatically enhancing inference speed and scalability.
  • Mentor and scale your team. Coach and expand your team of performance-focused engineers.
    What success looks like: Your team independently innovates, proactively solves complex performance challenges, and consistently levels up their skills.

You Might Be A Fit If You
  • Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
  • Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton to CUTLASS kernels, you've navigated and optimized them all.
  • Know inference inside-out. Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving, and profiling.
  • Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
  • Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.

Nice-to-Haves
  • Experience building inference engines specifically for diffusion and generative media models
  • Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
  • Leadership experience in scaling technical teams

What You'll Get

One of the highest-impact roles at one of the fastest-growing companies (revenue is growing 40% MoM, we are 60x+ RR compared to last year, and we raised Series A/B/C within the last 12 months) with a world-changing vision: hyperscaling human creativity.

Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.

Vacancy posted 7 hours ago