Staff Technical Lead for Inference & ML Performance

Fal

San Francisco

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.

As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.

Why This Role Matters

You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.

What You'll Do

Day-to-day, and what success looks like

Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions. Success looks like: fal's inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency.

Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations. Success looks like: you regularly ship code that significantly improves model serving performance.

Collaborate closely with research and applied ML teams. Influence model inference strategies and deployment techniques. Success looks like: inference innovations move quickly and seamlessly from research to production deployment.

Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies. Success looks like: performance bottlenecks are quickly identified and eliminated, dramatically improving inference speed and scalability.

Mentor and scale your team. Coach and expand your team of performance-focused engineers. Success looks like: your team innovates independently, proactively solves complex performance challenges, and consistently levels up its skills.

You Might Be A Fit If You

Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton down to CUTLASS kernels, you've navigated and optimized them all.
Know inference inside-out. You have expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving, and profiling.
Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.

Nice-to-Haves

Experience building inference engines specifically for diffusion and generative media models
Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
Leadership experience in scaling technical teams

What You'll Get

One of the highest-impact roles at one of the fastest-growing companies (revenue is growing 40% MoM, our run rate is 60x+ compared to last year, and we raised our Series A, B, and C within the last 12 months), with a world-changing vision: hyperscaling human creativity.

Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.

Vacancy posted 4 days ago

