Staff Technical Lead for Inference & ML Performance
Fal
San Francisco
fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.
As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.
Why This Role Matters
You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.
What You'll Do
Day-to-day responsibilities, each followed by what success looks like:
- Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions. Success: fal's inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency.
- Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations. Success: you regularly ship code that significantly improves model serving performance.
- Collaborate closely with research & applied ML teams. Influence model inference strategies and deployment techniques. Success: inference innovations move rapidly and seamlessly from research to production deployment.
- Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies. Success: performance bottlenecks are quickly identified and eliminated, dramatically enhancing inference speed and scalability.
- Mentor and scale your team. Coach and expand your team of performance-focused engineers. Success: your team independently innovates, proactively solves complex performance challenges, and consistently levels up its skills.
You Might Be A Fit If You
- Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
- Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton down to CUTLASS kernels, you've navigated and optimized them all.
- Know inference inside-out. Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving and profiling.
- Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
- Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.
Nice-to-Haves
- Experience building inference engines specifically for diffusion and generative media models
- Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
- Leadership experience in scaling technical teams
What You'll Get
One of the highest-impact roles at one of the fastest-growing companies (revenue is growing 40% month over month, our run rate is up 60x+ compared to last year, and we raised our Series A, B, and C within the last 12 months), with a world-changing vision: hyperscaling human creativity.
Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.

