Staff Technical Lead for Inference & ML Performance
Fal
San Francisco
fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.
As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.
Why This Role Matters
You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.
What You'll Do
Day-to-day, and what success looks like
- Set technical direction. Guide your team (kernels, applied performance, ML compilers, distributed inference) to build high-performance inference solutions. Success looks like: fal's inference engine consistently outperforms industry benchmarks in throughput, latency, and efficiency.
- Hands-on IC leadership. Personally contribute to critical inference performance enhancements and optimizations. Success looks like: you regularly ship code that significantly improves model serving performance.
- Collaborate closely with research & applied ML teams. Influence model inference strategies and deployment techniques. Success looks like: inference innovations move rapidly and seamlessly from research to production deployment.
- Drive advanced performance optimizations. Implement model parallelism, kernel optimization, and compiler strategies. Success looks like: performance bottlenecks are quickly identified and eliminated, dramatically enhancing inference speed and scalability.
- Mentor and scale your team. Coach and expand your team of performance-focused engineers. Success looks like: your team independently innovates, proactively solves complex performance challenges, and consistently levels up its skills.
You Might Be A Fit If You
- Are deeply experienced in ML performance optimization. You've optimized inference for large-scale generative models in production environments.
- Understand the full ML performance stack. From PyTorch, TensorRT, TransformerEngine, and Triton to CUTLASS kernels, you've navigated and optimized them all.
- Know inference inside-out. Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving and profiling.
- Lead from the front. You're a respected IC who enjoys getting hands-on with the toughest problems, demonstrating excellence to inspire your team.
- Thrive in cross-functional collaboration. Comfortable interfacing closely with applied ML teams, researchers, and stakeholders.
Nice-to-Haves
- Experience building inference engines specifically for diffusion and generative media models
- Track record of industry-leading performance improvements (papers, open-source contributions, benchmarks)
- Leadership experience in scaling technical teams
What You'll Get
One of the highest-impact roles at one of the fastest-growing companies (revenue is growing 40% MoM, we are at 60x+ RR compared to last year, and we raised Series A/B/C within the last 12 months), with a world-changing vision: hyperscaling human creativity.
Sound like your calling? Share your proudest optimization breakthrough, open-source contribution, or performance milestone with us. Let's set new standards for inference performance, together.


