LLM Inference & Model-Performance Engineer
Baseten
A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT. The ideal candidate will have a relevant degree and strong programming skills in Python or C++. Competitive compensation, generous PTO, and full medical benefits are offered. Join us in shaping the future of AI!
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...inference for large language models (LLMs). Our mission is to... ...infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency...
  Performance, Full time
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
  Performance
- ...without needing to be a distributed systems expert. About the Role As a Distributed LLM Inference Engineer, you will help systems and optimizations that push the boundaries of performance for inference at large scale. This is an incredibly critical role to Anyscale as it...
  Performance
- Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
  Performance
- ...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed...
  Performance
$180k - $270k
...for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering skills, particularly in...
  Performance
- Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed... ...inference engines that support multimodal models, optimizing frameworks for low-...
  Performance
- ...AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5...
  Performance, Remote work
- ...inventive research, design, and engineering. Our organization is very... ...the Role You will lead the Model Routing & Inference team at Cursor, owning the... ..., optimizing for cost and performance. Designing routing... ...frameworks (vLLM, TensorRT‑LLM, TGI), load balancing, or building...
  Performance
- ...practice of medicine, and the inference systems that power them... ...class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers...
  Performance, Hourly pay, Full time, Flexible hours
$74.38 - $83.8 per hour
...Specialty Software Engineer - API Developer - GenAI... ...Rather than building models, you'll be responsible... ...Exposure to model serving or inference gateways... ...Kubernetes GenAI, LLM, or agent-based frameworks... ...and optimize service performance Contribute to architecture...
  Performance, Full time, Contract work, Temporary work, Flexible hours
- Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses... ...ML systems while enhancing core performance metrics across model execution. You'... ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity...
  Performance, Remote job
- ...Model Implementation Engineer Sciforium is an AI infrastructure company developing next-generation... ...models across modalities, ensuring high performance and reliability from day one. You... ...with large-scale model training or inference systems. Contributions to open-...
  Performance, Flexible hours
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2... ...in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation of $220,000...
  Performance, Local area
$167.2k - $209k
...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,... ...can deploy and scale their models with industry-leading performance and reliability. This is a... ...inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve....
  Performance, Local area, Remote work, Worldwide, Flexible hours
- ...looking for an experienced leader for the Model Routing & Inference team in San Francisco. This role... ...routing, cluster management, and performance. The ideal candidate has a strong background... ...-throughput systems and software engineering fundamentals, combined with...
  Performance
$350k
...committed researchers, engineers, policy experts, and business... ...Role Anthropic's inference fleet serves Claude to... ...: accelerator kernels, model servers, distributed... ...measure how the fleet performs against its theoretical... ...infrastructure or general LLM serving stacks. Direct...
  Performance, Work at office, Visa sponsorship, Flexible hours
- ...developing open weight models for individuals,... ...As a Forward Deployed Engineer Lead, Post-Training, you... ...that measure real-world performance. Own the data pipeline... ...infrastructure teams to ensure inference performance, cost... ...2+ years focused on LLM post-training in a...
  Performance, Relocation package
$295k
...About the Team Our Inference team brings OpenAI's most capable... ...access our state-of-the-art AI models, allowing them to do things... ...able to before. We focus on performant and efficient model inference... ...Role We are looking for an engineer who wants to take the world's...
  Performance
$180k - $270k
...to elevate productivity and performance through note‑taking solutions... ...on. Possess strong software engineering skills (especially in Python)... ...can run at scale against live model checkpoints. Can deeply partner... ..." looks like for a Speech LLM, translating capabilities (like...
  Performance, Full time, Work at office, Worldwide
- ...inherent to generative models. CTGT is the... ...reliable, controllable, and performant in practice. Our mission... ...the gap between LLM capabilities and domain... ...Senior Machine Learning Engineer will operate deep within... ...policy enforcement at inference time. Who You Are...
  Performance
- Slope is seeking a Founding Compiler Engineer in San Francisco, responsible for designing core compiler infrastructure and optimizing AI models. You will write CUDA kernels and conduct performance reviews, contributing to Luminal's mission of making AI workloads portable...
  Performance, Full time
- ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems... ..., and making state-of-the-art speech models actually behave correctly in realtime... ...The team already operates beyond the performance of most publicly available realtime...
  Performance, Remote work, Flexible hours
- ...do. We're pioneering the model architectures that will make... ...model innovation and systems engineering paired with a design-minded... ...the Role We're hiring an Inference Engineer to advance our... ...systems with high demands on performance, reliability, and observability...
  Performance, Work at office, Visa sponsorship, Flexible hours
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
  Performance
- PetsApp is looking for strong engineers to evaluate and benchmark LLM models at their San Francisco office. The role involves analyzing model performance and working closely with both open-source and closed-source model labs. Candidates should have significant Python experience...
  Performance, Work at office, Relocation package
$300k
...firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate... ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive base salary of up to ~$300...
  Performance, Visa sponsorship, Relocation package
- FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
  Performance
- Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize... ...kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming experience...
  Performance, Flexible hours
- ...THE ROLE You build and operate the inference systems that serve our models in production. The work spans... ...running real workloads. This is an engineering role, not a research role. You'll... ...models at high throughput Own the performance characteristics of those systems end...
  Performance

