Software Engineer (Model Performance)

Baseten

ABOUT BASETEN

Baseten powers mission‑critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting‑edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products.

THE ROLE

Are you passionate about advancing the application of artificial intelligence? We are looking for a Software Engineer focused on ML performance to join our dynamic team. This role is ideal for someone who thrives in a fast‑paced startup environment and is eager to make significant contributions to the exciting field of LLM inference. If you are a backend engineer who thrives on making things faster and is excited about open‑source ML models, we look forward to your application.

EXAMPLE INITIATIVES

You'll get to work on these types of projects as part of our Model Performance team:
Baseten Embeddings Inference: The fastest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization

RESPONSIBILITIES

Implement, refine, and productionize cutting‑edge techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
Deep dive into the underlying codebases of TensorRT, PyTorch, TensorRT‑LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues.
Apply and scale optimization techniques across a wide range of ML models, particularly large language models.
Collaborate with a diverse team to design and implement innovative solutions.
Own projects from idea to production.

REQUIREMENTS

Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
Experience with one or more general‑purpose programming languages, such as Python or C++.
Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching).
Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT‑LLM.
Demonstrated interest and experience in LLMs.
Deep understanding of GPU architecture.

Bonus:
Proficiency in enhancing the performance of software systems, particularly in the context of large language models (LLMs).
Experience with CUDA or similar technologies.
Deep understanding of software engineering principles and a proven track record of developing and deploying AI/ML inference solutions.
Experience with Docker and Kubernetes.

BENEFITS

Competitive compensation, including meaningful equity.
100% coverage of medical, dental, and vision insurance for employees and dependents.
Generous PTO policy, including a company‑wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
Paid parental leave.
Company‑facilitated 401(k).
Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward‑thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

Vacancy posted 15 hours ago

