Inference Engineer
Cartesia, Inc.
About Cartesia
Our mission is to architect AI that learns from and interacts with the world like humans do. We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models or SSMs, a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering paired with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.
About the Role
We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence.
Your Impact
- Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models using Transformers, SSMs, and hybrid models.
- Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner.
- Design and build robust inference infrastructure and monitoring for our products.
- Have significant autonomy to shape our products and directly impact how cutting-edge AI is applied across various devices and applications.
- Strong engineering skills, comfort navigating complex codebases, and an eye for writing clean and maintainable code.
- Experience building large-scale distributed systems with high demands on performance, reliability, and observability.
- Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity.
- Background in or experience working on inference pipelines with machine learning and generative models.
- Experience implementing state-of-the-art machine learning models and applying research to practical problems.
- Preferred: experience with inference frameworks such as vLLM or SGLang, and with techniques such as continuous batching.
- Preferred: experience working in CUDA, Triton, or similar.
Vacancy posted 4 days ago

