Inference Engineer

Cartesia, Inc.

About Cartesia

Our mission is to architect AI that learns from and interacts with the world like humans do. We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence.

Your Impact

Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models using Transformers, SSMs and hybrid models.
Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner.
Design and build robust inference infrastructure and monitoring for our products.
Have significant autonomy to shape our products and directly impact how cutting-edge AI is applied across various devices and applications.

What You Bring

Given the scale and difficulty of the problems we work on, we value strong engineering skills at Cartesia.

Strong engineering skills, comfort navigating complex codebases, and an eye for writing clean and maintainable code.
Experience building large-scale distributed systems with high demands on performance, reliability, and observability.
Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity.
Background in or experience working on inference pipelines with machine learning and generative models.
Experience applying state-of-the-art machine learning models and research to applied problems.
Preferred: experience with vLLM, SGLang, continuous batching or other inference frameworks.
Preferred: experience working in CUDA, Triton or similar.

More Details

In-office policy: We’re an in-person team based out of offices in San Francisco, London and Bangalore. We love being in the office, hanging out together, and learning from each other every day.
Visa sponsorship: We provide visa sponsorship support and assess each circumstance on a case-by-case basis. However, visa sponsorship depends on many factors, including the role you are applying for and the location where you will be based, so we can’t always guarantee success. Your recruiter will work with you to understand your visa sponsorship needs from the first call.
We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our Benefits

Compensation. Competitive base salary alongside an attractive equity package.
Commuter Allowance. A monthly stipend to help you get to and from the office.
Flexible PTO. Take as much time as you need to recharge your batteries.
Meals & Snacks. Lunch, dinner and plenty of snacks, provided daily.
Your own personal Yoshi.

Vacancy posted 5 days ago

