AI Inference Performance Engineer

FATHOM

Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure. The ideal candidate will have extensive experience with LLM frameworks, quantization techniques, and strong Python skills. This role focuses on improving real systems that impact millions of meetings, ensuring efficient GPU utilization, and debugging production issues. Fathom offers competitive compensation and a supportive environment for personal and professional growth.


Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer in San Francisco, CA vacancy

Multimodal Inference Engineer — Scale GPU AI Models
...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing... ...models, focusing on high-performance delivery of audio and image inputs.... ...product teams to push the boundaries of AI technology, ensuring reliable production...
Performance
Jobleads-US
San Francisco, CA
1 day ago
GPU Kernel Engineer for AI Inference & Performance
FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
Performance
FriendliAI
San Francisco, CA
1 day ago
Edge Inference Engineer: Optimize On-Device AI Kernels
Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming...
Performance
Flexible hours
Liquid AI
San Francisco, CA
3 days ago
Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
Performance
Anyscale
San Francisco, CA
1 day ago
Staff Engineer, AI Inference & Distributed Systems
Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while...
Performance
Sail Research
San Francisco, CA
1 day ago
Senior ML Inference Engineer Production Systems
MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production...
Performance
MakerMaker.AI
San Francisco, CA
3 days ago
Senior ML Inference Systems Engineer
A tech startup focused on AI workloads is seeking a Member of... ...Staff to design and optimize inference systems. The role involves managing... ...and improving execution performance across various components.... ...should have strong software engineering skills and experience with ML...
Performance
Gimlet Labs
San Francisco, CA
4 days ago
GPU Kernel Engineer: Build Fast AI Inference at Scale
...A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal...
Performance
Baseten
San Francisco, CA
5 days ago
Senior Model Inference Engineer for Production-Scale AI
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate... ...collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $4...
Performance
Jobleads-US
San Francisco, CA
1 day ago
ML Inference Systems Engineer
...Technical Staff focused on ML systems and inference in San Francisco. You will design and... ...fast, predictable, and scalable performance. Key responsibilities include optimizing... ...should have strong foundations in software engineering, experience with ML inference systems,...
Performance
Gimlet Labs, Inc.
San Francisco, CA
1 day ago
LLM Inference & Model-Performance Engineer
...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
Performance
Baseten
San Francisco, CA
5 days ago
Speech LLM Inference Engineer — Ultra-Low Latency Serving
$200k
Plaud is seeking skilled AI engineers to join their core SpeechLLM lab in San Francisco. You will play a crucial role in building high-throughput inference engines for conversational AI and optimizing GPU performance while collaborating with various teams. The position...
Performance
Work at office
Plaud
San Francisco, CA
1 day ago
Senior Inference Performance Engineer - GPU & CUDA
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
Performance
inference.net
San Francisco, CA
1 day ago
LLM Inference & Optimization Engineer
Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
Kernel Engineer for High-Performance AI Kernels
$225k
Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA...
Performance
Magic Inc
San Francisco, CA
1 day ago
Real-Time GPU Inference Optimization Engineer
$300k
...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will possess strong... ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive...
Performance
Visa sponsorship
Relocation package
Trades Workforce Solutions
San Francisco, CA
4 days ago
Inference Runtime Engineer for LLMs & Diffusion
Inferact is seeking an inference runtime engineer to enhance the performance and capabilities of LLM and diffusion model serving. This role requires expertise in... ...architectures and has significant implications for AI inference. The ideal candidate must possess a bachelor...
Performance
Remote work
Inferact
San Francisco, CA
10 days ago
Senior Inference Performance Engineer — GPU & CUDA
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation...
Performance
Local area
Inference
San Francisco, CA
5 days ago
INFERENCE ENGINEER
...ROLE You build and operate the inference systems that serve our models... ...real workloads. This is an engineering role, not a research role.... ...at high throughput Own the performance characteristics of those systems... ...systems) isn't appealing
Performance
MakerMaker.AI
San Francisco, CA
3 days ago
Remote Realtime Speech Inference Engineer
Machine Learning Engineer, Inference. Want to solve realtime inference problems where milliseconds... ...This role is with a fast-growing voice AI company building the realtime speech infrastructure... ...The team already operates beyond the performance of most publicly available realtime...
Performance
Remote job
Flexible hours
Trades Workforce Solutions
San Francisco, CA
4 days ago
Distributed LLM Inference Engineer
...tech stacks to accelerate the progress of AI applications out into the real world.... ...expert. About the Role As a Distributed LLM Inference Engineer, you will help systems and optimizations that push the boundaries of performance for inference at large scale. This is an...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
2 days ago
LLM Inference Frameworks and Optimization Engineer
About the Role At Together.ai, we are building state-of-the-art... ...efficient and scalable inference for large language models (LLMs... ...infrastructure, pushing the boundaries of performance, scalability, and cost-... ...Frameworks and Optimization Engineer to design, develop, and...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
Founding Engineer, ML Inference
...focused team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative video, developer... ...the Role We're looking for a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is a highly technical, high-impact...
Performance
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
2 days ago
Inference Engineer
...Our mission is to architect AI that learns from and interacts... ...model innovation and systems engineering paired with a design-minded product... ...the Role We're hiring an Inference Engineer to advance our... ...systems with high demands on performance, reliability, and observability...
Performance
Work at office
Visa sponsorship
Flexible hours
Cartesia, Inc.
San Francisco, CA
4 days ago
GPU Inference Engineer - Robotics (Hybrid SF)
Centaur Labs is seeking a GPU Inference Engineer to enhance model serving efficiency for Robotics... ...impact role involves optimizing inference performance and collaborating with research teams... ...be excited about deploying effective AI solutions.
Performance
Centaur Labs
San Francisco, CA
3 days ago
LLM Inference Engineer - Distributed Systems at Scale
Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
2 days ago
Senior GPU Kernel Engineer - Accelerate AI Training Systems
MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern...
Performance
MakerMaker
San Francisco, CA
2 days ago
Unlock Faster Inference: Developer Productivity Engineer
...Space LLC is seeking a Developer Productivity Engineer to enhance the systems for serving models in their Inference Runtime teams. This role is crucial in ensuring... ...to ensure swift operations without compromising performance. If you enjoy improving developer experiences and...
Performance
United States Digital Space LLC
San Francisco, CA
5 days ago
Engineer, Inference & Model serving
$220k - $320k
ML Model Serving Engineer. Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models...
Performance
3 days per week
Trades Workforce Solutions
San Francisco, CA
4 days ago
Senior Inference & RL Systems Engineer
$225k
...domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you... ...What you’ll work on Design and scale high‑performance inference serving systems Optimize KV‑cache...
Performance
Relocation
Visa sponsorship
Magic
San Francisco, CA
2 days ago
