AI Inference Performance Engineer
FATHOM
Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure. The ideal candidate will have extensive experience with LLM frameworks, quantization techniques, and strong Python skills. This role focuses on improving real systems that impact millions of meetings, ensuring efficient GPU utilization, and debugging production issues. Fathom offers competitive compensation and a supportive environment for personal and professional growth.
- ...FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... (Performance)
$167.2k - $209k
...world. DigitalOcean is expanding its AI Infrastructure layer to support the... ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role, you... ...their models with industry-leading performance and reliability. This is a hands‑on... (Performance, Local area, Remote work, Worldwide, Flexible hours)
- ...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing... ...models, focusing on high-performance delivery of audio and image inputs.... ...product teams to push the boundaries of AI technology, ensuring reliable production... (Performance)
- Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... (Performance, Flexible hours)
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... (Performance)
- ...MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production... (Performance)
- A tech startup focused on AI workloads is seeking a Member of... ...Staff to design and optimize inference systems. The role involves managing... ...and improving execution performance across various components.... ...should have strong software engineering skills and experience with ML... (Performance)
- ...on building and optimizing ML inference systems in San Francisco. The... ...pipelines and enhancing performance under real-world workloads. Candidates... ...should have strong software engineering skills, experience with ML... ...to work at the forefront of AI infrastructure technology... (Performance)
- ...A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal... (Performance)
- ...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed systems... (Performance)
$325k
...A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate... ...collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $4... (Performance)
- ...Technical Staff focused on ML systems and inference in San Francisco. You will design and... ...fast, predictable, and scalable performance. Key responsibilities include optimizing... ...should have strong foundations in software engineering, experience with ML inference systems,... (Performance)
$300k
...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will possess strong... ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive... (Performance, Visa sponsorship, Relocation package)
- ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems... ...This role is with a fast-growing voice AI company building the realtime speech... ...The team already operates beyond the performance of most publicly available realtime... (Performance, Remote work, Flexible hours)
- ...ABOUT THE ROLE You build and operate the inference systems that serve our models in... ...with running real workloads. This is an engineering role, not a research role. You'll measure... ...large models at high throughput Own the performance characteristics of those systems end‑to... (Performance)
- ...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... (Performance)
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient... ..., pushing the boundaries of performance, scalability, and cost-efficiency.... (Performance, Full time)
- ...Our mission is to architect AI that learns from and interacts... ...model innovation and systems engineering paired with a design-minded product... ...Role We're hiring an Inference Engineer to advance our... ...systems with high demands on performance, reliability, and observability... (Performance, Work at office, Visa sponsorship, Flexible hours)
- Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low... (Performance)
$225k
Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA... (Performance)
- ...OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate... (Performance, Work at office, Relocation, Relocation package)
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... (Performance)
- ...MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern... (Performance)
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... (Performance, Local area)
- ...tech stacks to accelerate the progress of AI applications out into the real world.... ...expert. About the Role As a Distributed LLM Inference Engineer, you will help systems and optimizations that push the boundaries of performance for inference at large scale. This is an... (Performance)
$220k - $320k
...ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models... (Performance, 3 days per week)
- ...focused team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative video, developer... ...the Role We're looking for a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is a highly technical, high-impact... (Performance, Relocation, Visa sponsorship, Relocation package)
- ...strive to seamlessly blend high‑level AI capabilities with the constraints of physical... ...About the Role We’re looking for a GPU Inference Engineer to contribute to improvements in model... ...initiatives to optimize inference performance and scalability. You’ll also be engaged... (Performance, Work at office, Relocation package)
- Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... (Performance)
$225k
...domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you... ...What you’ll work on Design and scale high‑performance inference serving systems Optimize KV‑cache... (Performance, Relocation, Visa sponsorship)

