AI Inference Engineer
$175k - $225k
Sauron
Who We Are
Sauron protects your family and home, bringing the innovations of autonomous robots and self-driving cars to residential security. Our team is led by veteran operators and engineers, alumni of Sonos, PayPal, Tesla, Apple, and Google. Sauron has raised a $27M seed round led by A* and Atomic with participation from other leading venture capital firms.
The Role
We're looking for an AI Inference Engineer who lives at the boundary of high-performance software and physical hardware. In this role, you won't just be managing pipelines; you'll be squeezing every drop of performance out of silicon to ensure our perception systems can see, think, and act in real time. You will own the productionizing of AI: taking sophisticated models and transforming them into lightning-fast, production-ready engines running on edge devices in homes across the country. If you are obsessed with CUDA kernels, TensorRT optimizations, and the challenge of deploying robust vision systems on real robots, we want to talk to you.
What You'll Do
- Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures.
- Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream.
- Collaborate closely with embedded engineers to integrate perception software with Yocto platforms, ensuring seamless hardware-software synergy.
- Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance.
- Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption.
- Productionize proven prototypes by moving them from JetPack to Yocto.
- Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency.
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, Robotics, or a related field.
- 3+ years of experience developing and deploying computer vision or machine learning applications on real-world robotic systems (not just in simulation).
- High proficiency in C, C++, and Python, with a focus on real-time and embedded systems.
- Expert-level knowledge of the NVIDIA Jetson ecosystem (JetPack SDK, DeepStream, TensorRT) and a deep understanding of CUDA/GPU architecture.
- Hands-on experience with video streaming tools like FFmpeg and protocols such as RTSP, RTP, and HLS.
- Proven track record of deploying AI systems that operate in the field, handling the unpredictability of real-world sensor data.
- Familiarity with NVIDIA's broader robotics stack.
- Experience with ML compilers or compiler-level optimizations for GPU inference.
- Specific background in sensor fusion and AI-driven obstacle avoidance for autonomous navigation.
- Exposure to remote logging, log ingestion, and distributed telemetry aggregation.
- Previous experience in early-stage startups or fast-paced hardware/software integration environments.
- We celebrate as a team and troubleshoot as a team.
- The goal is the mission, not the credit.
- Be ruthless with problems, but kind to people.
- Raise the bar, lower the shield.
- Your perspective is a requirement, not a suggestion.
- Speak the hard truths early so we can fix them fast.
- Do what you say you'll do.
- If it breaks, fix it. If it works, make it better.
- Earn trust through empathy and consistency.
- Anticipate needs before they become requests.
Vacancy posted 3 days ago


