AI Inference Engineer

$175k - $225k

Sauron

Who We Are

Sauron protects your family and home, bringing the innovations of autonomous robots and self-driving cars to residential security. Our team is led by veteran operators and engineers, alumni of Sonos, PayPal, Tesla, Apple, and Google. Sauron has raised a $27M seed round led by A* and Atomic with participation from other leading venture capital firms.

The Role

We're looking for an AI Inference Engineer who lives at the boundary of high-performance software and physical hardware. In this role, you won't just be managing pipelines; you'll be squeezing every drop of performance out of silicon to ensure our perception systems can see, think, and act in real time.

You will own the productionization of AI: taking sophisticated models and transforming them into lightning-fast, production-ready engines running on edge devices in homes across the country. If you are obsessed with CUDA kernels, TensorRT optimizations, and the challenge of deploying robust vision systems on real robots, we want to talk to you.

What You'll Do

Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures.
Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream.
Collaborate closely with embedded engineers to integrate perception software with Yocto-based platforms, ensuring the hardware and software work together seamlessly.
Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance.
Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption.
Productionize proven prototypes, moving them from JetPack to Yocto.
Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency.

What You Bring

Bachelor's or Master's degree in Computer Science, Electrical Engineering, Robotics, or a related field.
3+ years of experience developing and deploying computer vision or machine learning applications on real-world robotic systems (not just in simulation).
High proficiency in C, C++, and Python, with a focus on real-time and embedded systems.
Expert-level knowledge of the NVIDIA Jetson ecosystem (JetPack SDK, DeepStream, TensorRT) and a deep understanding of CUDA/GPU architecture.
Hands-on experience with video streaming tools like FFmpeg and protocols such as RTSP, RTP, and HLS.
Proven track record of deploying AI systems that operate in the field, handling the unpredictability of real-world sensor data.

Nice to Have

Familiarity with NVIDIA's broader robotics stack.
Experience with ML compilers or compiler-level optimizations for GPU inference.
Specific background in sensor fusion and AI-driven obstacle avoidance for autonomous navigation.
Exposure to remote logging, log ingestion, and distributed telemetry aggregation.
Previous experience in early-stage startups or fast-paced hardware/software integration environments.

We Value

1. The Power of "We": "Align, then Accelerate"

We celebrate as a team and troubleshoot as a team.
The goal is the mission, not the credit.

2. High Challenge, Low Ego: "Respect the person, debate the idea."

Be ruthless with problems, but kind to people.
Raise the bar, lower the shield.

3. Speak Up: "Silence is a setback."

Your perspective is a requirement, not a suggestion.
Speak the hard truths early so we can fix them fast.

4. Integrity in Motion: "Own the outcome, not just the task."

Do what you say you'll do.
If it breaks, fix it. If it works, make it better.

5. Humanity at the Core: "Relationships over transactions."

Earn trust through empathy and consistency.
Anticipate needs before they become requests.

The compensation range for this position is $175k-$225k base + equity + benefits.

Why Sauron

You'll be joining a deeply technical team obsessed with building real-world systems that make a tangible difference in people's lives. We move quickly, iterate relentlessly, and ship with urgency, all while holding a deep respect for software craftsmanship and system reliability. If you're looking to solve challenging problems and own major parts of the deployment stack for a category-defining product, we want to talk.

We are focused on building a diverse and inclusive workforce. If you're excited about this role but do not meet 100% of the qualifications listed above, we encourage you to apply.

Sauron is an Equal Opportunity Employer and considers applicants for employment without regard to race, color, religion, sex, orientation, national origin, age, disability, genetics or any other basis forbidden under federal, state, or local law.

Please review our CCPA policies here.

Compensation

The base pay range for this role is $175,000 - $225,000 per year.

Vacancy posted 4 days ago
