AI Inference Engineer

$175k - $225k

Sauron

Who We Are

Sauron protects your family and home, bringing the innovations of autonomous robots and self-driving cars to residential security. Our team is led by veteran operators and engineers who are alumni of Sonos, PayPal, Tesla, Apple, and Google. Sauron has raised a $27M seed round led by A* and Atomic, with participation from other leading venture capital firms.

The Role

We're looking for an AI Inference Engineer who lives at the boundary of high-performance software and physical hardware. In this role, you won't just be managing pipelines; you'll be squeezing every drop of performance out of silicon to ensure our perception systems can see, think, and act in real-time.

You will own the productionization of our AI: taking sophisticated models and transforming them into lightning-fast, production-ready engines that run on edge devices in homes across the country. If you are obsessed with CUDA kernels, TensorRT optimizations, and the challenge of deploying robust vision systems on real robots, we want to talk to you.

What You'll Do

  • Lead the development and optimization of low-latency inference engines using TensorRT and ONNX, including authoring custom plugins to support cutting-edge architectures.
  • Design and maintain multithreaded video processing and streaming pipelines (RTSP, RTP, HLS) using GStreamer and DeepStream.
  • Collaborate closely with embedded engineers to integrate perception software with Yocto-based platforms, ensuring seamless hardware-software synergy.
  • Work with raw data from cameras and LiDAR to enable real-time data capture, obstacle detection, and avoidance.
  • Write and optimize custom CUDA kernels and perform low-level GPU tuning to maximize throughput and minimize power consumption.
  • Productionize proven prototypes by porting them from JetPack to Yocto.
  • Apply advanced optimization techniques, including quantization (INT8/FP16), pruning, and distillation, to bring research-grade models to production-grade efficiency.

What You Bring

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, Robotics, or a related field.
  • 3+ years of experience developing and deploying computer vision or machine learning applications on real-world robotic systems (not just in simulation).
  • High proficiency in C, C++, and Python, with a focus on real-time and embedded systems.
  • Expert-level knowledge of the NVIDIA Jetson ecosystem (JetPack SDK, DeepStream, TensorRT) and a deep understanding of CUDA/GPU architecture.
  • Hands-on experience with video streaming tools like ffmpeg and protocols such as RTSP, RTP, and HLS.
  • Proven track record of deploying AI systems that operate in the field, handling the unpredictability of real-world sensor data.

Nice to Have

  • Familiarity with NVIDIA's broader robotics stack.
  • Experience with ML compilers or compiler-level optimizations for GPU inference.
  • Specific background in sensor fusion and AI-driven obstacle avoidance for autonomous navigation.
  • Exposure to remote logging, log ingestion, and distributed telemetry aggregation.
  • Previous experience in early-stage startups or fast-paced hardware/software integration environments.

We Value

1. The Power of "We": "Align, then accelerate."
  • We celebrate as a team and troubleshoot as a team.
  • The goal is the mission, not the credit.
2. High Challenge, Low Ego: "Respect the person, debate the idea."
  • Be ruthless with problems, but kind to people.
  • Raise the bar, lower the shield.
3. Speak up: "Silence is a setback."
  • Your perspective is a requirement, not a suggestion.
  • Speak the hard truths early so we can fix them fast.
4. Integrity in Motion: "Own the outcome, not just the task."
  • Do what you say you'll do.
  • If it breaks, fix it. If it works, make it better.
5. Humanity at the Core: "Relationships over transactions."
  • Earn trust through empathy and consistency.
  • Anticipate needs before they become requests.

The compensation range for this position is $175k - $225k base salary, plus equity and benefits.

Why Sauron

You'll be joining a deeply technical team obsessed with building real-world systems that make a tangible difference in people's lives. We move quickly, iterate relentlessly, and ship with urgency, all while holding a deep respect for software craftsmanship and system reliability. If you're looking to solve challenging problems and own major parts of the deployment stack for a category-defining product, we want to talk.

We are focused on building a diverse and inclusive workforce. If you're excited about this role, but do not meet 100% of the qualifications listed above, we encourage you to apply.

Sauron is an Equal Opportunity Employer and considers applicants for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability, genetics, or any other basis forbidden under federal, state, or local law.

Please review our CCPA policies.

Compensation

The base pay range for this role is $175,000 - $225,000 per year.
Vacancy posted 3 days ago
Similar jobs that could be interesting for you

Based on the AI Inference Engineer in San Francisco, CA vacancy
  • Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure. The ideal candidate will have extensive experience with LLM frameworks, quantization techniques... 

    Fathom

    San Francisco, CA
    4 days ago
  • $220k

    Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... 

    Perplexity

    San Francisco, CA
    1 day ago
  • A leading cloud infrastructure company is seeking a Senior Engineer 2 to join their AI Inference Optimization team. The role involves leading the technical strategy for performance architecture and addressing complex performance issues ensuring industry-leading service.... 
    Remote job

    DigitalOcean

    San Francisco, CA
    2 days ago
  • $167.2k - $209k

    A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... 
    Remote job

    DigitalOcean

    San Francisco, CA
    6 days ago
  • An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming skills and a solid foundation in data structures and algorithms. The ideal candidate... 

    SpreeAI

    San Francisco, CA
    2 days ago
  • ABOUT BASETEN Baseten powers mission‑critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma...  ..., and Conviction. Join us and help build the platform engineers turn to to ship AI products. THE ROLE As an Applied... 
    Work experience placement
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  •  ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development and... 
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience... 

    Perplexity

    San Francisco, CA
    3 days ago
  • Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing platform. This role involves enhancing performance for various hardware configurations and providing technical support to... 

    Quadric

    San Francisco, CA
    2 days ago
  •  ...people interact with the web by building AI agents that can reliably do everyday digital...  ...reward models) Scale infra for agentic inference (throughput and latency of perception‑...  ...generalist web‑agent Work closely with product engineers to translate cutting‑edge AI capabilities... 
    Work at office
    Relocation
    Visa sponsorship

    Yutori

    San Francisco, CA
    13 hours ago
  • A leading AI fashion-tech company is seeking a Software Engineer Intern to focus on building infrastructure for AI systems. This role involves designing scalable models, developing APIs, and optimizing for performance and reliability. An ideal candidate will have a strong... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    2 days ago
  •  ...Meet Eloquent AI At Eloquent AI, we're building the next generation of AI Operators...  ...alongside world-class talent in AI, engineering, and product as we redefine the future...  ...generative AI models, including fine-tuning and inference optimization. ~ Familiarity with APIs,... 

    Eloquent AI

    San Francisco, CA
    13 hours ago
  • $269.1k - $307.2k

     ...Distinguished AI Engineer At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital...  ...including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation... 
    Full time
    Part time
    Local area

    Capital One Financial Corp

    San Francisco, CA
    13 hours ago
  • $180k - $250k

     ...AI Engineer We're hiring a full-time AI Engineer to own the prompts, agents, evals, and pipelines behind user-facing features that ship...  ...Ray or similar distributed compute frameworks for batch inference, eval pipelines, or scaling agent workloads Open source contributions... 
    Full time
    Work at office
    Remote work
    Relocation

    Fluency Corp

    San Francisco, CA
    13 hours ago
  • AI Engineer Location: San Francisco or New York City About Pathwork Pathwork is redesigning life & health insurance jobs for the AI age...  ...AI systems, owning model orchestration, retrieval, real‑time inference, evaluation, and production infrastructure, and collaborating... 
    Work at office

    Pathwork

    San Francisco, CA
    4 days ago
  • Hilbert is building a reasoning engine that must navigate non-deterministic user behavior...  ...hard problem of orchestrating multi-step inference over messy, high-stakes enterprise data where...  ...We're also co-building alongside leading AI companies. We're looking for an AI... 
    Shift work

    Hilbert's AI

    San Francisco, CA
    1 day ago
  • $12 per hour

     ...ex-Narvar Founding CTO (Unicorn, $1B+). 6 AI patents. Enterprise AI pedigree: Google...  ...to mission-critical infrastructure. AI Engineer: Computer Vision, LLMs & ML Engineer the...  ...industrial datasets Track record of optimizing inference costs What You'll Work With We're... 
    Full time
    For contractors
    Remote work
    Flexible hours

    Different Technologies Pty Ltd.

    San Francisco, CA
    4 days ago
  •  ...software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices,...  ...graph code and conventional C++ DSP and control code. Role The AI Kernel Engineer in Quadric plays the key role to enable a large number of AI... 

    Quadric

    San Francisco, CA
    2 days ago
  •  ...exceptional problem solvers, combining expert engineering talent with deep experience in National Security. We build AI for the most important problem in defense — making...  ..., sensor data). Develop reliable training and inference systems optimized for performance and edge... 

    Pytho AI

    San Francisco, CA
    13 hours ago
  • $150k - $250k

    Job type: Full Time · Department: AI/ML · Work type: Hybrid · USD 150000 -250000 / year...  ...About the Role We are looking for an AI Engineer to build and ship LLM-powered healthcare...  ...thousands of clinicians Build and optimize inference pipelines and real-time speech... 
    Full time
    Work from home
    Flexible hours

    Neara

    San Francisco, CA
    3 days ago
  • $142.2k - $204.6k

     ...P-1284 About This Role As a software engineer for GenAI inference, you will help design, develop, and optimize the inference engine that powers...  ...00 USD About Databricks Databricks is the data and AI company. More than 10,000 organizations worldwide -... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    1 day ago
  •  ...About the Job We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference...  ...infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    13 hours ago
  • $187.5k - $395k

     ...Software Engineer, Inference Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step... 

    Luma AI

    San Francisco, CA
    4 days ago
  •  ...us on social media. Who You Are The Agentic AI Software Engineer - Cybersecurity Systems designs, develops, and deploys advanced...  ..., false positive rates, detection accuracy). Optimize inference pipelines and distributed systems for reliability and... 
    Local area
    Work from home

    Bishop Fox

    San Francisco, CA
    13 hours ago
  • $180k - $300k

     ...work with teams across the United States to help them hire. AI Engineer (Full-stack) Location: San Francisco, CA (On-site)...  ...web crawling and scraping systems for visual datasets Build inference serving systems Develop APIs powering client-facing AI products... 
    H1b
    Work at office
    Remote work
    Visa sponsorship

    Recruiting from Scratch

    San Francisco, CA
    1 day ago
  • $180k - $300k

     ...About this role You'll build the AI systems that power taste evals, tooling,...  ...visual data across the web Set up inference serving and APIs for client-facing products...  ...experience in fullstack or backend software engineering Compensation & Additional Details... 
    Full time
    H1b
    Work at office
    Visa sponsorship

    Tangerine Search, Inc.

    San Francisco, CA
    2 days ago
  • $207k - $290k

     ...Description Job Description About JazzX AI: Vision: Enterprises operating on...  ...Role We are seeking an experienced AI Engineer with deep expertise in Reinforcement...  ...compute optimization techniques , including inference-time search, chain-of-thought reasoning,... 
    Worldwide
    Flexible hours

    JazzX AI

    San Francisco, CA
    29 days ago
  •  ...About the Role We are hiring Software Engineers focused on AI Infrastructure to build the systems that enable frontier multimodal AI to...  ...backend engineering - including GPU orchestration, large-scale inference systems, performance optimization, and developer platforms... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    2 days ago
  •  ...Staff+ Software Engineer, Inference Runtime Remote-Friendly (Travel-Required) | San Francisco, CA | Seattle, WA | New York City, NY About...  ...mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and... 
    Work at office
    Remote work
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    7 days ago
