AI Engineer - Model Performance

FATHOM

Role Overview

We’re hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine-tuning infrastructure that makes the rest of the AI team faster. This is not a research role. You’ll be optimizing real systems serving millions of meetings: choosing between quantization trade-offs, debugging speculative decoding, or figuring out why one GPU family’s tail latency explodes at high concurrency while another stays stable.

Responsibilities

  • Benchmark FP8 quantization across GPU families, find that FP8 KV cache causes catastrophic repetition loops, identify static quantization as 6% faster than dynamic on certain hardware, and ship a production config that gets 1.3x speedup with less than 1% quality degradation.
  • Evaluate serving frameworks (vLLM vs SGLang) with speculative decoding: discover that ngram speculation degrades ASR quality while EAGLE3 draft models don’t, and that torch.compile makes certain GPUs 7% slower.
  • Build a fine-tuning pipeline that takes a JSONL dataset and produces an optimized tune ready for serving, so a teammate can train a small classifier in an afternoon instead of a week.
  • Optimize GPU spend: know which GPU families are best for batch workloads (stable under high concurrency) vs latency-sensitive paths, and when a 30% cost premium isn’t worth it.
  • Debug production inference issues: trace a quality regression to a serving framework upgrade that changed the default attention backend, or find that audio format handling in the multimodal pipeline silently drops segments.

Hard Skills

  • Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar), not just deploying them but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching.
  • Hands-on quantization experience: you understand weight vs activation quantization, per-channel vs per-tensor scaling, and when dynamic quantization introduces more overhead than it saves.
  • Production fine-tuning experience: LoRA/QLoRA SFT, familiarity with training frameworks, understanding of data formatting, learning rate schedules, and how to diagnose training failures.
  • Strong Python skills. You’ll write serving infrastructure, benchmarking harnesses, and training pipelines, not notebooks.
  • Comfort with GPU profiling and performance analysis. You should be able to look at a benchmark result and know whether the bottleneck is compute, memory bandwidth, or scheduling overhead.

Strong Signals

  • Cost modeling for GPU infrastructure: you’ve had to choose between GPU types and justify the tradeoff.
  • Experience with multimodal models (audio/vision encoders + LLM decoders).
  • Experience with Modal, Ray Serve, or similar serverless GPU platforms.
  • Understanding of audio processing (codecs, chunking, sample rates).
  • Experience building internal tooling that other engineers use. This role succeeds when the rest of the team ships faster.

Not Required

  • ML research background or publications.
  • Prompt engineering expertise.
  • Frontend or full-stack experience.
  • Masters/PhD (though it’s fine if you have one).

Benefits

  • The opportunity to shape the foundational software services of a growing company.
  • A role that balances innovation and incremental improvement.
  • A dynamic and collaborative engineering team.
  • Competitive compensation and benefits.
  • A supportive environment that encourages innovation and personal growth.

Vacancy posted 11 hours ago