AI Engineer - Model Performance

FATHOM

Role Overview

We’re hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine-tuning infrastructure that makes the rest of the AI team faster. This is not a research role. You’ll be optimizing real systems serving millions of meetings: choosing between quantization trade-offs, debugging speculative decoding, or figuring out why one GPU family’s tail latency explodes at high concurrency while another stays stable.

Responsibilities

Benchmark FP8 quantization across GPU families, find that FP8 KV cache causes catastrophic repetition loops, identify static quantization as 6% faster than dynamic on certain hardware, and ship a production config that gets 1.3x speedup with less than 1% quality degradation.
Evaluate serving frameworks (vLLM vs SGLang) with speculative decoding: discover that ngram speculation degrades ASR quality while EAGLE3 draft models don’t, and that torch.compile makes certain GPUs 7% slower.
Build a fine-tuning pipeline that takes a JSONL dataset and produces an optimized tune ready for serving, so a teammate can train a small classifier in an afternoon instead of a week.
Optimize GPU spend: know which GPU families are best for batch workloads (stable under high concurrency) vs latency-sensitive paths, and when a 30% cost premium isn’t worth it.
Debug production inference issues: trace a quality regression to a serving framework upgrade that changed the default attention backend, or find that audio format handling in the multimodal pipeline silently drops segments.

Hard Skills

Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar), not just deploying them but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching.
Hands-on quantization experience: you understand weight vs activation quantization, per-channel vs per-tensor scaling, and when dynamic quantization introduces more overhead than it saves.
Production fine-tuning experience: LoRA/QLoRA SFT, familiarity with training frameworks, understanding of data formatting, learning rate schedules, and how to diagnose training failures.
Strong Python skills. You’ll write serving infrastructure, benchmarking harnesses, and training pipelines, not notebooks.
Comfort with GPU profiling and performance analysis. You should be able to look at a benchmark result and know whether the bottleneck is compute, memory bandwidth, or scheduling overhead.

Strong Signals

Cost modeling for GPU infrastructure: you’ve had to choose between GPU types and justify the tradeoff.
Experience with multimodal models (audio/vision encoders + LLM decoders).
Experience with Modal, Ray Serve, or similar serverless GPU platforms.
Understanding of audio processing (codecs, chunking, sample rates).
Experience building internal tooling that other engineers use; this role succeeds when the rest of the team ships faster.

Not Required

ML research background or publications.
Prompt engineering expertise.
Frontend or full-stack experience.
Master’s/PhD (though it’s fine if you have one).

Benefits

The opportunity to shape the foundational software services of a growing company.
A role that balances innovation and incremental improvement.
A dynamic and collaborative engineering team.
Competitive compensation and benefits.
A supportive environment that encourages innovation and personal growth.


Vacancy posted 11 hours ago

