AI Engineer - Model Performance
FATHOM
Role Overview

We’re hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine-tuning infrastructure that makes the rest of the AI team faster. This is not a research role. You’ll be optimizing real systems serving millions of meetings: choosing between quantization trade-offs, debugging speculative decoding, or figuring out why one GPU family’s tail latency explodes at high concurrency while another stays stable.

Responsibilities

- Benchmark FP8 quantization across GPU families, find that FP8 KV cache causes catastrophic repetition loops, identify static quantization as 6% faster than dynamic on certain hardware, and ship a production config that gets a 1.3x speedup with less than 1% quality degradation.
- Evaluate serving frameworks (vLLM vs SGLang) with speculative decoding: discover that ngram speculation degrades ASR quality while EAGLE3 draft models don’t, and that torch.compile makes certain GPUs 7% slower.
- Build a fine-tuning pipeline that takes a JSONL dataset and produces an optimized fine-tuned model ready for serving, so a teammate can train a small classifier in an afternoon instead of a week.
- Optimize GPU spend: know which GPU families are best for batch workloads (stable under high concurrency) vs latency-sensitive paths, and when a 30% cost premium isn’t worth it.
- Debug production inference issues: trace a quality regression to a serving framework upgrade that changed the default attention backend, or find that audio format handling in the multimodal pipeline silently drops segments.

Hard Skills

- Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar). Not just deploying them, but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching.
- Hands-on quantization experience. You understand weight vs activation quantization, per-channel vs per-tensor scaling, and when dynamic quantization introduces more overhead than it saves.
- Production fine-tuning experience: LoRA/QLoRA SFT, familiarity with training frameworks, understanding of data formatting, learning rate schedules, and how to diagnose training failures.
- Strong Python skills. You’ll write serving infrastructure, benchmarking harnesses, and training pipelines, not notebooks.
- Comfort with GPU profiling and performance analysis. You should be able to look at a benchmark result and know whether the bottleneck is compute, memory bandwidth, or scheduling overhead.

Strong Signals

- Cost modeling for GPU infrastructure: you’ve had to choose between GPU types and justify the trade-off.
- Experience with multimodal models (audio/vision encoders + LLM decoders).
- Experience with Modal, Ray Serve, or similar serverless GPU platforms.
- Understanding of audio processing (codecs, chunking, sample rates).
- Experience building internal tooling that other engineers use. This role succeeds when the rest of the team ships faster.

Not Required

- ML research background or publications.
- Prompt engineering expertise.
- Frontend or full-stack experience.
- Master’s/PhD (though it’s fine if you have one).

Benefits

- The opportunity to shape the foundational software services of a growing company.
- A role that balances innovation and incremental improvement.
- A dynamic and collaborative engineering team.
- Competitive compensation and benefits.
- A supportive environment that encourages innovation and personal growth.


