Senior Software Engineer - Model Performance

$220k - $320k

Inference

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.

About Inference.net

Inference.net trains and hosts specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.

We are a well-funded ten-person team of engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years, and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do. Most of us are in the office 4 days a week in SF; hybrid works for Bay Area candidates.

About the Role

You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.

Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks.

This role reports directly to the founding team. You'll have the autonomy, a large compute budget, and technical support to push the limits of what's possible in model serving.

Key Responsibilities

  • Implement and productionize optimization techniques including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving
  • Deep dive into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance
  • Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure
  • Add support for new model architectures, ensuring they meet our performance standards before going to production
  • Experiment with novel inference techniques and bring successful approaches into production
  • Build tooling and benchmarks to measure and track inference performance across our fleet
  • Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

  • 2+ years of experience in ML systems, inference optimization, or GPU programming
  • Strong proficiency in Python and familiarity with C++
  • Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
  • Deep understanding of GPU architecture and experience profiling GPU workloads
  • Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)
  • Experience with PyTorch and understanding of how models execute on hardware
  • Track record of measurably improving system performance

Nice-to-Have

  • Experience with CUDA programming
  • Familiarity with serving non-LLM models (TTS, vision, embeddings)
  • Experience with distributed inference and multi-GPU serving
  • Contributions to open-source inference frameworks
  • Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 - $320,000, plus equity and benefits, depending on experience.

Equal Opportunity

Inference.net is an equal opportunity employer.
We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're excited about making AI inference faster for everyone, we'd love to hear from you. Please send your resume and GitHub to the email address shown on click.appcast.io, and/or apply on Ashby.

Vacancy posted 19 hours ago