Software Engineer - Model Performance

Baseten

Software Engineer Focused On ML Performance

Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products.

Are you passionate about advancing the application of artificial intelligence? We are looking for a Software Engineer focused on ML performance to join our dynamic team. This role is ideal for someone who thrives in a fast-paced startup environment and is eager to make significant contributions to the exciting field of LLM inference. If you are a backend engineer who thrives on making things faster and is excited about open-source ML models, we look forward to your application.

You'll get to work on these types of projects as part of our Model Performance team:

  • Baseten Embeddings Inference: The fastest embeddings solution available
  • The Baseten Inference Stack
  • Driving model performance optimization

Implement, refine, and productionize cutting-edge techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.

Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues.

Apply and scale optimization techniques across a wide range of ML models, particularly large language models.

Collaborate with a diverse team to design and implement innovative solutions.

Own projects from idea to production.

Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.

Experience with one or more general-purpose programming languages, such as Python or C++.

Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching).

Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM.

Demonstrated interest and experience in LLMs.

Deep understanding of GPU architecture.

Competitive compensation, including meaningful equity.

100% coverage of medical, dental, and vision insurance for employees and dependents

Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)

Paid parental leave

Fertility and family-building stipend through Carrot

Company-facilitated 401(k)

Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).

Vacancy posted 3 days ago