Machine Learning Engineer - Inference

$160k - $230k

Together

Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the‑art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting‑edge AI solutions.

Responsibilities
  • Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
  • Develop and optimize runtime inference services for large-scale AI applications.
  • Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
  • Conduct design and code reviews to ensure high standards of quality.
  • Create services, tools, and developer documentation to support the inference engine.
  • Implement robust and fault-tolerant systems for data ingestion and processing.
Requirements
  • 3+ years of experience writing high-performance, well-tested, production-quality code.
  • Proficiency with Python and PyTorch.
  • Demonstrated experience in building high-performance libraries and tooling.
  • Excellent understanding of low-level operating system concepts including multi-threading, memory management, networking, storage, performance, and scale.
  • Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, Optimum.
  • Preferred: Knowledge of AI inference techniques such as speculative decoding.
  • Preferred: Knowledge of CUDA/Triton programming.
  • Nice to have: Knowledge of Rust, Cython and compilers.
About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co‑designing software, hardware, algorithms, and models. We have contributed to leading open‑source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next‑generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $160,000 – $230,000, plus equity and benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Vacancy posted 3 days ago