Machine Learning Engineer - Inference

$160k - $230k

Together

Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the‑art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting‑edge AI solutions.

Responsibilities

Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
Develop and optimize runtime inference services for large-scale AI applications.
Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
Conduct design and code reviews to ensure high standards of quality.
Create services, tools, and developer documentation to support the inference engine.
Implement robust and fault-tolerant systems for data ingestion and processing.

Requirements

3+ years of experience writing high-performance, well-tested, production-quality code.
Proficiency with Python and PyTorch.
Demonstrated experience building high-performance libraries and tooling.
Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
Preferred: Knowledge of AI inference techniques such as speculative decoding.
Preferred: Knowledge of CUDA/Triton programming.
Nice to have: Knowledge of Rust, Cython, and compilers.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co‑designing software, hardware, algorithms, and models. We have contributed to leading open‑source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next‑generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 – $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you (based on the Machine Learning Engineer - Inference vacancy in San Francisco, CA)

Machine Learning Engineer, Inference & Serving (Speech LLM) - San Francisco
$200k
...highest standards of data security and privacy protection. To learn more about Plaud, please visit and follow along on Instagram... ...building and deploying high-throughput, ultra-low-latency inference engines for large language models or foundational speech models....
Suggested
Full time
Work at office
Worldwide
Plaud
San Francisco, CA
4 days ago
Machine Learning Infrastructure Engineer- Model Inference
$179k - $248k
...Machine Learning Infrastructure Engineer Join to apply for the Machine Learning Infrastructure Engineer role at Abridge . Base pay range... ...and maintain scalable Kubernetes clusters for AI model inference and training Develop, optimize, and maintain ML model...
Suggested
Hourly pay
Full time
Flexible hours
Abridge
San Francisco, CA
3 days ago
ML Inference Engineer
Job Overview Department: Engineering Location: San Francisco We're looking for an ML Inference Engineer with deep expertise in high-performance ML engineering. This is a highly technical, high-impact role focused on squeezing every drop of performance from generative...
Suggested
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
4 days ago
ML Inference Engineer — Ultra-Low Latency & High Throughput
Reactor seeks an ML Inference Engineer in San Francisco to enhance performance on generative media models. In this role, you'll drive model performance, design in-house inference runtimes, and optimize neural network models. Required qualifications include a Bachelor's...
Suggested
Relocation package
Reactor
San Francisco, CA
4 days ago
Founding ML Infra Engineer: Scale Real-Time Inference
...Francisco is searching for an ML Infrastructure and Platform Engineer. In this role, you will lead the architecture and scaling of our... ...the ground up, ensuring high availability and low-latency inference. This is a founding technical hire position, requiring end-to-...
Suggested
URun
San Francisco, CA
4 days ago
ML Inference Engineer PyTorch & Scalable AI
...A research-driven AI company is seeking a Machine Learning Engineer to join their Inference Engine team. You'll design and develop production systems to enhance AI inference performance, collaborating with researchers and engineers. The ideal candidate will have over 3...
Full time
Together
San Francisco, CA
3 days ago
High-Performance ML Inference Engineer for Diffusion Models
Reactor is looking for an experienced ML Inference Engineer with deep expertise in high-performance ML engineering. This role focuses on optimizing the performance of generative media models, contributing to Reactor's competitive edge. The ideal candidate will drive model...
Reactor
San Francisco, CA
4 days ago
Founding ML Inference Engineer — Ultra-Low Latency AI
A media technology company in San Francisco is seeking a Founding Engineer specializing in ML Inference. This highly technical role requires expertise in the ML infrastructure stack and aims to optimize generative media performance. The ideal candidate will drive innovations...
Relocation package
Reactor
San Francisco, CA
20 hours ago
ML Infra Engineer: Scale GPU Training & Inference
Reducto, a fast-growing AI company in San Francisco, is hiring a Machine Learning Infra Engineer. This role involves building and maintaining the training and inference frameworks necessary for optimal performance. Ideal candidates should possess strong Python skills, have...
Reducto
San Francisco, CA
3 days ago
Founding ML Performance Engineer - Sub-50ms Inference
uRun is seeking an ML Performance Engineer to build high-performance infrastructure for interactive AI. You will write custom CUDA kernels and optimize model inference for speed and efficiency. This foundational role involves working closely with the founding team on critical...
URun
San Francisco, CA
4 days ago
Senior GPU ML Infra Engineer — Mid-Training & Inference
...requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will... ...inference frameworks and a solid understanding of reinforcement learning technologies. Comprehensive healthcare benefits, parental...
Reflection AI
San Francisco, CA
4 days ago
Senior ML Inference Systems Engineer
...is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving... ...components. Ideal candidates should have strong software engineering skills and experience with ML inference systems, particularly...
Gimlet Labs
San Francisco, CA
2 days ago
ML Inference Systems Engineer
...looking for a Member of Technical Staff focused on ML systems and inference in San Francisco. You will design and build inference systems... .... Candidates should have strong foundations in software engineering, experience with ML inference systems, and performance tuning...
Gimlet Labs, Inc.
San Francisco, CA
4 days ago
Senior ML Inference Engineer Production Systems
MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production...
MakerMaker.AI
San Francisco, CA
1 day ago
ML Inference Infrastructure Engineer
...company is seeking an Infrastructure Software Engineer in San Francisco to build and maintain components of an ML inference platform. The successful candidate will... ...collaborative team dedicated to advancing AI and machine learning infrastructure.
Baseten
San Francisco, CA
4 days ago
Staff ML Inference Systems Engineer - Scalable GPU Infra (SF)
...Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves designing end-to-... ...real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency...
Acceler8 Talent
San Francisco, CA
3 days ago
Machine Learning Engineer
$135k - $210k
...about the fruit they are seeing. We are looking for a Machine Learning Engineer to build creative, practical, and robust solutions to ML/... ...deploy infrastructure for model training, evaluation, and inference, both in the cloud and on edge devices. Design and...
Full time
Work at office
Weekend work
Orchard Robotics
San Francisco, CA
1 day ago
Staff ML Inference Engineer — Model Efficiency (Remote)
Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems... ...plus strong skills in C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity and inclusive work...
Remote job
Jaide Health
San Francisco, CA
2 days ago
ML Infrastructure Engineer - Model Inference & Scale
A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional...
Abridge
San Francisco, CA
2 days ago
Staff ML Infrastructure Engineer: Scale Training & Inference
$300k - $430k
...evaluation and experimentation, and the routing layer that manages inference across multiple providers. We work at the intersection of... ...to use. About the Role We're hiring a Staff ML Infrastructure Engineer to own the platforms powering Decagon's model training and...
Work at office
Decagon
San Francisco, CA
2 days ago
Machine Learning Engineer
$150k - $225k
...losses. About You: You want to learn from the best of the best, get your hands... .... You are looking to be an impeccable machine learning engineer working on cutting-edge AI solutions.... ...: Implement optimizations for model inference and training, ensuring ML services can...
Full time
Work at office
Flexible hours
3 days per week
BASELAYER
San Francisco, CA
3 days ago
Machine Learning Engineer
$115k - $185k
...experience — talk with your recruiter to learn more. Base pay range $115,000.00/yr - $185,000.00/yr Machine Learning Engineer Fractal Analytics is a strategic AI partner... ...of interviewing at Fractal by 2x Inferred from the description for this job Medical...
Hourly pay
Full time
Local area
Remote work
Relocation
Fractal, Inc.
San Francisco, CA
2 days ago
Machine Learning Engineer - Infra San Francisco, CA
$147.6k - $274k
...Machine Learning Engineer - Infra San Francisco, CA The Opportunity We are revolutionizing drug discovery with cutting-edge machine learning... ...with PyTorch implementation, especially regarding scaling inference performance. A history of significant contributions to...
Relocation package
ESR Healthcare
San Francisco, CA
3 days ago
Machine Learning Engineer
...fail. We are a small, fast-growing team of engineers in San Francisco powering Fortune 100... ...office at our San Francisco office Eager to learn and adapt quickly Prior startup or... ...and active learning pipelines Optimize inference, batching, and quantization on GPU Productionize...
Work at office
Visa sponsorship
Relocation package
Trypulse
San Francisco, CA
8 days ago
Machine Learning Engineer
$160k - $220k
...About the Role Together AI is looking for an ML Engineer who will develop systems and APIs that enable our customers to perform inference and fine tune LLMs. Relevant experience includes implementing runtime systems that perform inference at scale using AI/ML models...
Full time
Together AI
San Francisco, CA
3 days ago
Founding Machine Learning Engineer
$150k - $220k
...Founding Machine Learning Engineer San Francisco Compensation ~ Estimated base salary $150K – $220K • Offers Equity • Offers Bonus... ...automation platform. You'll work at the intersection of LLM inference, browser understanding, and low-latency systems, shipping...
H1b
Work at office
Visa sponsorship
Sleeping nights
Composite.ai
San Francisco, CA
1 day ago
Machine Learning Engineer: Perception
...construction veterans and world-class engineers to solve physical-world problems that... ...team-we'd love to have you join us. Machine Learning Engineer: Perception Bedrock is bringing... ...to the Edge: Optimize models for inference on embedded hardware. You will debug...
Work at office
Flexible hours
Bedrock Robotics
San Francisco, CA
1 day ago
Machine Learning Engineer
$150k - $190k
...-driven simulation software stack for engineering and manufacturing across advanced industries... ..., multi-physics simulation through AI inference across the entire engineering... ...goals. Who We're Looking For As a Machine Learning Engineer in Delivery, you are a...
Remote work
Flexible hours
PhysicsX
San Francisco, CA
2 days ago
Machine Learning Engineer
...Machine Learning Engineer We are looking for a Machine Learning Engineer to join the growing AI and Machine Learning team at Strava. This... ...prototyping to shipping production code to scaling and optimizing inference and deployment Shape AI at Strava: Bring your voice...
Worldwide
Strava
San Francisco, CA
1 day ago
Machine Learning Engineer
$130k - $170k
...Aquabyte is seeking a Machine Learning Engineer to develop and deploy algorithms for fish farms worldwide. You’ll be responsible for software... ...in‑depth data analytics, and building statistical data inference models of biological processes. This AI team develops image...
Immediate start
Worldwide
Flexible hours
Aquabyte
San Francisco, CA
7 days ago