Machine Learning Engineer - Inference
$160k - $230k
Together AI
Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions.
Responsibilities
- Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
- Develop and optimize runtime inference services for large-scale AI applications.
- Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
- Conduct design and code reviews to ensure high standards of quality.
- Create services, tools, and developer documentation to support the inference engine.
- Implement robust and fault-tolerant systems for data ingestion and processing.
Requirements
- 3+ years of experience writing high-performance, well-tested, production-quality code.
- Proficiency with Python and PyTorch.
- Demonstrated experience building high-performance libraries and tooling.
- Excellent understanding of low-level operating system concepts including multi-threading, memory management, networking, storage, performance, and scale.
- Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, Optimum.
- Preferred: Knowledge of AI inference techniques such as speculative decoding.
- Preferred: Knowledge of CUDA/Triton programming.
- Nice to have: Knowledge of Rust, Cython, and compilers.
About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co‑designing software, hardware, algorithms, and models. We have contributed to leading open‑source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next‑generation AI infrastructure.
Compensation
We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 – $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.