ML Infrastructure Engineer

Nebius

About Nebius:

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:

Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
Contribute to internal tooling, frameworks, and best practices.

We expect you to have:

A profound understanding of theoretical foundations of machine learning
Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
Familiarity with containerized environments (e.g., Docker, Kubernetes).
Strong communication and ability to work independently

Ways to stand out from the crowd:

Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
Familiarity with cloud ML platforms like AWS, GCP, Azure ML
Contributions to open-source ML benchmarking tools

Benefits & Perks:

Competitive compensation
Career growth and learning opportunities
Flexibility and ownership
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams

What's it like to work at Nebius:

Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI

Equal Opportunity Statement:

Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law.

Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.

If you need accommodations during the application process, please let us know.

Vacancy posted 18 hours ago
