ML Infrastructure Engineer

Nebius

About Nebius:

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:

Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
Contribute to internal tooling, frameworks, and best practices.

We expect you to have:

A profound understanding of theoretical foundations of machine learning
Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
Familiarity with containerized environments (e.g., Docker, Kubernetes).
Strong communication and ability to work independently

Ways to stand out from the crowd:

Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
Familiarity with cloud ML platforms like AWS, GCP, Azure ML
Contributions to open-source ML benchmarking tools

Benefits & Perks:

Competitive compensation
Career growth and learning opportunities
Flexibility and work-life balance
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams

What's it like to work at Nebius:

Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI

Equal Opportunity Statement:

Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law.

Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.

If you need accommodations during the application process, please let us know.

Vacancy posted 2 days ago
