ML Infrastructure Engineer
Nebius
About Nebius: Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI. Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
The role: We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development. Your responsibilities will include:
What's it like to work at Nebius: Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI
Equal Opportunity Statement: Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law. Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.
If you need accommodations during the application process, please let us know.
- Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
- Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
- Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
- Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
- Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
- Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
- Contribute to internal tooling, frameworks, and best practices.
Qualifications:
- A profound understanding of the theoretical foundations of machine learning
- Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
- Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
- Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
- Familiarity with containerized environments (e.g., Docker, Kubernetes).
- Strong communication and ability to work independently
- Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
- Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
- Familiarity with cloud ML platforms like AWS, GCP, Azure ML
- Contributions to open-source ML benchmarking tools
What we offer:
- Competitive compensation
- Career growth and learning opportunities
- Flexibility and ownership
- Collaborative and innovative culture
- Opportunity to work on impactful AI projects
- International environment and talented teams
Vacancy posted 18 hours ago