ML Infrastructure Engineer
Nebius
About Nebius: Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI. Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
The role
We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.
Your responsibilities will include:
- Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
- Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
- Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
- Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
- Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
- Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
- Contribute to internal tooling, frameworks, and best practices.
- A profound understanding of the theoretical foundations of machine learning.
- Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.).
- Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM).
- Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries.
- Familiarity with containerized environments (e.g., Docker, Kubernetes).
- Strong communication skills and the ability to work independently.
- Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT).
- Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
- Familiarity with cloud ML platforms such as AWS, GCP, and Azure ML.
- Contributions to open-source ML benchmarking tools.
- Competitive compensation
- Career growth and learning opportunities
- Flexibility and work-life balance
- Collaborative and innovative culture
- Opportunity to work on impactful AI projects
- International environment and talented teams
What's it like to work at Nebius: Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI
Equal Opportunity Statement: Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law. Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.
If you need accommodations during the application process, please let us know.
Vacancy posted 2 days ago


