ML Infrastructure Engineer

Nebius

About Nebius:

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:
  • Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
  • Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
  • Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
  • Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
  • Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
  • Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
  • Contribute to internal tooling, frameworks, and best practices.
We expect you to have:
  • A profound understanding of theoretical foundations of machine learning
  • Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)
  • Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
  • Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries
  • Familiarity with containerized environments (e.g., Docker, Kubernetes).
  • Strong communication skills and the ability to work independently
Ways to stand out from the crowd:
  • Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
  • Experience in Python and performance profiling tools (e.g., Nsight, nvprof, perf).
  • Familiarity with cloud ML platforms like AWS, GCP, Azure ML
  • Contributions to open-source ML benchmarking tools
Benefits & Perks:
  • Competitive compensation
  • Career growth and learning opportunities
  • Flexibility and work-life balance
  • Collaborative and innovative culture
  • Opportunity to work on impactful AI projects
  • International environment and talented teams

What's it like to work at Nebius:

Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI


Equal Opportunity Statement:

Nebius is an equal opportunity employer. We are committed to fostering an inclusive and diverse workplace and to providing equal employment opportunities in all aspects of employment. We do not discriminate on the basis of race, color, religion, sex (including pregnancy), national origin, ancestry, age, disability, genetic information, marital status, veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable law.

Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.


If you need accommodations during the application process, please let us know.
Vacancy posted 2 days ago