AI Infrastructure Engineer

42dot

At 42dot, our AI Infrastructure Engineer manages the high-performance AI infrastructure orchestrating thousands of GPUs across multiple data centers. You will contribute to the scaling, monitoring, and operational optimization required to maintain a robust and world-class computing environment.

Responsibilities
  • Operate and maintain a large-scale GPU cluster consisting of thousands of GPUs across multiple data centers using Kubernetes and Slurm.
  • Monitor and diagnose failures across the GPU hardware and software stacks to ensure high availability and rapid recovery.
  • Develop automation tools and scripts using Python or Shell to streamline repetitive infrastructure management tasks and improve operational efficiency.
  • Manage GPU resource quotas and provide technical support to ML researchers to ensure optimal utilization of computing resources.
  • Participate in the architectural design and performance tuning of distributed training environments for large-scale autonomous driving models.
Qualifications
  • Strong proficiency in Linux operating systems, including a solid understanding of kernel operations, process management, and system security.
  • Practical experience with containerization technologies (Docker) and orchestration (Kubernetes), including building, managing, and troubleshooting containerized environments.
  • Solid understanding of networking fundamentals, including TCP/IP, and the ability to perform basic network troubleshooting.
  • Ability to write clean and maintainable scripts in Python or Shell for automation and system administration.
  • Logical approach to problem-solving with the persistence to identify and resolve root causes in complex, large-scale systems.
  • Strong communication skills to effectively collaborate with cross-functional teams and external partners.
Interview Process
  • Resume Screening - Coding Test - Virtual Interview (approximately 1 hour) - Onsite or Virtual Interview (approximately 3 hours) - Final Offer
  • Please note that the interview process may vary depending on the position and is subject to change based on scheduling and other circumstances.
  • Interview schedules and results will be communicated individually via the email address provided in your application.
Additional Information
  • Please upload all required documents in PDF format.
  • Veterans and applicants eligible for employment protection will receive preferential consideration in accordance with applicable laws and regulations.
  • In compliance with the Act on Employment Promotion and Vocational Rehabilitation for Persons with Disabilities, registered individuals with disabilities will receive preferential consideration.
  • 42dot does not accept unsolicited resumes from search firms. We will not pay any fees for resumes submitted without prior agreement.
  • A 3-month probationary period may apply.
Vacancy posted 1 day ago