GPU Infra Engineer

United IT

GPU Infra Engineer (GPU Bare Metal)

Location: Remote

Required Skills
  • Proven ability to orchestrate bare metal Linux systems at scale, including building automation for firmware updates, managing BIOS configuration, and configuring PXE environments.
  • Deep Linux systems experience, including troubleshooting network interfaces, developing and applying configuration management, security best practices, and monitoring and alerting.
  • Strong automation mindset.
  • Expert knowledge of one or more orchestration tools such as MAAS, Salt, Chef, Ansible, or Puppet.
  • Strong communication skills: the job involves writing detailed documentation for others to pick up or leading knowledge-sharing sessions with operations teams.
Bonus Skills
  • Hands-on experience with High Performance Computing (HPC) clustered environments from NVIDIA or AMD.
  • Experience performing automated, wide-scale testing on NCCL or other frameworks.
  • Network engineering experience with VyOS platforms.
What You'll Be Working On
  • Provisioning and automating GPU bare metal deployments
  • DevOps: assisting customer support and Cloud Ops teams with GPU-specific knowledge and debugging during customer escalations
  • Performance testing, analysis and monitoring
  • Firmware, BIOS, and kernel upgrades and testing
Vacancy posted 1 day ago
