
Network Engineer - AI/HPC

$180k

xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills and to share knowledge with their teammates concisely and accurately.

About the Role

xAI was first in the world to build a 100k-GPU cluster on an Ethernet network and then did it again in 92 days, floors, walls and all. We need an engineer with deep experience in RoCEv2 who can develop at hyperscale while optimizing performance and availability.

xAI is building at a furious pace with the latest hardware to help people understand the universe. To make the next significant leap forward, we need to own our destiny by understanding our current network performance and availability, then optimizing it for how we train models and execute customer inference queries.

You will spend most of your days deep inside NCCL, building metric dashboards and tweaking configurations to ensure no performance is left on the table. You will help design the next iteration of our back-end and front-end networks, which will allow us to build out new GPU infrastructure seamlessly with little to no engineering assistance.

Expect a significant amount of travel to Memphis to build more capacity, as well as participation in a team on-call rotation and help with other scaling and maintenance efforts. This will become easier as we build out the team and engineers contribute to deployment and operations frameworks that remove repetitive tasks.

Required Qualifications

  • A minimum of 10 years designing and operating large-scale networks, with 5 years in the Ethernet AI/HPC space.
  • Deep understanding of congestion control on Ethernet; InfiniBand experience is an added bonus.
  • Deep understanding of AI training and inference workloads and how they operate on the network. As part of this, you are able to use and debug NCCL and potentially commit to the library.
  • Expertise in creating a portfolio of performance and operations metrics to optimize the fleet for training and inference traffic.
  • Experience with Python to automate away repetitive tasks and to work with and analyze large sets of data.

Annual Salary Range

$180,000 - $440,000

Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Network Engineer - AI/HPC in Palo Alto, CA vacancy
  •  ...A leading technology firm in California is seeking network engineers with hands-on experience in InfiniBand and Ethernet for managing high-performance computing (HPC) and artificial intelligence (AI) environments. Candidates should have advanced knowledge of networking... 
    Suggested

    TechDigital Group

    Santa Clara, CA
    3 days ago
  • $152k - $248k

     ...Job Description Position: Staff Network Engineer – Data Center & Core Network Engineering Location...  ...Kubernetes, host networking, TCP tuning, and HPC networking. Experience with congestion...  ...with IXIA or similar. Experience building AI/ML network infrastructure (Infiniband,... 
    Suggested
    Work at office

    LinkedIn

    Mountain View, CA
    1 day ago
  • $152k - $248k

     ...world's largest professional network, built to create economic opportunity...  ...Data Center & Core Network Engineering team is responsible for...  ...networking, host TCP tuning, and HPC networking. ~ Experience in...  ...platforms. Experience in building AI/ML network infrastructure, e.g... 
    Suggested
    For contractors
    Work at office
    Flexible hours

    LinkedIn

    Mountain View, CA
    16 hours ago
  • $90k - $110k

     ...Job Description CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform...  ...are seeking a dedicated and detail-oriented Operations Engineer to join our HPC Networking Team. HPC Networking at CoreWeave is tasked with... 
    Suggested
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    1 day ago
  •  ...Architect to join their Infrastructure Specialist Team. The ideal candidate will work on large-scale Networking projects and support Research & Development activities for AI and HPC systems. With at least eight years of experience in networking and system design, candidates... 
    Suggested

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • black.ai is looking for a skilled platform engineer in Palo Alto to enhance our AWS infrastructure and support quantum simulations. This role requires strong experience in platform engineering, DevOps practices, and GPU workloads. As a platform engineer, you will improve... 

    black.ai

    Palo Alto, CA
    2 days ago
  • $60k - $80k

     ...The demand for skilled Network Engineers continues to grow as organizations expand their digital footprints, adopt cloud services, and require...  ...networking services. The rise of intent‑based networking and AI‑driven network operations is beginning to augment network management... 
    Local area
    Remote work

    SecondTalent

    Palo Alto, CA
    3 days ago
  •  ...Job Description Network engineers with InfiniBand and Ethernet experience for configuring and managing the high-performance computing (HPC) / artificial intelligence (AI) datacenter environment. Must have: Hands-on experience with InfiniBand and Ethernet, including... 

    TechDigital Group

    Santa Clara, CA
    3 days ago
  •  ...Company Description NetAI Inc. is transforming network operations for enterprises, service...  ...multi-vendor networks, avoiding black-box AI and guesswork. NetAI offers flexible...  ...Description This is a full-time, on-site Network Engineer role based in Ahmedabad India. Ahmedabad... 
    Full time
    Flexible hours

    NetAI Inc.

    Palo Alto, CA
    3 days ago
  •  ...fast finality. Social games and community AI can use our onchain tokens for micro-...  ...Harmony is a community-driven project, a network with hundreds of applications, and a team...  ...Because the invincible summer awaits! For engineers, we value your deep understanding of how... 
    Full time
    Work experience placement
    Summer work
    Work at office

    Harmony

    Palo Alto, CA
    4 days ago
  • $180k

     ...About xAI xAI’s mission is to create AI systems that can accurately understand the universe...  ...small, highly motivated, and focused on engineering excellence. This organization is for...  ...technologies to design, build, and optimize the network fabric that powers large‑scale AI... 
    Temporary work
    Work at office

    Pantera Capital

    Palo Alto, CA
    3 days ago
  • $158.9k - $238.3k

     ...Rubrik and are the first customers of the Engineering teams at Rubrik. Rubrik Corp IT is...  ...an experienced and hands-on Senior Cloud Network Engineer to design, implement, and operate...  ...in Securing and Accelerating the World's AI Transformation Rubrik (RBRK), the Security... 
    Local area

    Rubrik

    Palo Alto, CA
    5 days ago
  •  ...About The Role We are looking for a Network Architect to join our Cluster Engineering Team and help shape the front‑end...  ...current and next generations of our AI clusters. You will partner closely...  ...‑end network fabrics for AI/ML and HPC clusters, optimizing for high resource... 

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    4 days ago
  •  ...Job role : Network Engineer Duration : 6+ months, can extend Location : Palo Alto, CA onsite Required Qualifications:...  ...Cisco ISE, Aruba ClearPass, or equivalent). Comfort using AI tools (e.g., Claude, Copilot) for network operations, documentation... 

    VDart

    Palo Alto, CA
    3 days ago
  •  ...Role: Network Engineer, Corporate & Datacenter Location- Palo Alto, CA (Onsite) Contract About the Role Client is looking...  ...Cloud Engineering teams, and are encouraged to leverage AI tools to accelerate troubleshooting, documentation, and... 
    Contract work
    Remote work

    VDart

    Palo Alto, CA
    4 days ago
  • $180k

    xAI’s mission is to create AI systems that can accurately understand the universe and...  ...small, highly motivated, and focused on engineering excellence. This organization is for individuals...  ...their teammates. About the Role Our Network Engineering team handles a dynamic,... 
    Permanent employment
    Temporary work
    Work at office
    Remote work

    xAI

    Palo Alto, CA
    3 days ago
  •  ...Advanced Micro Devices is seeking a Director of Cloud, HPC & Sovereign AI Customer Engineering to lead strategic customer deployments and life cycle support for AMD’s compute and AI solutions. The role requires a strong technical leader with extensive experience in cloud... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 hours ago
  • $109.2k - $223.4k

     ...Principal Network Engineer We are the AI Infrastructure - Network Operations team at OCI. We support and operate the RDMA/RoCE network fabrics for OCI's largest AI and HPC customers. These fabrics are the foundation underneath OCI's AI, GPU and HPC services, and support... 
    Temporary work
    Flexible hours

    Oracle

    Santa Clara, CA
    3 days ago
  • $195k - $220k

     ...and implementing secure, high performance network topologies for Internet-facing SaaS...  ...performer who is effective in mentoring other engineers and in communicating with other technical...  ...a data science technology company whose AI platform uncovers signals of risk across... 
    Full time
    Work at office
    Work from home
    Flexible hours

    Quantifind

    Palo Alto, CA
    3 days ago
  • $225k - $275k

     ...Senior Staff Network Deployment Engineer Crusoe Cloud is seeking a Senior Staff Network Deployment Engineer to serve as the technical...  ...rapidly scale our footprint of high-performance compute (HPC) and GPU-based AI infrastructure, you will define the deployment strategy... 
    Temporary work
    Remote work

    Crusoe

    Sunnyvale, CA
    16 hours ago
  • $202.5k - $274k

     .... Review and implement changes on the network, following Change management process (ITIL...  .../documentation to help level1/level2 engineers to perform their job efficiently Must...  ...architecture and Cisco DNA Familiarity with AI/ML concepts for network operations,... 
    Local area
    Shift work

    Intuit

    Mountain View, CA
    4 hours ago
  • $193k - $234k

     ...Staff Network Deployment Engineer Crusoe Cloud is seeking a high-energy, detail-oriented Staff Network Deployment Engineer to lead the...  ...rapidly expand our footprint of high-performance compute (HPC) and GPU-based AI infrastructure, you will be the primary driver behind... 
    Temporary work
    Remote work

    Crusoe

    Sunnyvale, CA
    1 day ago
  •  ...ATX Venture Partners is seeking a Staff Network Engineer to improve network operations and introduce best practices. The ideal candidate will have experience with Cisco architectures and AI tools for automation. This role involves troubleshooting and configuring network... 

    ATX Venture Partners

    Mountain View, CA
    3 days ago
  •  ...NVIDIA Gruppe is seeking a Senior Firmware Engineer to join their Firmware team. In this role, you will develop advanced networking features for AI, cloud, and HPC. You will work closely with various design teams and be involved in everything from implementation to verification... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  •  ...leading technology company in Palo Alto is seeking an experienced AI/ML Engineer with expertise in GenAI development patterns and Azure services. The role involves building and deploying tools for network troubleshooting and integrating various systems to enhance... 

    Quantum Technologies. LLC

    Palo Alto, CA
    4 days ago
  •  ...building hardware; electronics systems and semiconductors where AI can design and create beyond human cognitive limits. About...  ...About this Role You'll design and deploy high-performance network architectures connecting Voltai's compute infrastructure and... 

    Voltai, Inc

    Palo Alto, CA
    1 day ago
  •  ...industry player is seeking a Senior Director of Solutions Engineering to lead innovative teams in AI and high-performance computing solutions. This hybrid...  ...The ideal candidate will have extensive experience in HPC and AI systems design, with a proven track record in managing... 

    Skilltorch

    Santa Clara, CA
    3 days ago
  • $50 - $70 per hour

     ...custom software and mobile app development, cloud computing, AI/ML integration, cybersecurity, blockchain, e-commerce platforms...  ...0/Hr Job Description: Client is hiring a Senior Cloud Networking Engineer to design, implement, and optimize networking solutions... 

    ApTask

    Mountain View, CA
    1 day ago
  • $82 - $85 per hour

     ...Network DevOps Contractor Payrate: $82.00 - $85.00/hr. Summary: We are looking for...  ...environment stays agile and secure. Network Engineering team is responsible for architecting,...  .... Consent to Communication and Use of AI Technology: By submitting your application... 
    Hourly pay
    Full time
    For contractors
    Local area
    Flexible hours

    Aditi Consulting

    Palo Alto, CA
    1 day ago
  • $124k - $186k

     ...company provides end-to-end advertising and AI solutions for businesses to reach,...  ...our awards HERE. We're looking for a Network Engineer with a strong focus on cloud networking...  ...work closely with DevOps, Security, and Engineering teams to ensure seamless connectivity and... 

    AppLovin

    Palo Alto, CA
    3 days ago
