Network Engineer - ML Infrastructure (High-Speed Interconnects)

$180k

xAI

Job Description

About xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

xAI is building at a furious pace with the latest compute and switching hardware to help people understand the universe. We are looking for exceptional ML Infrastructure Engineers with deep expertise in high-speed interconnect technologies to design, build, and optimize the network fabric that powers large-scale AI training and inference clusters. This strategic role will drive innovation in high-bandwidth, low-latency, power-efficient interconnects critical for AI/ML clusters based on advanced computing platforms.

You will have the opportunity to work on all modalities of interconnects connecting GPUs and switches both inside and between data centers, including our primary front and backend networks that train Grok and that customers use for inference. Engineers will own all aspects from design and development to build and operations. You will be expected to define and improve team processes and to contribute to scaling and maintenance efforts.

You will focus on the physical layer and system-level integration of copper (ACC, AEC, CPC) and optical (FRO, LRO/TRO, LPO, AOC, CPO) interconnects that directly determine the performance, power efficiency, scale, and cost of next-generation AI/ML clusters. This is a highly technical, hands-on role bridging ML cluster requirements with cutting-edge interconnect hardware — ideal for engineers who love both large-scale AI systems and the physics/engineering of 200G+ SerDes, PAM4, photonics, signal integrity and diagnostics.

RESPONSIBILITIES:
  • Design, validate, and productize high-speed copper and optical connectivity solutions for AI clusters (100k+ GPU scale).
  • Own vendor due diligence and onboarding for new 1.6T products, including AECs and pluggable optical transceivers (DR4/8, FR4), with rigorous bring-up and characterization.
  • Investigate the opportunity for LPO and LRO in our network.
  • Evaluate early co-packaged and near-packaged engines for switches and GPUs.
  • Pathfind new interconnect modalities, including VCSEL, microLED, and THz radio-based solutions, to improve network economics and reliability.
  • Work closely with vendors (transceiver, cable, SerDes, DSP, silicon photonics foundries) to influence roadmaps and ensure timely delivery of next-gen solutions.
  • Collaborate with ML training teams to translate workload communication patterns into concrete interconnect topology and optical reconfigurability requirements.
  • Perform system-level simulation of end-to-end fabric performance.
  • Drive failure analysis, root cause, and corrective actions for interconnect-related issues in production clusters through fleet-level metrics gathering and analysis.
  • Contribute to internal tooling and automation for interconnect health monitoring, telemetry, diagnostics, remediation and automated qualification pipelines.
  • Stay current with industry standards (OIF CMIS, IEEE) and emerging technologies (multi-core/hollow-core fiber, 448G SerDes, TFLN, ring resonators).

BASIC QUALIFICATIONS:
  • At least 8 years of hands-on experience in designing, deploying, and operating high-speed copper and optical interconnects, preferably in a module design role or in a hyperscale datacenter environment.
  • Master's or PhD degree in Electrical Engineering, Photonics or Physics.
  • Deep knowledge of PAM4 SerDes performance, equalization, jitter, crosstalk.
  • Solid operational understanding of FEC, retimers, TIAs, and drivers.
  • Deep knowledge of optical link budget analysis and performance metrics including TDECQ, OMA, Tcode, stressed receiver sensitivity and associated diagnostics.
  • Expertise in transceiver components including CW lasers, SiPh PICs, EML, DSP, passive subassemblies, their failure modes and characterization.
  • Knowledge of thermal, mechanical, power, signal integrity constraints in dense hardware.
  • Knowledge of SiPh design process, yield improvement and reliability testing.
  • Familiarity with CPO technologies and challenges/risk areas.
  • Familiarity with subcomponent supply chains and global manufacturers, ODMs and CMs.
  • Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting.

COMPENSATION AND BENEFITS:

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at X, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Vacancy posted more than 2 months ago