Network Engineer - ML Infrastructure (High-Speed Interconnects)
xAI
Job Description
About xAI
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE:
xAI is building at a furious pace with the latest compute and switching hardware to help people understand the universe. We are looking for exceptional ML Infrastructure Engineers with deep expertise in high-speed interconnect technologies to design, build, and optimize the network fabric that powers large-scale AI training and inference clusters. This strategic role will drive innovation in high-bandwidth, low-latency, power-efficient interconnects critical for AI/ML clusters based on advanced computing platforms.
You will have the opportunity to work on all modalities of interconnects connecting GPUs and switches both inside and between data centers, including our primary front and backend networks that train Grok and that customers use for inference. Engineers will own all aspects from design and development to build and operations. You will be expected to define and improve team processes and to contribute to scaling and maintenance efforts.
You will focus on the physical layer and system-level integration of copper (ACC, AEC, CPC) and optical (FRO, LRO/TRO, LPO, AOC, CPO) interconnects that directly determine the performance, power efficiency, scale, and cost of next-generation AI/ML clusters. This is a highly technical, hands-on role bridging ML cluster requirements with cutting-edge interconnect hardware — ideal for engineers who love both large-scale AI systems and the physics/engineering of 200G+ SerDes, PAM4, photonics, signal integrity and diagnostics.
RESPONSIBILITIES:
- Design, validate, and productize high-speed copper and optical connectivity solutions for AI clusters (100k+ GPU scale).
- Own vendor due diligence and onboarding for new 1.6T products, including AECs and pluggable optical transceivers (DR4/8, FR4), with rigorous bring-up and characterization.
- Investigate the opportunity for LPO and LRO in our network.
- Evaluate early co-packaged and near-packaged engines for switches and GPUs.
- Lead pathfinding for new interconnect modalities, including VCSEL, microLED, and THz radio-based solutions, to improve network economics and reliability.
- Work closely with vendors (transceiver, cable, SerDes, DSP, silicon photonics foundries) to influence roadmaps and ensure timely delivery of next-gen solutions.
- Collaborate with ML training teams to translate workload communication patterns into concrete interconnect topology and optical reconfigurability requirements.
- Perform system-level simulation of end-to-end fabric performance.
- Drive failure analysis, root cause, and corrective actions for interconnect-related issues in production clusters through fleet-level metrics gathering and analysis.
- Contribute to internal tooling and automation for interconnect health monitoring, telemetry, diagnostics, remediation and automated qualification pipelines.
- Stay current with industry standards (OIF CMIS, IEEE) and emerging technologies (multi-core/hollow-core fiber, 448G SerDes, TFLN, ring resonators).
QUALIFICATIONS:
- 8+ years of hands-on experience designing, deploying, and operating high-speed copper and optical interconnects, preferably in a module design role or in a hyperscale datacenter environment.
- Master's or PhD degree in Electrical Engineering, Photonics or Physics.
- Deep knowledge of PAM4 SerDes performance, equalization, jitter, and crosstalk.
- Solid operational understanding of FEC, retimers, TIAs, and drivers.
- Deep knowledge of optical link budget analysis and performance metrics including TDECQ, OMA, Tcode, stressed receiver sensitivity and associated diagnostics.
- Expertise in transceiver components including CW lasers, SiPh PICs, EML, DSP, passive subassemblies, their failure modes and characterization.
- Knowledge of thermal, mechanical, power, and signal integrity constraints in dense hardware.
- Knowledge of SiPh design process, yield improvement and reliability testing.
- Familiarity with CPO technologies and challenges/risk areas.
- Familiarity with subcomponent supply chains and global manufacturers, ODMs and CMs.
- Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting.
$180,000 - $440,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.


