Principal Back-End Network Engineer - AI Infrastructure Operations
$150k - $215k
Nscale
Location: US

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

About The Role

The Network Operations and Engineering teams at Nscale operate some of the most demanding networking environments in the industry, supporting tightly coupled GPU clusters where network performance directly impacts customer outcomes.

We’re looking for a Principal Network Engineer – AI Infrastructure to provide technical leadership across Nscale’s high-speed networking domain. This role is focused on owning the reliability, scalability, and long-term evolution of our InfiniBand and RDMA-based network fabrics. You will operate as a technical authority, influencing architecture, standards, and operational practices across teams while tackling the most complex network challenges in the platform.

What You'll Be Doing

- Owning the technical direction and operational strategy for Nscale’s AI interconnect networks
- Designing, reviewing, and evolving large-scale InfiniBand and RoCE fabric architectures to support future growth and workload demands
- Acting as the senior escalation point for the most complex network incidents, guiding deep technical investigations and systemic fixes
- Driving cross-team initiatives to improve fabric reliability, performance predictability, and operational maturity
- Defining standards for hardware configuration, congestion control, routing, firmware lifecycle management, and change safety
- Partnering with SRE, Compute Platform, and Network Architecture teams to influence end-to-end system design
- Mentoring senior and mid-level network engineers, raising the bar for operational rigor and technical excellence
- Driving measurable improvements in uptime, latency consistency, capacity efficiency, and incident reduction

About You (Skills / Qualifications)

- 10+ years of experience in network engineering, with deep focus on HPC, AI, or hyperscale data centre networking
- Expert-level operational and architectural experience with InfiniBand and/or large-scale RoCE fabrics
- Deep understanding of RDMA internals, congestion management, and fabric-level failure modes
- Strong expertise in modern data centre routing and control planes (BGP, OSPF, ECMP)
- Proven ability to debug and resolve cross-layer issues spanning hardware, firmware, kernel, and application communication libraries
- Demonstrated ability to lead complex technical initiatives across teams without direct authority
- A systems-level mindset, balancing performance, reliability, scalability, and operational cost

Nice to Have

- Extensive experience with NVIDIA/Mellanox networking platforms in production AI or HPC environments
- Deep familiarity with distributed training frameworks and GPU communication patterns
- Experience designing network observability systems for high-cardinality, high-throughput environments
- Prior experience influencing platform or infrastructure strategy at scale

What We Can Offer You

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
- Expect a dynamic progression plan tailored to your ambitions.
- Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there’s anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.

The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.

Salary Range
$150,000 - $215,000 USD

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.


