AI Cluster Architect

$165k - $185k

Vultr

Who We Are

Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. With 33 global cloud data center locations, Vultr is trusted by hundreds of thousands of active customers across 185 countries for its flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. In December 2024, Vultr announced an equity financing at a $3.5 billion valuation. Founded by David Aninowsky and self-funded for over a decade, Vultr has grown to become the world's largest privately held cloud infrastructure company.
Vultr Cares
  • Excellent Medical Benefits w/ 100% company-paid premiums for employee only plan + 100% company-paid dental & vision premiums
  • 401(k) plan that matches 100% up to 4% with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan + take your birthday off
  • Commitment matters to Vultr! Increased PTO at 3 year & 10 year anniversary + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
  • $500 first year remote office setup + $400 each following year for new equipment
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company-paid Wellable subscription
Join Vultr

Vultr is looking for an AI Cluster Architect who will be responsible for creating and refining large-scale GPU cluster architectures within strict power and infrastructure limits. This role focuses heavily on power-aware design: starting from a fixed power envelope, the architect determines the optimal number of GPUs while accounting for the full stack of services that must be deployed, including compute nodes, storage systems, networking fabric, cooling, and facility constraints. This role requires deep experience navigating heterogeneous environments, multiple generations of hardware, and end-user requirements.

The architect must understand how different GPU SKUs, NICs, switches, and fabrics interact at scale, including their individual and aggregate power and thermal characteristics. They will evaluate multi-plane, rail-optimized, and tiered fabric designs across technologies like InfiniBand, RoCE, and Spectrum-X to ensure the networking architecture supports the intended GPU count without overrunning facility limits or switch radix and topology constraints. This role balances customer-specific requirements for compute, storage, and service density, ensuring that the final cluster design maintains acceptable levels of GPU and fabric performance while maximizing the number of usable GPUs within the total power budget.

Key Responsibilities
  • Architect large-scale GPU clusters within fixed site power budgets, optimizing for maximum GPU density while reserving necessary headroom for compute services, storage, and networking.
  • Model and validate power consumption across the full cluster bill of materials (GPUs, CPUs, NICs, switches, fabric components, storage, and facility limits).
  • Evaluate tradeoffs across multiple fabric networking architectures (InfiniBand, RoCE, Spectrum-X) as well as multi-plane, 2-tier/3-tier, and rail-optimized topologies.
  • Determine network scale limits based on switch radix, link speed, topology, and blocking requirements.
  • Gather, interpret, and maintain detailed SKU-level power and thermal specifications for GPUs, NICs, switches, DPUs, storage, and server platforms.
  • Develop power-aware cluster configuration templates and capacity-planning models that can scale across sites with varying constraints and allow for quick iteration and ideation.
  • Document architecture, design choices, tradeoff analyses, and operational considerations for deployment and lifecycle management.
  • Provide guidance on future-proofing, including the ability to incorporate next-gen GPUs, NICs, or fabrics.
  • Collaborate with vendors on novel fabric architectures that enable large-scale cluster deployments (100k+ GPUs).
Qualifications
  • 7+ years designing or building large-scale HPC, AI, or hyperscale GPU clusters.
  • Expert understanding of GPU and accelerator system design, including node topology, PCIe/NVLink/NVSwitch/ROCm, and NIC-to-GPU affinity considerations.
  • Strong familiarity with InfiniBand, RoCE, and Spectrum-X networking, including multi-tier, multi-plane, Clos/dragonfly variants, and large-radix switch design.
  • Demonstrated experience modeling power draw and thermal characteristics of servers, GPUs, NICs, switches, optics, and storage systems.
  • Ability to design networks that maintain full non-blocking performance or intentionally introduce over/under-subscription while understanding impacts on workload performance.
  • Proven ability to gather and analyze vendor SKU-level specifications and incorporate them into scalable cluster architectures.
  • Experience balancing customer-driven requirements for compute, storage, and service density in combination with overall GPU count.
  • Strong documentation, communication, and cross-functional collaboration skills.

Compensation

$165,000 - $185,000

This salary can vary based on location, years of experience, background and skill set.

Inclusion & Privacy

We are an equal opportunity employer and are committed to creating an inclusive environment for all employees. We welcome applications from individuals of all backgrounds and experiences, and we prohibit discrimination based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected status under applicable laws. Vultr will consider qualified applicants with arrest or conviction records in accordance with applicable laws and will not conduct a background check until after an offer of employment has been extended and accepted.

We also take your privacy seriously. We handle personal information responsibly and follow applicable laws, including U.S. privacy rules and India's Digital Personal Data Protection Act, 2023. Your data is used only for legitimate business purposes and is protected with proper security measures.

Where allowed by law, applicants may request details about the data we collect, access or delete their information, withdraw consent for its use, and opt out of nonessential communications. For more details, please see our Privacy Policy.
Vacancy posted 4 days ago