Principal Solutions Architect
ePlus
San Ramon, CA
We are seeking an elite Solutions Architect to lead the end-to-end design, sizing, and deployment of NVIDIA AI Factory-aligned infrastructure. In this highly technical, customer-facing role, you will translate complex AI and machine learning workload requirements into fully engineered infrastructure solutions spanning colocation facilities, GPU compute, high-performance networking, parallel storage, and the complete NVIDIA AI software stack.
You will serve as a trusted technical advisor to enterprise and hyperscale customers, partnering with sales, product, and engineering teams to win and deliver transformational AI infrastructure programs. Your expertise will directly shape how organizations build and operate production AI Factories capable of training frontier models, running large-scale inference fleets, and accelerating data science pipelines at scale.
Solution Design & Architecture
- Lead discovery workshops to capture AI/ML workload requirements, including model training scale, inference SLAs, data pipeline throughput, and multi-tenancy needs.
- Architect full-stack AI Factory solutions aligned to NVIDIA reference architectures, integrating colocation, GPU compute, networking, storage, and software layers.
- Develop detailed Bills of Materials (BOMs), rack elevation diagrams, network topology drawings, and power/cooling budgets for customer proposals.
- Define GPU cluster architectures using NVIDIA DGX, HGX, and MGX systems with B200, B300, and GB300 Blackwell SXM and NVLink-Switch configurations.
- Design RTX PRO 6000 Blackwell Server Edition deployments for inference-optimized and enterprise AI workloads.
- Conduct workload sizing and TCO/ROI modeling to validate infrastructure dimensioning for training, fine-tuning, and inference at scale.
Colocation & Facility Planning
- Specify colocation requirements including critical power load (MW-scale), UPS and generator configurations, and PUE targets.
- Design high-density GPU deployments utilizing air-cooled, direct liquid cooling (DLC), and rear-door heat exchanger configurations.
- Define meet-me room (MMR) and cross-connect requirements; specify carrier-neutral telecom diversity strategies.
- Engage colocation providers and data center operators to validate capacity availability and negotiate technical SLAs.
- Coordinate with facilities and MEP engineers to validate power infrastructure from utility feed through PDU to rack level.
GPU Compute Infrastructure
- Architect multi-node GPU clusters optimized for large language model (LLM) pre-training, fine-tuning, and reinforcement learning from human feedback (RLHF).
- Size and configure DGX SuperPOD, HGX H/B-series, and MGX modular systems based on model parameter count, dataset size, and iteration timelines.
- Define server firmware, BIOS, BMC, and DGXOS baselines for production GPU infrastructure.
- Establish GPU health monitoring, RAS (Reliability, Availability, Serviceability) policies, and lifecycle management procedures.
High-Performance Networking
- Design backend GPU fabric networks using NVIDIA Quantum InfiniBand (NDR 400Gb/s and HDR 200Gb/s) for distributed training traffic.
- Architect Spectrum-X Ethernet-based AI networking solutions for inference clusters requiring high-bandwidth, low-latency connectivity.
- Specify ConnectX-8/7 HCA deployments and configure RDMA over Converged Ethernet (RoCEv2) or InfiniBand transport for NCCL collective operations.
- Integrate BlueField-3 DPUs for GPU-accelerated network functions, storage offload, zero-trust security isolation, and bare-metal provisioning.
- Design leaf-spine and fat-tree topologies for non-blocking bisection bandwidth in GPU training clusters.
- Define Quality of Service (QoS) policies separating storage, compute fabric, and management plane traffic.
Parallel Storage Architecture
- Design high-performance parallel file system solutions using VAST Data, Hammerspace, and Pure Storage FlashBlade//E for AI training and checkpoint storage.
- Size storage capacity, IOPS, and throughput based on dataset characteristics, checkpoint frequency, and concurrent reader/writer counts.
- Architect multi-tier storage hierarchies: hot NVMe flash (VAST/FlashBlade) for active datasets, warm object storage for model archives, and cold tape/cloud for long-term retention.
- Configure VAST Data Universal Storage for disaggregated storage with NFS, S3, and POSIX access; tune for large sequential read performance.
- Deploy Hammerspace Global Data Environment for distributed data management and NFS-over-RDMA acceleration across geographically dispersed GPU clusters.
- Define data pipeline architectures ingesting from cloud object stores (S3, GCS, ABS) to local flash for GPU-local data loading without I/O bottlenecks.
AI Software Stack & Orchestration
- Deploy and configure the NVIDIA AI Enterprise (NVAIE) software stack, including NVIDIA GPU Operator, NIM microservices, and RAPIDS accelerated data science libraries.
- Architect inference serving infrastructure using NVIDIA NIM (NVIDIA Inference Microservices) for optimized LLM and vision model deployment with autoscaling.
- Implement NVIDIA Dynamo for distributed inference and disaggregated serving of large-scale generative AI models.
- Configure and optimize CUDA toolkit, cuDNN, NCCL communication libraries, and custom kernel environments for training workloads.
- Deploy Base Command Manager and DGXOS for cluster lifecycle management, node provisioning, health dashboards, and job scheduling integration.
- Integrate NVIDIA Mission Control for AI Factory operations, observability, and multi-cluster fleet management.
- Design and deploy Kubernetes-based AI platforms using NVIDIA GPU Operator, integrating with Run:ai for dynamic GPU resource scheduling and multi-tenant workload isolation.
- Configure SLURM workload manager for traditional HPC-style job scheduling on bare-metal GPU clusters, including preemption policies, fair-share scheduling, and burst-to-cloud integration.
- Establish MLOps toolchain integrations with popular frameworks (PyTorch, JAX, TensorFlow) and experiment tracking platforms (MLflow, Weights & Biases).
Customer Engagement & Delivery
- Serve as primary technical point of contact throughout the pre-sales and delivery lifecycle, from initial discovery through post-deployment optimization.
- Produce and present architecture design documents, technical proposals, and executive-level briefings to CTO/CIO and VP-level stakeholders.
- Lead proof-of-concept (POC) and pilot deployments, including benchmark design, execution, and results analysis.
- Collaborate with procurement, logistics, and deployment teams to ensure on-time delivery of complex infrastructure programs.
- Provide post-deployment hypercare support, performance tuning, and capacity planning advisory services.
- Contribute to internal knowledge bases, solution playbooks, and reference architectures for repeatable AI Factory deployments.
Technology Stack
Candidates must demonstrate deep, hands-on expertise across the following technology domains:
GPU Compute - DGX B200 / B300, DGX H100 / H200, HGX B200 / B300, HGX H100 / H200, MGX platforms, GB300 NVL72 / GB200 NVL72, RTX PRO 6000 Blackwell Server Edition, NVLink Switch System, NVLink-C2C
Networking - NVIDIA Quantum InfiniBand (NDR 400G, HDR 200G), Spectrum-X Ethernet, ConnectX-8 / ConnectX-7 HCAs, BlueField-3 DPU, SHARP in-network computing, UFM Fabric Manager, RDMA / RoCEv2 / InfiniBand
Storage - VAST Data Universal Storage (NFS/S3/POSIX), Hammerspace Global Data Environment, Pure Storage FlashBlade//E (Evergreen//One), NFS-over-RDMA, parallel file systems (Lustre, GPFS, WEKA), S3-compatible object storage
AI Software - NVIDIA AI Enterprise (NVAIE), NIM Microservices, RAPIDS (cuDF, cuML, cuGraph), NVIDIA Dynamo, CUDA Toolkit, cuDNN, NCCL, TensorRT, Triton Inference Server
Cluster Mgmt - Base Command Manager, DGXOS, NVIDIA Mission Control, DGX Cloud, UFM, IPMI / Redfish BMC management
Orchestration - Kubernetes (K8