
Principal Solutions Architect San Ramon, CA

$170k - $190k

ePlus

Principal Solutions Architect (Req#1048)

San Ramon, CA

Overview

We are seeking an elite Solutions Architect to lead the end-to-end design, sizing, and deployment of NVIDIA AI Factory‑aligned infrastructure. In this highly technical, customer‑facing role you will translate complex AI and machine learning workload requirements into fully engineered infrastructure solutions spanning colocation facilities, GPU compute, high‑performance networking, parallel storage, and the complete NVIDIA AI software stack.

You will serve as a trusted technical advisor to enterprise and hyperscale customers, partnering with sales, product, and engineering teams to win and deliver transformational AI infrastructure programs. Your expertise will directly shape how organizations build and operate production AI Factories capable of training frontier models, running large‑scale inference fleets, and accelerating data science pipelines at scale.

Your Impact
  • Lead discovery workshops to capture AI/ML workload requirements, including model training scale, inference SLAs, data pipeline throughput, and multi‑tenancy needs.
  • Architect full‑stack AI Factory solutions aligned to NVIDIA reference architectures, integrating colocation, GPU compute, networking, storage, and software layers.
  • Develop detailed Bills of Materials (BOMs), rack elevation diagrams, network topology drawings, and power/cooling budgets for customer proposals.
  • Define GPU cluster architectures using NVIDIA DGX, HGX, and MGX systems with B200, B300, and GB300 Blackwell SXM and NVLink‑Switch configurations.
  • Design RTX PRO 6000 Blackwell Server Edition deployments for inference‑optimized and enterprise AI workloads.
  • Conduct workload sizing and TCO/ROI modeling to validate infrastructure dimensioning for training, fine‑tuning, and inference at scale.
Colocation & Facility Planning
  • Specify colocation requirements including critical power load (MW‑scale), UPS and generator configurations, and PUE targets.
  • Design high‑density GPU deployments utilizing air‑cooled, direct liquid cooling (DLC), and rear‑door heat exchanger configurations.
  • Define meet‑me room (MMR) and cross‑connect requirements; specify carrier‑neutral telecom diversity strategies.
  • Engage colocation providers and data center operators to validate capacity availability and negotiate technical SLAs.
  • Coordinate with facilities and MEP engineers to validate power infrastructure from utility feed through PDU to rack level.
GPU Compute Infrastructure
  • Architect multi‑node GPU clusters optimized for large language model (LLM) pre‑training, fine‑tuning, and reinforcement learning from human feedback (RLHF).
  • Size and configure DGX SuperPOD, HGX H/B‑series, and MGX modular systems based on model parameter count, dataset size, and iteration timelines.
  • Define server firmware, BIOS, BMC, and DGXOS baselines for production GPU infrastructure.
  • Establish GPU health monitoring, RAS (Reliability, Availability, Serviceability) policies, and lifecycle management procedures.
High‑Performance Networking
  • Design backend GPU fabric networks using NVIDIA Quantum InfiniBand (NDR 400Gb/s and HDR 200Gb/s) for distributed training traffic.
  • Architect Spectrum‑X Ethernet‑based AI networking solutions for inference clusters requiring high‑bandwidth, low‑latency connectivity.
  • Specify ConnectX‑8/7 HCA deployments and configure RDMA over Converged Ethernet (RoCEv2) or InfiniBand transport for NCCL collective operations.
  • Integrate BlueField‑3 DPUs for GPU‑accelerated network functions, storage offload, zero‑trust security isolation, and bare‑metal provisioning.
  • Design leaf‑spine and fat‑tree topologies for non‑blocking bisection bandwidth in GPU training clusters.
  • Define Quality of Service (QoS) policies separating storage, compute fabric, and management plane traffic.
Storage
  • Design high‑performance parallel file system solutions using VAST Data, Hammerspace, and Pure Storage FlashBlade//E for AI training and checkpoint storage.
  • Size storage capacity, IOPS, and throughput based on dataset characteristics, checkpoint frequency, and concurrent reader/writer counts.
  • Architect multi‑tier storage hierarchies: hot NVMe flash (VAST/FlashBlade) for active datasets, warm object storage for model archives, and cold tape/cloud for long‑term retention.
  • Configure VAST Data Universal Storage for disaggregated storage with NFS, S3, and POSIX access; tune for large sequential read performance.
  • Deploy Hammerspace Global Data Environment for distributed data management and NFS‑over‑RDMA acceleration across geographically dispersed GPU clusters.
  • Define data pipeline architectures ingesting from cloud object stores (S3, GCS, ABS) to local flash for GPU‑local data loading without I/O bottlenecks.
AI Software Stack & Orchestration
  • Deploy and configure NVIDIA AI Enterprise (NVAIE) software stack including NVIDIA GPU Operator, NIM microservices, and RAPIDS accelerated data science libraries.
  • Architect inference serving infrastructure using NVIDIA NIM (NVIDIA Inference Microservices) for optimized LLM and vision model deployment with autoscaling.
  • Implement NVIDIA Dynamo for distributed inference and disaggregated serving of large‑scale generative AI models.
  • Configure and optimize CUDA toolkit, cuDNN, NCCL communication libraries, and custom kernel environments for training workloads.
  • Deploy Base Command Manager and DGXOS for cluster lifecycle management, node provisioning, health dashboards, and job scheduling integration.
  • Integrate NVIDIA Mission Control for AI Factory operations, observability, and multi‑cluster fleet management.
  • Design and deploy Kubernetes‑based AI platforms using NVIDIA GPU Operator, integrating with Run:ai for dynamic GPU resource scheduling and multi‑tenant workload isolation.
  • Configure SLURM workload manager for traditional HPC‑style job scheduling on bare‑metal GPU clusters, including preemption policies, fair‑share scheduling, and burst‑to‑cloud integration.
  • Establish MLOps toolchain integrations with popular frameworks (PyTorch, JAX, TensorFlow) and experiment tracking platforms (MLflow, Weights & Biases).
  • Serve as primary technical point of contact throughout the pre‑sales and delivery lifecycle, from initial discovery through post‑deployment optimization.
  • Produce and present architecture design documents, technical proposals, and executive‑level briefings to CTO/CIO and VP‑level stakeholders.
  • Lead proof‑of‑concept (POC) and pilot deployments, including benchmark design, execution, and results analysis.
  • Collaborate with procurement, logistics, and deployment teams to ensure on‑time delivery of complex infrastructure programs.
  • Provide post‑deployment hypercare support, performance tuning, and capacity planning advisory services.
  • Contribute to internal knowledge bases, solution playbooks, and reference architectures for repeatable AI Factory deployments.
Technical Domains

GPU Compute

  • DGX B200 / B300, DGX H100 / H200, HGX B200 / B300, HGX H100 / H200, MGX platforms, GB300 NVL72 / GB200 NVL72, RTX PRO 6000 Blackwell Server Edition, NVLink Switch System, NVLink‑C2C

Networking

  • NVIDIA Quantum InfiniBand (NDR 400G, HDR 200G), Spectrum‑X Ethernet, ConnectX-8 / ConnectX-7 HCAs, BlueField‑3 DPU, SHARP in‑network computing, UFM Fabric Manager, RDMA / RoCEv2 / InfiniBand

Storage

  • VAST Data Universal Storage (NFS/S3/POSIX), Hammerspace Global Data Environment, Pure Storage FlashBlade//E (Evergreen//One), NFS‑over‑RDMA, parallel file systems (Lustre, GPFS/WEKA), S3‑compatible object storage

AI Software

  • NVIDIA AI Enterprise (NVAIE), NIM Microservices, RAPIDS (cuDF, cuML, cuGraph), NVIDIA Dynamo, CUDA Toolkit, cuDNN, NCCL, TensorRT, Triton Inference Server

Orchestration

  • Base Command Manager, DGXOS, NVIDIA Mission Control, DGX Cloud, UFM, IPMI / Redfish BMC management

Colocation

Frameworks

  • PyTorch, JAX, TensorFlow, Hugging Face Transformers, DeepSpeed, Megatron‑LM, vLLM, LMDeploy
Qualifications
  • Bachelor's degree in Computer Science, Electrical Engineering, Computer Engineering, or a related technical discipline; Master's degree preferred.
  • 8+ years of solutions architecture, systems engineering, or technical pre‑sales experience, with at least 4 years focused on GPU infrastructure or HPC environments.
  • Proven track record designing and deploying NVIDIA DGX or HGX‑based GPU clusters in production AI/ML environments.
  • Deep understanding of distributed deep learning concepts: tensor parallelism, pipeline parallelism, data parallelism, gradient checkpointing, and mixed‑precision training.
  • Hands‑on experience with InfiniBand or high‑speed Ethernet fabric design, RDMA configuration, and collective communication tuning (NCCL, MPI).
  • Direct experience sizing and deploying parallel storage systems (VAST, Hammerspace, or Lustre/WEKA/GPFS) for AI training workloads.
  • Strong working knowledge of Kubernetes, GPU Operator, and at least one GPU workload scheduler (Run:ai or SLURM).
  • Experience with Linux system administration, CUDA development environment configuration, and GPU driver/firmware management.
  • Demonstrated ability to create compelling technical proposals, architecture diagrams (Visio/Lucidchart/draw.io), and BOM‑level documentation.
  • Exceptional communication skills with proven ability to present to both deep technical audiences and C‑level executives.
Preferred Qualifications
  • NVIDIA‑certified professional credentials (DCA‑Core, NCP‑DS, or equivalent).
  • Experience with NVIDIA Base Command Platform or Mission Control for multi‑cluster AI Factory operations.
  • Familiarity with sovereign AI, government cloud, or regulated industry AI infrastructure requirements.
  • Experience integrating AI Factory infrastructure with public cloud (AWS, Azure, GCP) for hybrid and burst‑to‑cloud architectures.
  • Background in MLOps, LLMOps, or platform engineering for production AI model lifecycle management.
  • Prior experience with colocation data center procurement, RFP development, and SLA negotiation.
  • Contributions to open‑source AI infrastructure projects or published technical content (blogs, whitepapers, conference presentations).
  • Active participation in the NVIDIA Partner Network (NPN) ecosystem or prior experience at an NVIDIA Elite Solution Provider.
Core Competencies

Technical Depth

End‑to‑end AI infrastructure expertise from silicon to software; ability to go deep on any layer of the stack.

Systems Thinking

Ability to reason holistically about performance, reliability, power, cost, and operability trade‑offs across complex integrated systems.

Customer Obsession

Relentless focus on understanding customer AI objectives and delivering solutions that accelerate time‑to‑value.

Executive Presence

Confidence and clarity when presenting complex technical architectures to senior business and technology leaders.

Analytical Rigor

Data‑driven approach to workload sizing, performance modeling, and TCO analysis with attention to detail.

Ability to lead cross‑functional pursuit teams, align internal stakeholders, and orchestrate complex delivery programs.

Position Specifics

The initial base salary range for this position is expected to be between $170,000 and $190,000 annually. The final base salary offered will be determined by multiple factors, including, but not limited to, job‑related knowledge, depth of experience, skills, certifications, and geographic location. In addition to the base salary, our compensation structure may include other components such as commissions and discretionary bonuses.

ePlus offers a full range of medical, financial, and other benefits (including 401(k) eligibility, employee stock purchase program and various paid time off benefits, such as vacation, sick time, and personal leave), dependent on the position offered. Details of participation in these benefit plans will be provided if an offer of employment is extended.

If hired, employee will be in an “at‑will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.

Position is presented by ePlus for legal compliance under the following designations: #LI‑DY1 #IND1

Equal Employment Opportunity

We are an equal employment opportunity employer that does not discriminate or allow discrimination based on race, color, religion, sex, sexual orientation, gender identity, age, national origin, citizenship, disability, veteran status, or any other classification protected by federal, state, or local law. ePlus is dedicated to fostering, cultivating, and preserving a culture that represents diversity, enables inclusion, and makes our employees feel comfortable bringing their full, unique selves to work.

Physical Requirements
  • While performing this role, you will engage in both seated and occasional standing or walking activities. We provide reasonable accommodations, in accordance with relevant laws, to support success in this position.
  • By embracing our values, you will contribute to our collective mission… The job description is a guide, not an employment contract.
Vacancy posted 3 days ago