AI Platform Engineer, Training and Inference

$240k - $260k

Saviynt

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies, and government institutions.

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an AI Platform Engineer to own distributed training on Ray + H100s, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle, from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine - it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing

• Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
• Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
• Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
• Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
• Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
• Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
• Operate the full model promotion lifecycle: quality gate -> integration tests -> load tests (k6) -> shadow mode -> A/B gate -> canary (10% to 100%) with golden-signal auto-rollback
• Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 - 2%), and automated Flyte DAG through to canary
• Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference

What You Bring

• Experience in ML engineering with time in an ML platform or MLOps role
• Production Ray depth: Ray Train, Serve, Core, and Data - debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
• LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton - PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
• Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
• RL working knowledge: PPO, policy gradient, or RLHF - able to translate an algorithm into distributed compute primitives
• Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden-signal degradation
• Vector databases: pgvector or Qdrant - ANN index strategies, embedding upsert, and query latency tuning under inference load
• Strong Python and PyTorch; Flyte or equivalent ML orchestrator
• Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience

We offer you a competitive total rewards package, learning opportunities, and tremendous room to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions depend on many factors, including but not limited to location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs.

You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

A reasonable estimate of the current range is $240,000 - $260,000 annually.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Vacancy posted 1 day ago
