AI Platform Engineer, Training and Inference
$240k - $260k
Saviynt
AI Platform Engineer - Training & Inference

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an ML Platform Engineer to own distributed training on Ray + H100s, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle - from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine - it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing
• Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
• Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
• Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
• Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
• Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
• Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
• Operate the full model promotion lifecycle: quality gate - integration tests - load tests (k6) - shadow mode - A/B gate - canary (10%-100%) with golden-signal auto-rollback
• Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 - 2%), and automated Flyte DAG through to canary
• Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference

What You Bring
• Experience in ML engineering with time in an ML platform or MLOps role
• Production Ray depth: Ray Train, Serve, Core, and Data - debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
• LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton - PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
• Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
• RL working knowledge: PPO, policy gradient, or RLHF - able to translate an algorithm into distributed compute primitives
• Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
• Vector databases: Pgvector or Qdrant - ANN index strategies, embedding upsert, and query latency tuning under inference load
• Strong Python and PyTorch; Flyte or equivalent ML orchestrator
• Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience or equivalent military experience

We offer you a competitive total rewards package, learning, and tremendous opportunities to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions depend on many factors including, but not limited to, location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs. You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance. A reasonable estimate of the current range is $240,000 - $260,000 annually.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Vacancy posted 1 day ago

