AI Platform Engineer, Training and Inference

$240k - $260k

Saviynt

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an ML Platform Engineer to own distributed training on Ray + H100s, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle - from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine - it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing

• Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
• Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
• Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
• Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
• Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
• Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS

• Operate the full model promotion lifecycle: quality gate, integration tests, load tests (k6), shadow mode, A/B gate, then canary (10% to 100%) with golden-signal auto-rollback

• Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 - 2%), and automated Flyte DAG through to canary
• Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference
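
To make two of the responsibilities above concrete, here is a minimal Python sketch of a relative quality gate and a canary traffic step. It is an illustration only, not Saviynt's implementation. The function names, the reading of "V1 - 2%" as an absolute tolerance of 0.02 on a metric in [0, 1], and the intermediate canary steps (25, 50) are all assumptions; the posting itself only gives the 10% and 100% endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    floor: float  # lowest candidate score that would still pass


def relative_quality_gate(candidate: float, baseline: float,
                          tolerance: float = 0.02) -> GateResult:
    """Pass when the retrained model (V2) scores at least baseline (V1) minus tolerance.

    Assumption: scores are in [0, 1] and the tolerance is absolute, not proportional.
    """
    floor = baseline - tolerance
    return GateResult(passed=candidate >= floor, floor=floor)


def next_canary_weight(current: int, healthy: bool,
                       steps: tuple = (10, 25, 50, 100)) -> int:
    """Return the next traffic percentage for the canary, or 0 to roll back.

    `healthy` stands in for a golden-signal check (latency, errors, traffic,
    saturation); how those signals are evaluated is out of scope here.
    """
    if not healthy:
        return 0  # auto-rollback: send all traffic back to the stable version
    for step in steps:
        if step > current:
            return step
    return 100  # already fully promoted


if __name__ == "__main__":
    print(relative_quality_gate(candidate=0.89, baseline=0.90).passed)
    print(next_canary_weight(current=10, healthy=True))
```

In a real pipeline the gate would typically run as one task in the orchestrated retrain workflow, and the canary step would be driven by monitoring data rather than a boolean passed in by hand.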

What You Bring

• Experience in ML engineering with time in an ML platform or MLOps role
• Production Ray depth: Ray Train, Serve, Core, and Data - debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
• LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton - PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
• Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
• RL working knowledge: PPO, policy gradient, or RLHF - able to translate an algorithm into distributed compute primitives

• Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation

• Vector databases: Pgvector or Qdrant - ANN index strategies, embedding upsert, and query latency tuning under inference load
• Strong Python and PyTorch; Flyte or equivalent ML orchestrator
• Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience or equivalent military experience

We offer a competitive total rewards package, learning opportunities, and tremendous room to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions depend on many factors, including but not limited to location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs.

You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

$240,000 - $260,000 a year

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Vacancy posted 1 day ago