Senior AI Runtime Engineer: Distributed Training & Scale
FlexAI
A forward-thinking AI infrastructure company is seeking a Staff AI Runtime Engineer to lead the design and optimization of their AI compute platform. In this leadership role, you'll enhance AI training and inference capabilities. Successful candidates will have over 8 years of experience in systems engineering, expertise with PyTorch and TensorFlow, and strong programming skills in Python and C++. This role is based in Santa Clara, CA, and offers a competitive salary along with the chance to work on cutting-edge technology.
$184k - $356.5k
...NVIDIA Corporation is seeking a Senior Software Engineer in Santa Clara to enhance the performance and reliability of large-scale AI infrastructures. The role involves leadership in debugging and optimizing distributed training workloads across NVIDIA’s GPU platforms.... Senior, Training
- ...Build and Deploy AI the right way, anywhere. The... ...teams are strategically distributed across Silicon Valley and... ...for next-generation training and inference workloads. As a Staff AI Runtime Engineer, you’ll play a pivotal... ...training and inference at scale. Design resilient and... Training, Work at office
- ...Senior Staff AI/ML System Software Engineer At d-Matrix, we are focused on unleashing... ...able to build and scale software... ...Experience with distributed, high performance software... ...with deep learning runtimes (such as ONNX Runtime... ...deployment, including training, quantization,... Senior, Training, Work experience placement, 3 days per week
- ...A leading technology company is seeking a Fellow/Sr. Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The candidate will work on distributed training of large models and improve training efficiency. Responsibilities include enhancing... Senior, Training
$148k - $235.75k
...the unlimited potential of AI to define the next era of... ...looking for outstanding Senior High Performance AI Engineer to build groundbreaking... ...build innovative agentic runtimes and compiler-integrated orchestration... .../libraries, frameworks, distributed training, and inference/serving-... Senior, Training
$180k
A cutting-edge AI research firm in California seeks a Member of Technical Staff specializing... ...hands-on experience with multimodal pre-training and a strong proficiency in Python, JAX,... ...Responsibilities include designing large-scale systems and developing data pipelines to push... Senior, Training
$168k - $322k
...NVIDIA Gruppe is seeking a Senior AI Platform Engineer to improve engineering efficiency and data security... ...Cloud and AI/ML teams to build and scale infrastructure and shape the... ...strong Python skills, and expertise in distributed systems along with Kubernetes. Competitive... Senior
- ...Senior AI Systems Performance Engineer Palo Alto, California, United States... ...and operations at scale. SambaNova Suite... ...collaborating across compiler, runtime, and hardware... ...single-node and distributed systems. Basic... ...multimodal model training and inference.... Senior, Training
$180k - $240k
...the role We are seeking a Senior AI Infrastructure Engineer to design, build, and scale the high-performance AI... ...infrastructure that enables distributed training, experiment tracking, and seamless... ...artifacts using TensorRT, ONNX Runtime, and Triton Inference Server,... Senior, Training, Odd job, Work at office
$200k - $400k
...Institute Of Foundation Models Engineer The Institute of... ...and operates ultra-scale GPU supercomputing systems to train next-generation foundation... ...communication systems, runtime, and hardware topology.... ...communication performance, distributed reliability, and cross-layer... Senior, Training, Visa sponsorship
$208k - $327.75k
...Manager to lead strategic AI platform initiatives... ...closely with engineering, architecture, and platform... ...that improve how large-scale systems are built and... ...infrastructure, including distributed training, inference... ...test, deployment, and runtime environments. Outstanding... Senior, Training, Temporary work
$184k - $287.5k
...that powers innovative AI research and... ...infrastructure software engineer to join our team. You'... ...platforms that enable large-scale AI training, inferencing, fine-tuning... ...production. As a senior DGX Cloud AI Infrastructure... ...scaling large-scale distributed systems. Experience... Senior, Training
$184k - $287.5k
...NVIDIA's DGX Cloud AI Efficiency Team... ...AI workloads - pre-training, post-training, inference... ...resources and scale to foster... ...infrastructure software engineer to join our team.... ...systems. As a senior DGX Cloud AI Infrastructure... ...large-scale distributed systems. Experience... Senior, Training
$152k - $241.5k
...passionate, and dedicated Senior AI Infrastructure Engineer to join our DGX Cloud group... ...build and maintain large-scale production systems with high... ...for large-scale AI training and inferencing platform built... ...infrastructure automation and distributed systems architecture... Senior, Training
$155.42k - $395.9k
...supports the end-to-end AI lifecycle of ML... ...experimentation and large-scale training to evaluation, lineage... ...interfaces, enabling ML engineers and researchers to... ...The Role: As a Senior AI/ML Engineer, you will... ...implement, and test scalable distributed computing and data... Senior, Training, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$133k - $254k
...Us 42dot is a mobility AI company committed to solving... .... Our AI Data Pipeline Engineers build up the core data... .... We develop the distributed system of a scalable data pipeline for large‑scale dataset (millions of scenes... ...SDKs for ML model training / evaluation. The data... Senior, Training, Work experience placement
$170.6k - $261.3k
...world! The Data Labeling Engineering team designs, builds, and operates... ..., data engineering, and AI/ML, defining the strategies... ...that create reliable training data at scale. Our tools and platform are... ...experience building robust distributed platforms and applications.... Senior, Training, Local area, Remote work, Work from home, Flexible hours
$174.72k - $295.68k
...Senior AI Data Infrastructure/Pipeline Engineer Santa Clara, CA XPENG is a leading smart... ...dataset production → model training / simulation input. In... ...daily flow of petabyte-scale sensor data. Key Responsibilities... ...I/O, etc., and build a distributed data processing system... Senior, Training, Full time, Overseas
$160k - $253k
...Senior Technical Marketing Engineer - Data Center Scale Out Senior Technical Marketing... ...software to power AI at scale. To help customers... ...-leading inference and training performance and power efficiency... ...cabling, power distribution, and thermal scaling.... Senior, Training
- ...A leading robotics company in Palo Alto seeks a Staff/Principal ML Systems Engineer to enhance training performance for their innovative humanoid robots. You will optimize distributed training systems and engage closely with researchers to transform model changes into... Senior, Training
$166k - $225k
...world's best data and AI infrastructure... ...business. Founded by engineers — and customer... ...interfacing with data to scaling our services and... ...engineer on the Runtime team at Databricks... ...next generation distributed data storage and processing... ...and training, and specific work... Senior, Training, Local area, Worldwide
- ...Dormont Manufacturing Co seeks a skilled engineer to optimize Wafer Scale Engines in Sunnyvale, California. This position requires expertise in both... ...Verilog. Join us to work on groundbreaking technology in the AI sector and help us build the future.... Senior
$184k - $287.5k
...We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the... ...kernel implementations, new LLM inference runtimes components, and kernel code generators... ...solutions for LLM inference and training (e.g. FlashInfer, Flash Attention) Expertise... Senior, Training
$140k - $215k
...world's most advanced AI-native platform. Our... ...Development Engineer role on the Cloud Runtime Protection team that... ...workloads deployed at scale Design and develop highly... ...effectively in a distributed team Benefits of Working... ..., selection, training, compensation, benefits... Senior, Training, Full time, Work experience placement, Work at office, Local area, 2 days per week, 3 days per week
$87.4k - $115k
...receive an alert: Sr. Software Engineer Location: Sunnyvale,... ...powered by Artificial Intelligence (AI) and Machine Learning (ML) that... ...reliability, and release confidence at scale. If you care deeply about DX,... ...sets, years of experience, training, education, geography, and... Senior, Training
$166k - $244k
...About the job Google's software engineers develop the next-generation... ...information at massive scale, and extend well beyond web search... ...including information retrieval, distributed computing, large-scale system... ..., and relevant education or training. Your recruiter can share... Senior, Training, Full time
$174k - $252k
Senior Software Engineer, Google Distributed Cloud Hosted, Infrastructure Google Sunnyvale, CA, USA Bachelor’s degree... ...experience with developing large-scale infrastructure, distributed systems... ...experience, and relevant education or training. Your recruiter can share more... Senior, Training, Full time
$110k - $190k
...We are hiring a Senior Software & AI Engineer to build production-grade AI systems, with a strong emphasis... ...right solution: data preparation, training, evaluation, deployment, and monitoring... ...core to how we create value, scale operations, and differentiate in the... Senior, Training
$151.8k - $265.35k
...content with ease. The AI Foundations team... ...We’re looking for an engineer to help develop and scale the AI infrastructure... ...pipelines for training, evaluation, fine-tuning... ...ML models. Support runtime systems for inference... ...Good understanding of distributed systems fundamentals... Senior, Training, Full time, Temporary work, Local area, Worldwide
- ...experiences-from AI and data centers... ...looking for a Senior Staff AI Infra Engineer who is passionate... ...accelerate LLM training and inference on... ...including large-scale training and inference... ..., network, and runtime layers. •... ...infrastructure, distributed systems, or performance... Training

