Senior AI Runtime Engineer: Distributed Training & Scale

FlexAI

A forward-thinking AI infrastructure company is seeking a Staff AI Runtime Engineer to lead the design and optimization of their AI compute platform. In this leadership role, you'll enhance AI training and inference capabilities. Successful candidates will have over 8 years of experience in systems engineering, expertise with PyTorch and TensorFlow, and strong programming skills in Python and C++. This role is based in Santa Clara, CA, and offers a competitive salary along with the chance to work on cutting-edge technology.

Vacancy posted 8 hours ago
Similar jobs that could be interesting for you
Based on the Senior AI Runtime Engineer: Distributed Training & Scale vacancy in Santa Clara, CA
  • $184k - $356.5k

     ...NVIDIA Corporation is seeking a Senior Software Engineer in Santa Clara to enhance the performance and reliability of large-scale AI infrastructures. The role involves leadership in debugging and optimizing distributed training workloads across NVIDIA’s GPU platforms.... 
    Senior
    Training

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...Build and Deploy AI the right way, anywhere. The...  ...teams are strategically distributed across Silicon Valley and...  ...for next-generation training and inference workloads. As a Staff AI Runtime Engineer, you’ll play a pivotal...  ...training and inference at scale. Design resilient and... 
    Training
    Work at office

    FlexAI

    Santa Clara, CA
    8 hours ago
  •  ...Senior Staff AI/ML System Software Engineer At d-Matrix, we are focused on unleashing...  ...able to build and scale software...  ...Experience with distributed, high performance software...  ...with deep learning runtimes (such as ONNX Runtime...  ...deployment, including training, quantization,... 
    Senior
    Training
    Work experience placement
    3 days per week

    D-Matrix

    Santa Clara, CA
    3 days ago
  •  ...A leading technology company is seeking a Fellow/Sr. Fellow Machine Learning Engineer to join the Training At Scale team in San Jose, CA. The candidate will work on distributed training of large models and improve training efficiency. Responsibilities include enhancing... 
    Senior
    Training

    Advanced Micro Devices, Inc.

    San Jose, CA
    8 hours ago
  • $148k - $235.75k

     ...the unlimited potential of AI to define the next era of...  ...looking for outstanding Senior High Performance AI Engineer to build groundbreaking...  ...build innovative agentic runtimes and compiler-integrated orchestration...  .../libraries, frameworks, distributed training, and inference/serving-... 
    Senior
    Training

    NVIDIA Gruppe

    Santa Clara, CA
    8 hours ago
  • $180k

    A cutting-edge AI research firm in California seeks a Member of Technical Staff specializing...  ...hands-on experience with multimodal pre-training and a strong proficiency in Python, JAX,...  ...Responsibilities include designing large-scale systems and developing data pipelines to push... 
    Senior
    Training

    x.ai

    Palo Alto, CA
    1 day ago
  • $168k - $322k

     ...NVIDIA Gruppe is seeking a Senior AI Platform Engineer to improve engineering efficiency and data security...  ...Cloud and AI/ML teams to build and scale infrastructure and shape the...  ...strong Python skills, and expertise in distributed systems along with Kubernetes. Competitive... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    7 hours ago
  •  ...Senior AI Systems Performance Engineer Palo Alto, California, United States...  ...and operations at scale. SambaNova Suite...  ...collaborating across compiler, runtime, and hardware...  ...single-node and distributed systems. Basic...  ...multimodal model training and inference.... 
    Senior
    Training

    SambaNova Systems

    Palo Alto, CA
    3 days ago
  • $180k - $240k

     ...the role We are seeking a Senior AI Infrastructure Engineer to design, build, and scale the high-performance AI...  ...infrastructure that enables distributed training, experiment tracking, and seamless...  ...artifacts using TensorRT, ONNX Runtime, and Triton Inference Server,... 
    Senior
    Training
    Odd job
    Work at office

    Gatik AI

    Mountain View, CA
    1 day ago
  • $200k - $400k

     ...Institute Of Foundation Models Engineer The Institute of...  ...and operates ultra-scale GPU supercomputing systems to train next-generation foundation...  ...communication systems, runtime, and hardware topology....  ...communication performance, distributed reliability, and cross-layer... 
    Senior
    Training
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    2 days ago
  • $208k - $327.75k

     ...Manager to lead strategic AI platform initiatives...  ...closely with engineering, architecture, and platform...  ...that improve how large-scale systems are built and...  ...infrastructure, including distributed training, inference...  ...test, deployment, and runtime environments. Outstanding... 
    Senior
    Training
    Temporary work

    NVIDIA

    Santa Clara, CA
    8 hours ago
  • $184k - $287.5k

     ...that powers innovative AI research and...  ...infrastructure software engineer to join our team. You'...  ...platforms that enable large-scale AI training, inferencing, fine-tuning...  ...production. As a senior DGX Cloud AI Infrastructure...  ...scaling large-scale distributed systems. Experience... 
    Senior
    Training

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $184k - $287.5k

     ...NVIDIA's DGX Cloud AI Efficiency Team...  ...AI workloads - pre-training, post-training, inference...  ...resources and scale to foster...  ...infrastructure software engineer to join our team....  ...systems. As a senior DGX Cloud AI Infrastructure...  ...large-scale distributed systems. Experience... 
    Senior
    Training

    NVIDIA

    Santa Clara, CA
    5 days ago
  • $152k - $241.5k

     ...passionate, and dedicated Senior AI Infrastructure Engineer to join our DGX Cloud group...  ...build and maintain large-scale production systems with high...  ...for large-scale AI training and inferencing platform built...  ...infrastructure automation and distributed systems architecture... 
    Senior
    Training

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $155.42k - $395.9k

     ...supports the end-to-end AI lifecycle of ML...  ...experimentation and large-scale training to evaluation, lineage...  ...interfaces, enabling ML engineers and researchers to...  ...The Role: As a Senior AI/ML Engineer, you will...  ...implement, and test scalable distributed computing and data... 
    Senior
    Training
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • $133k - $254k

     ...Us 42dot is a mobility AI company committed to solving...  .... Our AI Data Pipeline Engineers build up the core data...  .... We develop the distributed system of a scalable data pipeline for large‑scale dataset (millions of scenes...  ...SDKs for ML model training / evaluation. The data... 
    Senior
    Training
    Work experience placement

    42dot

    Sunnyvale, CA
    8 hours ago
  • $170.6k - $261.3k

     ...world! The Data Labeling Engineering team designs, builds, and operates...  ..., data engineering, and AI/ML, defining the strategies...  ...that create reliable training data at scale. Our tools and platform are...  ...experience building robust distributed platforms and applications.... 
    Senior
    Training
    Local area
    Remote work
    Work from home
    Flexible hours

    General Motors

    Sunnyvale, CA
    29 days ago
  • $174.72k - $295.68k

     ...Senior AI Data Infrastructure/Pipeline Engineer Santa Clara, CA XPENG is a leading smart...  ...dataset production → model training / simulation input. In...  ...daily flow of petabyte-scale sensor data. Key Responsibilities...  ...I/O, etc., and build a distributed data processing system... 
    Senior
    Training
    Full time
    Overseas

    XPENG

    Santa Clara, CA
    4 days ago
  • $160k - $253k

     ...Senior Technical Marketing Engineer - Data Center Scale Out...  ...software to power AI at scale. To help customers...  ...-leading inference and training performance and power efficiency...  ...cabling, power distribution, and thermal scaling.... 
    Senior
    Training

    NVIDIA

    Santa Clara, CA
    8 hours ago
  •  ...A leading robotics company in Palo Alto seeks a Staff/Principal ML Systems Engineer to enhance training performance for their innovative humanoid robots. You will optimize distributed training systems and engage closely with researchers to transform model changes into... 
    Senior
    Training

    Rhoda ai

    Palo Alto, CA
    1 day ago
  • $166k - $225k

     ...world's best data and AI infrastructure...  ...business. Founded by engineers — and customer...  ...interfacing with data to scaling our services and...  ...engineer on the Runtime team at Databricks...  ...next generation distributed data storage and processing...  ...and training, and specific work... 
    Senior
    Training
    Local area
    Worldwide

    Databricks

    Mountain View, CA
    5 days ago
  •  ...Dormont Manufacturing Co seeks a skilled engineer to optimize Wafer Scale Engines in Sunnyvale, California. This position requires expertise in both...  ...Verilog. Join us to work on groundbreaking technology in the AI sector and help us build the future... 
    Senior

    Dormont Manufacturing Company

    Sunnyvale, CA
    8 hours ago
  • $184k - $287.5k

     ...We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the...  ...kernel implementations, new LLM inference runtimes components, and kernel code generators...  ...solutions for LLM inference and training (e.g. FlashInfer, Flash Attention) Expertise... 
    Senior
    Training

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $140k - $215k

     ...world's most advanced AI-native platform. Our...  ...Development Engineer role on the Cloud Runtime Protection team that...  ...workloads deployed at scale Design and develop highly...  ...effectively in a distributed team Benefits of Working...  ..., selection, training, compensation, benefits... 
    Senior
    Training
    Full time
    Work experience placement
    Work at office
    Local area
    2 days per week
    3 days per week

    Koitecc Solutions

    Sunnyvale, CA
    8 hours ago
  • $87.4k - $115k

     ...receive an alert: Sr. Software Engineer Location: Sunnyvale,...  ...powered by Artificial Intelligence (AI) and Machine Learning (ML) that...  ...reliability, and release confidence at scale. If you care deeply about DX,...  ...sets, years of experience, training, education, geography, and... 
    Senior
    Training

    Vistance Networks, Inc.

    Sunnyvale, CA
    7 hours ago
  • $166k - $244k

     ...About the job Google's software engineers develop the next-generation...  ...information at massive scale, and extend well beyond web search...  ...including information retrieval, distributed computing, large-scale system...  ..., and relevant education or training. Your recruiter can share... 
    Senior
    Training
    Full time

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • $174k - $252k

    Senior Software Engineer, Google Distributed Cloud Hosted, Infrastructure Google Sunnyvale, CA, USA Bachelor’s degree...  ...experience with developing large-scale infrastructure, distributed systems...  ...experience, and relevant education or training. Your recruiter can share more... 
    Senior
    Training
    Full time

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • $110k - $190k

     ...We are hiring a Senior Software & AI Engineer to build production-grade AI systems, with a strong emphasis...  ...right solution: data preparation, training, evaluation, deployment, and monitoring...  ...core to how we create value, scale operations, and differentiate in the... 
    Senior
    Training

    Covalent

    Sunnyvale, CA
    7 hours ago
  • $151.8k - $265.35k

     ...content with ease. The AI Foundations team...  ...We’re looking for an engineer to help develop and scale the AI infrastructure...  ...pipelines for training, evaluation, fine-tuning...  ...ML models. Support runtime systems for inference...  ...Good understanding of distributed systems fundamentals... 
    Senior
    Training
    Full time
    Temporary work
    Local area
    Worldwide

    Adobe

    San Jose, CA
    3 days ago
  •  ...experiences-from AI and data centers...  ...looking for a Senior Staff AI Infra Engineer who is passionate...  ...accelerate LLM training and inference on...  ...including large-scale training and inference...  ..., network, and runtime layers. •...  ...infrastructure, distributed systems, or performance... 
    Training

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
