Senior AI Infrastructure Engineer

$180k - $240k

Gatik AI

Who we are

Gatik, the leader in autonomous middle-mile logistics, is revolutionizing the B2B supply chain with its autonomous transportation-as-a-service (ATaaS) solution and prioritizing safe, consistent deliveries while streamlining freight movement by reducing congestion. The company focuses on short-haul, B2B logistics for Fortune 500 retailers and in 2021 launched the world's first fully driverless commercial transportation service with Walmart. Gatik's Class 3-7 autonomous trucks are commercially deployed across major markets, including Texas, Arkansas, and Ontario, Canada, driving innovation in freight transportation.


The company's proprietary Level 4 autonomous technology, Gatik Carrier™, is custom-built to transport freight safely and efficiently between pick-up and drop-off locations on the middle mile. With robust capabilities in both highway and urban environments, Gatik Carrier™ serves as an all-encompassing solution that integrates advanced software and hardware powering the fleet, facilitating effortless integration into customers' logistics operations.
About the role

We are seeking a Senior AI Infrastructure Engineer to design, build, and scale the high-performance AI platform powering our autonomous driving models. While researchers focus on developing perception, planning, and world models, you will be responsible for the underlying infrastructure that enables distributed training, experiment tracking, and seamless model deployment. You will bridge the gap between research and production, ensuring our AI stack is scalable, resilient, and highly efficient.

This role is onsite 5 days a week at our Mountain View, CA office!

What you'll do

  • Distributed Training & ML Systems Support
    • Scale Research Workloads: Enable researchers to scale complex models (VLA, World Models) across multi-node setups using PyTorch Distributed and Ray Train.
    • Performance Optimization: Architect and optimize multi-GPU setups, ensuring efficient model parallelism and data parallelism techniques across H100/A100 clusters.
    • Networking & Hardware Tuning: Optimize low-level communication (e.g., NCCL tuning, InfiniBand, or RoCE v2) to minimize latency for 3D Gaussian Splatting (3DGS) and large-scale training.
    • Intelligent Resource Scheduling: Optimize hardware utilization and cost-efficiency through Kubernetes-native GPU scheduling (NVIDIA GPU Operator, Kubeflow).
    • Inference Performance Engineering: Deploy and scale optimized model artifacts using TensorRT, ONNX Runtime, and Triton Inference Server, fine-tuning pipelines for both real-time and batch processing.
  • Agentic Infrastructure & Automation
    • Self-Healing AI Infrastructure: Architect and deploy Autonomous AI Agents (LangGraph, CrewAI, or AutoGen) to monitor GPU cluster health, enabling automated real-time triage of hardware failures and NCCL timeouts.
    • Agentic DevOps & CI/CD: Develop agent-driven automation, such as Agentic PR Reviewers for infrastructure code and AI agents that proactively suggest model-specific Kubernetes resource optimizations.
    • Agentic Data Curation: Support researchers in building "Data Machines" where AI agents autonomously curate, label, and verify high-priority edge cases from raw data.
  • Model Management & Lifecycle (MLOps)
    • Automated Lifecycle Management: Design and maintain ML infrastructure leveraging MLflow, Argo Workflows, and Kubernetes to automate the end-to-end model lifecycle.
    • Experiment & Model Tracking: Integrate feature stores and experiment tracking systems to provide a robust system of record for every model iteration.
    • Deployment Strategies: Implement robust serving strategies, including A/B testing, shadow deployments, and rollback mechanisms.
  • Cloud-Native Foundations & Data Integration
    • Infrastructure as Code: Drive the "Everything as Code" philosophy using Terraform and Helm.
    • Data Pipelines: Collaborate with data teams to scale ETL pipelines using Apache Airflow, Kafka, and Spark for large-scale dataset management.
    • Integrated Data Factories: Collaborate with data engineering teams to scale high-bandwidth ETL pipelines using Apache Airflow, Kafka, and Spark, ensuring seamless data flow from raw sensor logs to optimized storage in S3, GCS, or Delta Lake.
  • Monitoring & Observability
    • System Metrics: Define and track key ML system metrics, including training convergence, latency, throughput, and drift detection.
    • Infrastructure Health: Maintain deep visibility into platform health using Prometheus, Grafana, OpenTelemetry, and ELK Stack.
    • Deep Stack Observability: Develop comprehensive monitoring using Prometheus, Grafana, and OpenTelemetry to track low-level infrastructure health alongside high-level ML metrics like training convergence and throughput.
    • AI-Specific Metrics & Drift: Define and monitor critical ML system KPIs, including model latency, inference throughput, and feature drift detection.
What we're looking for
  • Experience: 5+ years in ML infrastructure, MLOps, or DevOps supporting high-scale compute environments.
  • ML Expertise: Deep understanding of multi-GPU training strategies (FSDP, DeepSpeed, Ray Train) and high-performance networking (NCCL, InfiniBand).
  • Infrastructure Automation: Mastery of Kubernetes, Terraform, and Helm, with a focus on GPU-native orchestration.
  • AI Agent Frameworks: Proven experience building or supporting Agentic Workflows for infrastructure or data automation (e.g., using LLMs to drive DevOps tasks).
  • Platform Mastery: Expertise in MLflow, Argo Workflows, and Kubernetes.
  • Containerization: Strong experience with Docker, Kubernetes, and Helm.
  • Data & CI/CD: Proficiency in Apache Airflow, Kafka, Spark, and GitOps automation.
  • Core Skills: Proficiency in Python and Bash; experience with Go or Rust is a plus.
Bonus Qualifications
  • Advanced AI Protocols: Familiarity with the Model Context Protocol (MCP) to standardize how AI agents interact with internal databases and orchestration APIs.
  • Hybrid & Physical AI: Experience in hybrid cloud and on-prem GPU cluster management for Physical AI workloads (e.g., 3DGS, World Models).
  • Agentic Observability: Experience utilizing LLMs for semantic monitoring and log analysis to detect complex distributed system failures that traditional threshold-based alerts miss.
Salary Range: $180,000 - $240,000
More about Gatik

Founded in 2017 by experts in autonomous vehicle technology, Gatik has rapidly expanded its presence to Mountain View, Dallas-Fort Worth, Arkansas, and Toronto. As the first and only company to achieve fully driverless middle-mile commercial deliveries, Gatik holds a unique and defensible position in the AV industry, with a clear trajectory toward sustainable growth and profitability.

We have delivered complete, proprietary AV technology - an integration of software and hardware - to enable earlier successes for our clients in constrained Level 4 autonomy. By choosing the middle mile - with defined point-to-point delivery, we have simplified some of the more complex AV challenges, enabling us to achieve full autonomy ahead of competitors. Given extensive knowledge of Gatik's well-defined, fixed route ODDs and hybrid architecture, we are able to hyper-optimize our models with exponentially less data, establish gate-keeping mechanisms to maintain explainability, and ensure continued safety of the system for unmanned operations.

Visit us at Gatik for more company information and Careers at Gatik for more open roles.
Notable News
  • Bloomberg: Autonomous Trucking Firm Gatik Inks Contracts Worth $600 Million
  • Forbes: 'Hundreds' Of Gatik Robot Delivery Trucks Headed For U.S. Roads
  • Forbes: Gatik And Loblaw Announce Largest Commercial Deployment Of AV Trucks
  • Forbes: Forget robotaxis. Upstart Gatik sees middle-mile deliveries as the path to profitable AVs
  • Tech Brew: Gatik AI exec unpacks the regulations that could shape the AV industry
  • Business Wire: Gatik Paves the Way for Safe Driverless Operations ('Freight-Only') at Scale with Industry-First Third-Party Safety Assessment Framework
  • Auto Futures: Autonomous Trucking Group Gatik Secures Investment From NIPPON EXPRESS HOLDINGS
  • Automotive News: Gatik foresees hundreds of self-driving trucks on road soon, and that's just the beginning
  • Forbes: Isuzu And Gatik Go All In To Scale Up Driverless Freight Services
  • Bloomberg: Autonomous Vehicle Startup Takes Off by Picking Off Easier Routes
  • Reuters: Driverless vehicles on limited routes bump along despite US robotaxi scrutiny
Taking care of our team

At Gatik, we connect people of extraordinary talent and experience to an opportunity to create a more resilient supply chain and contribute to our environment's sustainability. We are diverse in our backgrounds and perspectives yet united by a bold vision and shared commitment to our values. Our culture emphasizes the importance of collaboration, respect and agility.

We at Gatik strive to create a diverse and inclusive environment where everyone feels they have opportunities to succeed and grow because we know that together we can do great things. We are committed to an inclusive and diverse team. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status or any legally protected status.
Vacancy posted 4 days ago