Senior AI Infrastructure Engineer

$180k - $240k

Gatik AI

Who we are

Gatik, the leader in autonomous middle-mile logistics, is revolutionizing the B2B supply chain with its autonomous transportation-as-a-service (ATaaS) solution, prioritizing safe, consistent deliveries while streamlining freight movement by reducing congestion. The company focuses on short-haul B2B logistics for Fortune 500 retailers and, in 2021, launched the world's first fully driverless commercial transportation service with Walmart. Gatik's Class 3-7 autonomous trucks are commercially deployed across major markets, including Texas, Arkansas, and Ontario, Canada, driving innovation in freight transportation.


The company's proprietary Level 4 autonomous technology, Gatik Carrier™, is custom-built to transport freight safely and efficiently between pick-up and drop-off locations on the middle mile. With robust capabilities in both highway and urban environments, Gatik Carrier™ serves as an all-encompassing solution that integrates advanced software and hardware powering the fleet, facilitating effortless integration into customers' logistics operations.
About the role

We are seeking a Senior AI Infrastructure Engineer to design, build, and scale the high-performance AI platform powering our autonomous driving models. While researchers focus on developing perception, planning, and world models, you will be responsible for the underlying infrastructure that enables distributed training, experiment tracking, and seamless model deployment. You will bridge the gap between research and production, ensuring our AI stack is scalable, resilient, and highly efficient.

This role is onsite 5 days a week at our Mountain View, CA office!

What you'll do

  • Distributed Training & ML Systems Support
    • Scale Research Workloads: Enable researchers to scale complex models (VLA, World Models) across multi-node setups using PyTorch Distributed and Ray Train.
    • Performance Optimization: Architect and optimize multi-GPU setups, ensuring efficient model-parallel and data-parallel training across H100/A100 clusters.
    • Networking & Hardware Tuning: Optimize low-level communication (e.g., NCCL tuning, InfiniBand, or RoCE v2) to minimize latency for 3D Gaussian Splatting (3DGS) and large-scale training.
    • Intelligent Resource Scheduling: Optimize hardware utilization and cost-efficiency through Kubernetes-native GPU scheduling (NVIDIA GPU Operator, Kubeflow).
    • Inference Performance Engineering: Deploy and scale optimized model artifacts using TensorRT, ONNX Runtime, and Triton Inference Server, fine-tuning pipelines for both real-time and batch processing.
  • Agentic Infrastructure & Automation
    • Self-Healing AI Infrastructure: Architect and deploy autonomous AI agents (LangGraph, CrewAI, or AutoGen) to monitor GPU cluster health, enabling automated, real-time triage of hardware failures and NCCL timeouts.
    • Agentic DevOps & CI/CD: Develop agent-driven automation, such as agentic PR reviewers for infrastructure code and AI agents that proactively suggest model-specific Kubernetes resource optimizations.
    • Agentic Data Curation: Support researchers in building "Data Machines" where AI agents autonomously curate, label, and verify high-priority edge cases from raw data.
  • Model Management & Lifecycle (MLOps)
    • Automated Lifecycle Management: Design and maintain ML infrastructure leveraging MLflow, Argo Workflows, and Kubernetes to automate the end-to-end model lifecycle.
    • Experiment & Model Tracking: Integrate feature stores and experiment tracking systems to provide a robust system of record for every model iteration.
    • Deployment Strategies: Implement robust serving mechanisms, including A/B testing, shadow deployments, and rollback mechanisms.
  • Cloud-Native Foundations & Data Integration
    • Infrastructure as Code: Drive the "Everything as Code" philosophy using Terraform and Helm.
    • Integrated Data Factories: Collaborate with data engineering teams to scale high-bandwidth ETL pipelines using Apache Airflow, Kafka, and Spark for large-scale dataset management, ensuring seamless data flow from raw sensor logs to optimized storage in S3, GCS, or Delta Lake.
  • Monitoring & Observability
    • Deep Stack Observability: Develop comprehensive monitoring using Prometheus, Grafana, OpenTelemetry, and the ELK Stack to track low-level infrastructure health alongside high-level ML metrics such as training convergence and throughput.
    • AI-Specific Metrics & Drift: Define and monitor critical ML system KPIs, including model latency, inference throughput, and feature drift detection.
What we're looking for
  • Experience: 5+ years in ML infrastructure, MLOps, or DevOps supporting high-scale compute environments.
  • ML Expertise: Deep understanding of multi-GPU training strategies (FSDP, DeepSpeed, Ray Train) and high-performance networking (NCCL, InfiniBand).
  • Infrastructure Automation: Mastery of Kubernetes, Terraform, and Helm, with a focus on GPU-native orchestration.
  • AI Agent Frameworks: Proven experience building or supporting agentic workflows for infrastructure or data automation (e.g., using LLMs to drive DevOps tasks).
  • Platform Mastery: Expertise in MLflow, Argo Workflows, and Kubernetes.
  • Containerization: Strong experience with Docker, Kubernetes, and Helm.
  • Data & CI/CD: Proficiency in Apache Airflow, Kafka, Spark, and GitOps automation.
  • Core Skills: Proficiency in Python and Bash; experience with Go or Rust is a plus.
Bonus Qualifications
  • Advanced AI Protocols: Familiarity with the Model Context Protocol (MCP) to standardize how AI agents interact with internal databases and orchestration APIs.
  • Hybrid & Physical AI: Experience in hybrid cloud and on-prem GPU cluster management for Physical AI workloads (e.g., 3DGS, World Models).
  • Agentic Observability: Experience utilizing LLMs for semantic monitoring and log analysis to detect complex distributed system failures that traditional threshold-based alerts miss.
Salary Range: $180,000 to $240,000
More about Gatik

Founded in 2017 by experts in autonomous vehicle technology, Gatik has rapidly expanded its presence to Mountain View, Dallas-Fort Worth, Arkansas, and Toronto. As the first and only company to achieve fully driverless middle-mile commercial deliveries, Gatik holds a unique and defensible position in the AV industry, with a clear trajectory toward sustainable growth and profitability.

We have delivered complete, proprietary AV technology, an integration of software and hardware, to enable earlier successes for our clients in constrained Level 4 autonomy. By choosing the middle mile, with its defined point-to-point delivery routes, we have simplified some of the more complex AV challenges, enabling us to achieve full autonomy ahead of competitors. Given our extensive knowledge of Gatik's well-defined, fixed-route operational design domains (ODDs) and hybrid architecture, we are able to hyper-optimize our models with exponentially less data, establish gatekeeping mechanisms to maintain explainability, and ensure the continued safety of the system for unmanned operations.

Visit the Gatik website for more company information and the Gatik Careers page for more open roles.
Notable News
  • Bloomberg: Autonomous Trucking Firm Gatik Inks Contracts Worth $600 Million
  • Forbes: 'Hundreds' Of Gatik Robot Delivery Trucks Headed For U.S. Roads
  • Forbes: Gatik And Loblaw Announce Largest Commercial Deployment Of AV Trucks
  • Forbes: Forget robotaxis. Upstart Gatik sees middle-mile deliveries as the path to profitable AVs
  • Tech Brew: Gatik AI exec unpacks the regulations that could shape the AV industry
  • Business Wire: Gatik Paves the Way for Safe Driverless Operations ('Freight-Only') at Scale with Industry-First Third-Party Safety Assessment Framework
  • Auto Futures: Autonomous Trucking Group Gatik Secures Investment From NIPPON EXPRESS HOLDINGS
  • Automotive News: Gatik foresees hundreds of self-driving trucks on road soon, and that's just the beginning
  • Forbes: Isuzu And Gatik Go All In To Scale Up Driverless Freight Services
  • Bloomberg: Autonomous Vehicle Startup Takes Off by Picking Off Easier Routes
  • Reuters: Driverless vehicles on limited routes bump along despite US robotaxi scrutiny
Taking care of our team

At Gatik, we connect people of extraordinary talent and experience to an opportunity to create a more resilient supply chain and contribute to our environment's sustainability. We are diverse in our backgrounds and perspectives yet united by a bold vision and shared commitment to our values. Our culture emphasizes the importance of collaboration, respect and agility.

We at Gatik strive to create a diverse and inclusive environment where everyone feels they have opportunities to succeed and grow, because we know that together we can do great things. We are committed to an inclusive and diverse team. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status, or any other legally protected status.
Vacancy posted 4 days ago