AI Inference Infra Tech Lead - Cloud GPU & Scale
$208.8k
ByteDance
A leading tech company in San Jose is looking for a Tech Lead Software Engineer specializing in AI Inference Infrastructure. This role entails designing container-based management systems and collaborating across teams to develop state-of-the-art inference solutions. Candidates should have significant experience in ML infrastructure and orchestration technologies like Docker and Kubernetes. This position offers an attractive salary range of $208,800 - $438,000 annually, alongside comprehensive benefits.
$212.3k - $275.8k
...Join Cisco's CX AI Incubation Team as... ...Experiences, across cloud and on-prem environments... ...to large multi-GPU servers, including... ...work on cutting-edge inference optimization - speculative... ...models at scale. What You'll Do... ...On-Prem, Edge & Infra Hands-on experience...
Cloud, Full time, Temporary work, Local area, Flexible hours, 3 days per week
$272k - $431.25k
...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join... ...bottlenecks and lead initiatives to systematically... ...processing, model training, and inference pipelines. Proficiency in... ...well as familiarity with cloud computing platforms (e.g.,...
$212.8k - $387.6k
...building large-scale and highly available cloud infrastructure,... ...infrastructure or AI infrastructure.... ...the areas below: GPU Infra (GPU cluster management... ...frameworks, Inference engines (vLLM,... ...industry-leading public-cloud platforms... ...rapidly growing tech company. By constantly...
Cloud, Temporary work, Local area
- ...Tech Lead, Data & Inference Engineer Sunnyvale, California, United States About... ...how business brands scale demand generation and account... ...specialized vertical in Applied AI, Machine Learning, and Data... ...Exposure to Kubernetes and cloud infrastructure (AWS, GCP, or...
Cloud, Full time
$184k - $356.5k
NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves... ...high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming experience...
Cloud
$262k - $365k
Senior Staff Software Architect, GPU Uber Tech Leads, Google... ...information at massive scale, and extend well beyond web search... ...that power Google’s AI and HPC infrastructure. Your... ...critical Google services and Cloud. Your work is fundamental to...
Cloud, Full time
$181.1k - $318.4k
...GPU Software Architecture Engineer, Graphics... ...engineer to lead server-side ML acceleration... ...on Private Cloud Compute that enables... ...Intelligence at unprecedented scale. It will involve... ...understanding of inference workload... ...define the future of AI experiences delivered...
Cloud, Relocation
- ...builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the... ...to deliver industry-leading training and inference speeds and empowers machine... ...over 10 times faster than GPU-based hyperscale cloud inference services. This...
Cloud
$156k - $387.6k
...Volcano Engine Public Cloud. Our mission is... ...for cloud and AI computing. Our... ...acceleration - GPU virtualization and... ...wave of cloud-scale computing. Responsibilities... ...training and inference. - Drive end-to-... ...people. We lead with curiosity,... ...rapidly growing tech company. By...
Cloud, Temporary work, Local area
$184k - $287.5k
...unlimited potential of AI to define the next era of... ...computing. An era in which our GPU acts as the brains of... ...on GPU Performance at Scale. At NVIDIA, this role is... ...and develop new, leading solutions. Engage with HPC... ...Experience with modern cloud and container-based enterprise...
Cloud, Remote work
$124k - $195.5k
...NVIDIA Corporation is seeking an AI Inference Performance Engineer - New College Grad 2026 in Santa Clara. This role involves optimizing AI inference benchmarks using NVIDIA’s accelerators and working with various teams on performance enhancements. Applicants should have...
- ...Corporation is seeking a Principal Developer Operations Lead in Santa Clara, CA, to drive the global scale expansion of AI infrastructure. You will be responsible for... ...strategic capacity planning across a growing AI/GPU infrastructure portfolio. Applying your extensive...
Cloud
- Senior Systems Software Engineer - GPU Performance at Scale. We are looking for a dedicated... ...will drive innovation in AI and GPU computing. What You’ll Be Doing: Lead the implementation of performance... ...(CUDA). Experience with modern cloud and container‑based enterprise...
Cloud
$218.8k - $335.3k
...the team: The AV ML Infra team at GM builds... ...unique demands of AI and ML innovation,... ...includes: AI Validation & Inference: Ensures robust... ...performance by running large-scale simulation... ...inference across cloud and on‑prem compute... ...infrastructure. You will lead technically complex...
Cloud, Flexible hours
- ...the team: The AV ML Infra team builds end‑to‑... ...products to support AI and ML innovation... ...includes: AI Validation & Inference: Ensures robust... ...by running large‑scale simulation workloads... ...inference across cloud and on‑prem compute... ...Project Ownership: Lead projects from inception...
Cloud
$275.8k - $340.5k
...team: The AV ML Infra team at GM builds ML... ...unique demands of AI and ML innovation,... ...AI Validation & Inference: Ensures robust model... ...performance by running large-scale simulation... ...and inference across cloud and on-prem compute... .../ML Engineer will lead a growing...
Cloud, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$244.8k
...About the Team: The Inference Infrastructure... ...plane for large-scale LLM inference.... ...computing across multi-cloud and global... ...of cloud-native, GPU-optimized... ...developers to bring AI workloads from research... ...people. We lead with curiosity,... ...rapidly growing tech company. By constantly...
Cloud, Temporary work, Local area
$165k - $242k
...Senior Software Engineer II, Inference role at CoreWeave.... ...CoreWeave is The Essential Cloud for AI™. Built for pioneers by... ...innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and... ...cost-per-token analytics, GPU resource isolation)....
Cloud, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work
- ...computing experiences, from AI and data centers to PCs... ...member of the LLM inference framework team, you will... ...multi-node inference at scale. Your work will directly... ...strategic partners, and cloud providers) and will be upstreamed... ...systems, and GPU runtime and kernel backends...
Cloud
$136.8k - $259.2k
...Software Engineer Graduate (Inference Infrastructure) - 2026... ...control plane for large-scale LLM inference. We are... ...computing across multi-cloud and global datacenters.... ...external developers to bring AI workloads from research... ..., scheduling, and GPU acceleration. Responsibilities...
Cloud, Temporary work
$165k - $242k
...Senior Software Engineer II, Inference Sunnyvale, CA /... ...CoreWeave is The Essential Cloud for AI™. Built for pioneers by... ...innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and... ...cost-per-token analytics, GPU resource isolation)....
Cloud, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work
$139k - $204k
...Senior Software Engineer I, Inference Sunnyvale, CA /... ...CoreWeave is The Essential Cloud for AI™. Built for pioneers by... ...innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global... ...-per-token analytics, GPU resource isolation)....
Cloud, Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work
$272k - $425.5k
...Software Engineer – Large-Scale LLM Memory and... ...throughput, low-latency inference framework for serving generative AI and reasoning models... ...Dynamo orchestrates GPU shards, routes... ...remote file/object/cloud storage to support large... ...integrations with leading LLM serving engines...
Cloud, Local area, Remote work
$168k - $322k
...NVIDIA Gruppe is seeking a Senior AI Platform Engineer to improve engineering efficiency and data security through AI-powered products. The role involves working with Cloud and AI/ML teams to build and scale infrastructure and shape the technological future of the organization...
Cloud
$184k - $287.5k
...engineers to join us and build AI inference systems that serve large-scale models with extreme... ...inference stacks, optimize GPU kernels and compilers, drive... ..., multi-node, and multi-cloud environments. You’ll collaborate... ...to the industry‑leading MLPerf Inference benchmarking...
Cloud
- ...builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the... ...to deliver industry-leading training and inference speeds and empowers machine... ...over 10 times faster than GPU-based hyperscale cloud inference services. About...
Cloud
$230k - $250k
...is seeking a Sr. Member of Technical Staff in Sunnyvale, CA. This role involves designing resilient software features for cloud-based AI inference, leveraging AWS tools and services. Candidates should have a Master’s degree in Computer Science and experience with containerization...
Cloud
$184k - $287.5k
...influential Generative AI Technical Engagement Lead to evangelize for,... ...This includes NVIDIA GPU architectures, DGX systems... ...NeMo frameworks, and inference libraries like... ...findings from large-scale model training and inference... ...on-premise and cloud infrastructures. Possess...
Cloud
$272k - $431.25k
...unlimited potential of AI to define the next era... ...computing. An era in which our GPU acts as the brains of... ..., as a Principal Rack Scale Systems Infrastructure... ...NVIDIA, partners, and leading cloud and enterprise clients... ..., firmware, and infra management as one operational...
Cloud, Shift work
$296.3k
...Foundations team is a part of the Scaling Foundations team in Embodied AI and is responsible for... ...pipelines on modern cloud / GPU infrastructure, with... ...observability and cost efficiency. Lead development of... ...Consumption/Mining/Quality and Infra Foundations to turn...
Cloud, Local area, Flexible hours