Staff Infra Engineer - Global GPU ML Inference
The Token Company
The Token Company in San Francisco is seeking a Member of Technical Staff for their infrastructure team. In this role, you will own the cloud systems that serve our compression API and build global low-latency, high-throughput GPU ML inference infrastructure. The ideal candidate will have solid experience in cloud infrastructure, including AWS and Docker, and a proven track record in building production environments. Additional benefits include equity, housing, food, and visa sponsorship.
- A tech startup focusing on AI optimization is seeking engineers in San Francisco to enhance their GPU kernel optimization framework. Candidates should possess... Previous experience in GPU programming and AI/ML research is advantageous. Join a small team committed...
- B Capital is seeking a skilled engineer for GPU infrastructure in San Francisco. This role involves designing and operating high-performance systems for model inference, synthetic data generation, and reinforcement learning. The ideal candidate has strong GPU systems experience...
- ...there. The Opportunity: Our Edge Inference team compiles Liquid... ...require deep understanding of both ML architectures and hardware constraints... ...kernels for CPU, NPU, and GPU architectures across diverse... ...Experience: Embedded software engineering experience or work on resource...
- Jaide Health is seeking an engineer for their Model Efficiency team in... ...focuses on building reliable ML systems while enhancing core performance... ...techniques such as GPU/CUDA optimizations and collaborate... ...and insights into the LLM inference ecosystem. A commitment to diversity...
Remote job
- ..., Fly over AWS when it makes sense, PyTorch over legacy ML frameworks. The Work: GPU inference: We run our own ASR models. Real-time transcription: WebSocket... ...C#. Runtime: Bun, Node.js, Django, FastAPI. ML: PyTorch. Infra: Fly.io, Terraform, AWS (RDS), Redis. Protocols: gRPC,...
- A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over...
Remote work
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate... ...closely across teams to ensure operational excellence in GPU serving workloads. Competitive salary range of $192,000 to $26...
$200k - $250k
...Overview: Build and operate the ML platform that powers... ...scalable training, inference, and cost‑efficient operations... ...ECS, SageMaker, GPU fleets, model serving, autoscaling... ..., or similar). Prior staff‑level role in a company with a significant AI infra footprint. Experience...
Remote work
- Acceler8 Talent is seeking a Member of Technical Staff focused on ML Systems & Inference in San Francisco, California. This role includes building and... ...AI workloads. The ideal candidate has strong software engineering roots and experience in inference systems. You will...
- ...technology company in San Francisco is seeking a Data Platform Engineer to drive architecture and implementation of core systems. The ideal... ...and demonstrates strong analytical skills in fields such as ML and statistics. Responsibilities include planning technical roadmaps...
$160k - $300k
...product development. We empower global innovators in automotive,... ...mission is to revolutionize how engineering decisions are made, turning... ...About the Role: As a Senior / Staff Infrastructure Engineer at... ...distributed systems). Exposure to ML infra. Personality & Values:...
Work at office, Visa sponsorship, Flexible hours
- A tech-first company is seeking a Member of Technical Staff to focus on cutting-edge AI research and development. The role involves building and scaling training and inference infrastructure, designing ML kernels, and optimizing performance. Ideal candidates should have...
- Crusoe in San Francisco is looking for a Senior Staff Network Operations Engineer to oversee the reliability of its global network. This role entails leading incident responses, defining operational standards, and guiding a team of engineers in maintaining a high-performing...
- Crusoe is seeking a Staff Software Infrastructure Engineer in San Francisco to manage cloud infrastructure, develop... ...critical role requires expertise in GPU troubleshooting, strong Linux skills... ...make a significant impact on the global energy landscape.
- A cutting-edge AI research firm in San Francisco is seeking talent to build and optimize GPU infrastructure for large-scale model inference and training workloads. The ideal candidate will have hands-on experience with GPU systems and optimization techniques, actively...
- Claryo is seeking a Staff Software Engineer with a focus on Computer Vision Deployment based in San Francisco. The successful candidate will develop... ...include creating and managing distributed cloud GPU infrastructures and building comprehensive computer vision pipelines...
Work at office, 3 days per week
- Requirements: Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar)... ...laid out for you. 3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems. Familiarity with at least...
$300 per month
...We’re crafting the engine that powers a world... ...the Role: As a Senior Staff Cloud Support... ...networking, and AI/ML infrastructure, and... ...AI infrastructure globally. What You’ll Be Working... ...Troubleshoot NCCL, IB, GPU driver/firmware... ...workloads (training + inference) with performance...
Full time, Temporary work
- We are looking for an AI Infra engineer to join our growing team. We work... ...partnering closely with our Inference and Research teams to build,... ...observability solutions tailored to ML workloads running on... ...strategies). Experience managing GPU clusters and optimizing compute...
- ...candidates with expertise in AI simulation development. The role emphasizes optimizing training efficiency, enhancing GPU performance, and ensuring low-latency inference. Applicants should be proficient in methodologies for gradient checkpointing, Nsight profiling, and job...
- Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while focusing...
$197.3k - $313.7k
Staff ML Engineer, Fine Tuning - Slack... ...finetuning training pipelines on GPU infrastructure. Brainstorm with Product... ...Familiarity with model optimization for inference (quantization, pruning, speculative decoding...
Work at office
$190.9k - $232.8k
A leading data and AI company is seeking a Staff Software Engineer for GenAI inference to lead the architecture and optimization of the inference engine. The role requires expertise in CUDA, GPU programming, and distributed systems design. Ideal candidates will have a strong...
$181.1k - $318.4k
Apple Inc. is looking for a Staff ML Infrastructure Engineer in San Francisco to lead pre-training initiatives for cutting-edge foundation models in machine learning. The successful candidate will have over 6 years of experience in building scalable backend systems, be...
$227.2k - $417k
Software Engineer, ML Infra & Distributed Systems (Staff & Principal). About the Role: As a Software Engineer on the ML Infrastructure team, you will collaborate... ...Product teams to build world‑class machine learning inference platforms. These platforms power essential services...
Full time, Temporary work, Local area, Flexible hours
- ...a web application that distills complex ML signals, building automation tools that run... ...for: We’re looking for an experienced engineer to help shape our architecture, strengthen... ...platform securing digital trust for leading global businesses. Our deep investments in...
- A leading streaming service is seeking a Staff Software Engineer to enhance ML infrastructure. The role involves designing scalable systems, mentoring engineers, and collaborating with cross-functional teams. Candidates should have over 8 years of experience in building...
$200k - $400k
Inferact is looking for a Developer Relations Engineer in San Francisco, California, to help developers utilize vLLM for AI inference. This unique role involves teaching technical concepts, creating educational content, and engaging with the AI infrastructure community...
Remote work
$253k - $308k
Staff Engineer, Engineering Productivity & AI Quality. Harper is an AI-native commercial insurance... ...productivity, platform, CI/CD, build, test‑infra, or internal tooling that other engineers... ...not a process or PM role. Production AI/ML systems experience (agent harness, eval...
Part time, Work at office, Relocation
- ...company in San Francisco is seeking a Member of Technical Staff focused on kernels and GPU performance. This role involves optimizing GPU and... ...various hardware. Ideal candidates have strong software engineering foundations and experience with performance-critical systems...
- assistant civil engineer San Francisco, CA
- engineering aide San Francisco, CA
- assistant mechanical engineer San Francisco, CA
- assistant engineering manager San Francisco, CA
- project engineer assistant project manager San Francisco, CA
- senior staff systems engineer San Francisco, CA
- staff automation engineer San Francisco, CA
- staff design engineer San Francisco, CA
- staff security engineer San Francisco, CA
- staff engineer San Francisco, CA


