Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
Perplexity
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
Senior
- ...Sciforium AI Infrastructure Role Sciforium... ...support from AMD engineers the team is scaling... ..., and distributed inference features.... ...runtime, service, and GPU layers, working closely... ...proficiency in C++/Python/Go/Rust ~ Experience... ...Proficiency in CUDA or ROCm and...
Senior, Work at office, Flexible hours
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model... ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer... ...management to support in API Gateway. GPU kernels migration to CuTe DSL. Port...
Suggested
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation...
Senior, Local area
$175k - $225k
...led by veteran operators and engineers, alumni of Sonos, PayPal, Tesla... ...We're looking for an AI Inference Engineer who lives at the boundary... ...country. If you are obsessed with CUDA kernels, TensorRT... ...kernels and perform low-level GPU tuning to maximize throughput...
Suggested, Local area, Remote work
$160k - $250k
...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building the Inference Platform that brings... ...programming in one or more of: Rust, Go, Python, or... ...plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies...
Senior, Full time, Local area
$167.2k - $209k
A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong...
Senior, Remote work
- Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with AI... ...and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer competitive...
Flexible hours
- Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing... ...and more than 5 years of relevant experience. Knowledge of CUDA, DSP, and C/C++ is essential. Benefits include life insurance...
Senior
$216k - $270k
...As a Software Engineer on the Machine Learning Infrastructure... ..." for our large-scale GPU clusters. You will... ...compute into breakthrough AI. You will: Architect... ...languages (e.g. Python, Go, Rust, C++) ~ Experience... ...software and hardware stack (CUDA, NCCL) Experience...
Senior, Full time
- A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate...
Senior
- ...frontier of distributed and decentralised AI agents. Our research spans vector-... ...RAG techniques. Comfortable with CUDA tooling for debugging and optimising GPU workloads. Able to design and train... ...DeepSpeed or vLLM for efficient inference serving. Familiarity with LangChain...
Senior
- Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience...
Senior, Remote work, Relocation package
- ...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect... ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams....
Senior
- Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure.... ...impact millions of meetings, ensuring efficient GPU utilization, and debugging production...
- ...About Us Most AI is frozen in place... ...intelligence - the inference services that serve... ...Researchers and ML engineers will hand you workloads... ...heterogeneous GPU fleets. Batching, scheduling... ...language (Go, Rust, C++). ~ Working... ...accelerator stack: CUDA fundamentals, NCCL,...
Flexible hours
$150k - $250k
...Senior/Staff AI Engineer Job Locations US-CA-San Francisco - Remote | US-NC-Raleigh... ...infrastructure behind real-world model serving and inference. This is the role for engineers who... ...Improve performance across GPU and CPU pathways Work on KV cache,...
Senior, Full time, Remote work
- ...Senior Software Engineer We're hiring a Senior Software Engineer onto our Applied AI team to build and extend the backend systems that power... ...layer that connects them to our GPU-resident compute. A note... ...Familiarity with causal inference or graph-based systems...
Senior, Work at office
$175k - $220k
...building the next generation of AI perception systems for... .... We are seeking a Senior Applied AI & Machine Learning Engineer to design, optimize, and... ...and maintain training and inference pipelines that support... ...stacks (ONNX, TensorRT, CUDA) Experience working on...
Senior, Shift work
- An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming... ...scalable model serving, optimize multi-GPU infrastructure, and enhance system reliability...
- A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates...
Senior
- ...The role: SoFi's Staff AI Engineer is a hands-on AI engineering... ...organization. This is a critical, senior role responsible for setting... ...high-throughput, low-latency inference across diverse hardware... ...managing the underlying Kubernetes/GPU orchestration for custom...
Senior, Remote work
$150k - $200k
...Electricity demand is skyrocketing, driven by AI factories, electric vehicles, and... ...including data ingestion, feature engineering, model training, inference, deployment, and monitoring.... ...programming skills in Python; Go, Java, or Rust is a plus. ~ Experience with...
Senior, Temporary work, Work experience placement, Local area, Shift work
- AI Platform Engineer - Training & Inference Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization... ...distributed object store, and configure RayData for GPU-direct streaming from GCS/S3. Operate distributed training...
$207k - $290k
...Description About JazzX AI: Vision:... ...seeking an experienced AI Engineer with deep expertise in... ...to join our team as a Senior Staff Architect. In this... ...techniques, including inference-time search, chain-of-thought... ...(Kubernetes, GPU/TPU clusters, and cloud...
Senior, Worldwide, Flexible hours
- ...San Francisco is seeking an experienced engineer for its Inference Platform team. This role involves... ...inference deployments, driving improvements in AI performance, and utilizing Kubernetes... ...LLM serving frameworks and deploying GPU workloads. The position offers a competitive...
Senior
$300k
...startup building an AI and cloud platform,... ...model training, or inference. Our client... ...operates high-performance GPU clusters powering... ...operate inference engines such as vLLM, SGLang... ...in Python, Go, Rust, or a comparable language... ...software stacks (CUDA, Triton, NCCL) and...
Senior, Permanent employment, Worldwide
$216k - $270k
...As a Software Engineer on the ML Infrastructure team, you will design... ...languages (e.g., Python, Go, Rust, C++). ~ Experience with LLM... ...TensorRT-LLM, or text-generation-inference. Compensation packages... ...is to develop reliable AI systems for the world's most important...
Senior, Full time
$187.5k - $395k
...Software Engineer, Inference Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe... ...leverage our expensive GPU resources while meeting internal... ...similar) Nice to have CUDA FFmpeg Compensation The...
$142.2k - $204.6k
...About This Role As a software engineer for GenAI inference, you will help design, develop, and optimize... ..., etc. Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS... ...Databricks Databricks is the data and AI company. More than 10,000...
Local area, Worldwide