Distributed LLM Inference Engineer - Scale High-Throughput AI
Cerebras
Anyscale is seeking a Distributed LLM Inference Engineer in Palo Alto, California. The role focuses on pushing the boundaries of performance for AI inference at large scale, collaborating closely with product teams and open source communities. The ideal candidate should have experience in running ML inference, familiarity with top deep learning frameworks like PyTorch, and a strong grasp of distributed systems. Attractive benefits and compensation plan included.
- ...on a mission to democratize distributed computing and make it accessible... ...accelerate the progress of AI applications out into the... ...developer or data scientist can scale an ML application from their... ...the role As a Distributed LLM Inference Engineer, you will help systems and...
Work at office
$272k - $425.5k
Principal Software Engineer – Large-Scale LLM Memory and Storage Systems
...Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for...
Local area, Remote work
- ScOp Venture Capital is looking for an ML Systems Engineer to optimize LLM inference systems crucial for their AI platform. The role focuses on enhancing performance and efficiency via low-level systems optimization, directly impacting industry leader processes in semiconductor...
- ...Department: Backend Engineer · Work type: On-... ...About Archetype AI Archetype AI is developing... ...-time multimodal LLM for real life,... ..., and resilient distributed systems. You’ll work... ...production, at scale, with reliability,... ...-latency AI model inference and data services....
Full time
$184k - $287.5k
...skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme... ...architecture, parallel programming, distributed systems, deep learning theories... ...building and optimizing LLM inference engines (e.g., vLLM...
$262k - $365k
Senior Engineering Manager AI Inference Platform, Distributed Cloud Location: Sunnyvale, CA, USA Pay US: $262,000 - $365... ...experience optimizing, profiling, and scaling production‑grade systems on GPU... ...experience implementing advanced LLM serving architectures and...
- A leading AI infrastructure company in California seeks a Member of Technical Staff - Training to design and optimize large-scale distributed training systems for frontier AI models. Candidates should have 5+ years of experience in ML systems and be proficient in Python...
- ...Principal Software Engineer at JPMorganChase... ...services, enabling scale across teams and functions... ...using Model Inference servers such as... ...production operations for AI workloads,... ...architecting and deploying LLM & GNN solutions on... ...optimization and distributed systems for large...
$135k - $160k
Application Software Engineer, Inference SpaceX was founded under... ...a high-performance AI inference platform that... ...design and optimize large-scale model serving systems... ...everything from distributed infrastructure to deep... ...SGLang, vLLM, TensorRT-LLM) Develop custom tools...
Permanent employment, Temporary work, Remote work, Worldwide, Weekend work
$166k - $225k
...running the world's best data and AI infrastructure platform so... ...their business. Founded by engineers - and customer obsessed - we... ...for interfacing with data to scaling our services and infrastructure... ...building the next generation distributed data storage and processing...
Local area, Worldwide
$168k - $270.25k
Senior Software Engineer, Distributed Systems - NIM Factory
...upon which every new AI-powered application is built... ...infrastructure and automation for NVIDIA Inference Microservices (NIMs). The... ...in working with large scale full stack development We are...
Remote work
$160.36k - $240.54k
...driver, combining cutting-edge AI with automotive-grade... ...clear path to AVs at commercial scale, empowering a safer, richer,... ...Role We’re looking for senior engineers to build/scale Nuro's large-scale... ...and developing large-scale distributed applications (e.g. Kubernetes...
$180k - $220k
black.ai is looking for a Senior Software Engineer, Calibration & Control in Palo Alto, CA. In this role, you will... ...the control systems for utility-scale quantum computers. You will be responsible... ...in Python or C++, with a focus on distributed storage and graph databases. The...
$192k - $260k
...running the world's best data and AI infrastructure platform, so... ...companies in the world. Our engineering teams build highly technical... ...the resilience, security and scale that is critical to making... ...Optional: MS or PhD in databases, distributed systems. Comfortable working...
Work at office, Local area
- Senior AI Systems Performance Engineer Palo Alto, California, United States... ...and operations at scale. SambaNova Suite™... ...for large‑scale AI inference. Responsibilities... ...both single‑node and distributed systems. Basic Qualifications... ...‑on experience with LLM or multimodal model...
Full time, Temporary work, Local area, Flexible hours
- ...unlimited potential of AI to define the next era... ...supports 1,000+ chip design engineers by building tools and... ...with an emphasis on distributed systems and operational... ...concurrency, and reliability at scale. Responsibilities... ...language (including LLM‑generated code) to implement...
$198k - $286k
...mission to revolutionize AI infrastructure by... ...Modular, we optimize inference from kernel to cloud on... ...makes this possible at scale. We continuously apply... ...kernels, the inference engine, and distributed systems so that customer... ...Cloud, delivering LLM performance on the Pareto...
Remote job, Work experience placement, Work at office, Local area, Flexible hours
$152k - $241.5k
...platform upon which every new AI‑powered application is... ...a Senior Software Engineer - AI Inference to advance open‑source LLM serving by contributing... ...low‑latency inference at scale. This is a hands‑on role... ...mindset. Familiarity with distributed systems concepts and concurrency...
$152k - $241.5k
...learning ignited modern AI, the next era of... ...seeking top‑tier AI Compiler Engineers to drive innovation... ...tangible impact on a global scale. What you’ll be doing:... ...for AI workloads (both inference and training) and... ...accelerator architectures. LLM Knowledge: Deep understanding...
- ...computing experiences, from AI and data centers, to PCs,... ...enabling RL training and SOTA LLM and Multimodal inference at scale across multi-GPU and multi... .... THE PERSON: Skilled engineer with strong technical and... ...serving and RL‑training. Distributed System Optimization: Tune...
- ...the Role We are seeking a Senior Inference Engineer to accelerate the performance of Pika's AI-driven products. In this highly... ...‑leading user experiences at scale. You will design and optimize inference... ...computing kernels and distributed workloads using CUDA and NCCL....
Work at office, 3 days per week
$248.71k - $292.6k
...Groq delivers fast, efficient AI inference. Our LPU-based system powers... ...developers the speed and scale they need. Headquartered in... ...Build fast. Sr. Staff Software Engineer - High Performance GPU... ...opportunities in this role Distributed Systems Engineering: Design...
$272k - $431.25k
...platform for every new AI-powered application. We... ...seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves... ...-latency inference at scale. This is a hands‑on,... ...performance engineering, and distributed systems. You will...
- A leading AI infrastructure company in California is seeking a Member of Technical Staff - Inference to design and optimize large-scale AI inference systems. The role demands 5+ years in systems engineering and expertise in large-scale inference systems. Successful candidates...
Flexible hours
$152k - $241.5k
NVIDIA Gruppe is seeking a Senior Software Engineer - AI Inference in Santa Clara, California. This role involves enhancing open-source LLM serving optimizations and implementing high-performance runtime capabilities. Candidates should have 5+ years of experience in building...
$154.4k - $212.3k
...one of the largest B2B AI‑native companies, decades‑proven, built‑for‑scale and designed for the enterprise... ...Overview As a Staff QA Engineer at Uniphore, you’ll... ...thrives in fast‑paced, distributed environments and is passionate... ...testing frameworks, LLM workflows, or chatbot...
$152k - $241.5k
...and benchmark GenAI inference on NVIDIA's latest... ...within TensorRT-LLM, SGLang, and vLLM,... ...serving performance at scale. This team sits at... ...GPU performance engineering and public... ...memory management, and distributed inference across TensorRT... ...other emerging AI use cases....
$147.4k - $272.1k
Senior Software Engineer - Distributed Systems Cupertino, California, United States Machine Learning and AI Our team is on a mission to build innovative infrastructure and tools... ...performance through algorithm design and testing Scale services to ever-increasing problem sizes...
Relocation
$126.8k - $220.9k
Software Engineer - Distributed Build Systems Cupertino, California, United States Software and Services... ...ships to billions of customers, a scale that has few peers in the industry. This... ...monitoring, or SRE practices Leveraging AI-assisted development tools to improve...
Relocation
- ...Role Are you a software engineer who has honed your... ...at the cutting edge of AI agents? This may be the... ...to perform reliably at scale. You will have the opportunity... ..., agent memory, LLM self-reflection and improvement... ...We approach our distributed world of work with flexibility...
Work at office, Immediate start, Remote work, Flexible hours

