Distributed LLM Inference Engineer - Scale High-Throughput AI

Cerebras

Anyscale is seeking a Distributed LLM Inference Engineer in Palo Alto, California. The role focuses on pushing the boundaries of performance for AI inference at large scale, collaborating closely with product teams and open source communities. The ideal candidate should have experience in running ML inference, familiarity with top deep learning frameworks like PyTorch, and a strong grasp of distributed systems. Attractive benefits and compensation plan included.

Vacancy posted 5 days ago
Similar jobs that could be interesting for you
Based on the Distributed LLM Inference Engineer - Scale High-Throughput AI in Palo Alto, CA vacancy
  •  ...on a mission to democratize distributed computing and make it accessible...  ...accelerate the progress of AI applications out into the...  ...developer or data scientist can scale an ML application from their...  ...the role As a Distributed LLM Inference Engineer, you will help systems and... 
    Suggested
    Work at office

    Cerebras

    Palo Alto, CA
    5 days ago
  • $272k - $425.5k

    Principal Software Engineer – Large-Scale LLM Memory and Storage Systems...  ...Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for... 
    Suggested
    Local area
    Remote work

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • ScOp Venture Capital is looking for an ML Systems Engineer to optimize LLM inference systems crucial for their AI platform. The role focuses on enhancing performance and efficiency via low-level systems optimization, directly impacting industry leader processes in semiconductor... 
    Suggested

    ScOp Venture Capital

    Santa Clara, CA
    2 days ago
  •  ...Department: Backend Engineer · Work type: On-...  ...About Archetype AI Archetype AI is developing...  ...-time multimodal LLM for real life,...  ..., and resilient distributed systems. You’ll work...  ...production—at scale, with reliability,...  ...-latency AI model inference and data services.... 
    Suggested
    Full time

    Neara

    Palo Alto, CA
    2 days ago
  • $184k - $287.5k

     ...skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme...  ...architecture, parallel programming, distributed systems, deep learning theories...  ...building and optimizing LLM inference engines (e.g., vLLM... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $262k - $365k

    Senior Engineering Manager AI Inference Platform, Distributed Cloud Location: Sunnyvale, CA, USA Pay US: $262,000 - $365...  ...experience optimizing, profiling, and scaling production‑grade systems on GPU...  ...experience implementing advanced LLM serving architectures and... 

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • A leading AI infrastructure company in California seeks a Member of Technical Staff — Training to design and optimize large-scale distributed training systems for frontier AI models. Candidates should have 5+ years of experience in ML systems and be proficient in Python... 

    RadixArk

    Palo Alto, CA
    4 days ago
  •  ...Principal Software Engineer at JPMorganChase...  ...services, enabling scale across teams and functions...  ...using Model Inference servers such as...  ...production operations for AI workloads,...  ...architecting and deploying LLM & GNN solutions on...  ...optimization and distributed systems for large... 

    TwinThread LLC

    Palo Alto, CA
    5 days ago
  • $135k - $160k

    Application Software Engineer, Inference SpaceX was founded under...  ...a high-performance AI inference platform that...  ...design and optimize large-scale model serving systems...  ...everything from distributed infrastructure to deep...  ...SGLang, vLLM, TensorRT-LLM) Develop custom tools... 
    Permanent employment
    Temporary work
    Remote work
    Worldwide
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Palo Alto, CA
    1 day ago
  • $166k - $225k

     ...running the world's best data and AI infrastructure platform so...  ...their business. Founded by engineers — and customer obsessed — we...  ...for interfacing with data to scaling our services and infrastructure...  ...building the next generation distributed data storage and processing... 
    Local area
    Worldwide

    Databricks Inc.

    Mountain View, CA
    2 days ago
  • $168k - $270.25k

    Senior Software Engineer, Distributed Systems - NIM Factory...  ...upon which every new AI-powered application is built...  ...infrastructure and automation for NVIDIA Inference Microservices (NIMs). The...  ...in working with large scale full stack development. We are... 
    Remote work

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • $160.36k - $240.54k

     ...driver, combining cutting-edge AI with automotive-grade...  ...clear path to AVs at commercial scale, empowering a safer, richer,...  ...Role We’re looking for senior engineers to build/scale Nuro's large-scale...  ...and developing large-scale distributed applications (e.g. Kubernetes... 

    Icehouseventures

    Mountain View, CA
    3 days ago
  • $180k - $220k

    black.ai is looking for a Senior Software Engineer, Calibration & Control in Palo Alto, CA. In this role, you will...  ...the control systems for utility-scale quantum computers. You will be responsible...  ...in Python or C++, with a focus on distributed storage and graph databases. The... 

    black.ai

    Palo Alto, CA
    3 days ago
  • $192k - $260k

     ...running the world's best data and AI infrastructure platform, so...  ...companies in the world. Our engineering teams build highly technical...  ...the resilience, security and scale that is critical to making...  ...Optional: MS or PhD in databases, distributed systems. Comfortable working... 
    Work at office
    Local area

    Menlo Ventures

    Mountain View, CA
    5 days ago
  • Senior AI Systems Performance Engineer Palo Alto, California, United States...  ...and operations at scale. SambaNova Suite™...  ...for large‑scale AI inference. Responsibilities...  ...both single‑node and distributed systems. Basic Qualifications...  ...‑on experience with LLM or multimodal model... 
    Full time
    Temporary work
    Local area
    Flexible hours

    SambaNova

    Palo Alto, CA
    1 day ago
  •  ...unlimited potential of AI to define the next era...  ...supports 1,000+ chip design engineers by building tools and...  ...with an emphasis on distributed systems and operational...  ...concurrency, and reliability at scale. Responsibilities...  ...language (including LLM‑generated code) to implement... 

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $198k - $286k

     ...mission to revolutionize AI infrastructure by...  ...Modular, we optimize inference from kernel to cloud on...  ...makes this possible at scale. We continuously apply...  ...kernels, the inference engine, and distributed systems so that customer...  ...Cloud, delivering LLM performance on the Pareto... 
    Remote job
    Work experience placement
    Work at office
    Local area
    Flexible hours

    Modular Mailing Systems, Inc.

    Los Altos, CA
    2 days ago
  • $152k - $241.5k

     ...platform upon which every new AI‑powered application is...  ...a Senior Software Engineer - AI Inference to advance open‑source LLM serving by contributing...  ...low‑latency inference at scale. This is a hands‑on role...  ...mindset. Familiarity with distributed systems concepts and concurrency... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $152k - $241.5k

     ...learning ignited modern AI — the next era of...  ...seeking top‑tier AI Compiler Engineers to drive innovation...  ...tangible impact on a global scale. What you’ll be doing:...  ...for AI workloads (both inference and training) and...  ...accelerator architectures. LLM Knowledge: Deep understanding... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  •  ...computing experiences—from AI and data centers, to PCs,...  ...enabling RL training and SOTA LLM and Multimodal inference at scale across multi-GPU and multi...  .... THE PERSON: Skilled engineer with strong technical and...  ...serving and RL‑training. Distributed System Optimization: Tune... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
  •  ...the Role We are seeking a Senior Inference Engineer to accelerate the performance of Pika's AI-driven products. In this highly...  ...‑leading user experiences at scale. You will design and optimize inference...  ...computing kernels and distributed workloads using CUDA and NCCL.... 
    Work at office
    3 days per week

    Pika

    Palo Alto, CA
    2 days ago
  • $248.71k - $292.6k

     ...Groq delivers fast, efficient AI inference. Our LPU-based system powers...  ...developers the speed and scale they need. Headquartered in...  ...Build fast. Sr. Staff Software Engineer - High Performance GPU...  ...opportunities in this role Distributed Systems Engineering: Design... 

    I did my part and supported the Regular Toilet

    Palo Alto, CA
    3 days ago
  • $272k - $431.25k

     ...platform for every new AI-powered application. We...  ...seek a Principal Software Engineer - AI Inference to advance open-source LLM serving. This role involves...  ...-latency inference at scale. This is a hands‑on,...  ...performance engineering, and distributed systems. You will... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • A leading AI infrastructure company in California is seeking a Member of Technical Staff — Inference to design and optimize large-scale AI inference systems. The role demands 5+ years in systems engineering and expertise in large-scale inference systems. Successful candidates... 
    Flexible hours

    RadixArk

    Palo Alto, CA
    1 day ago
  • $152k - $241.5k

    NVIDIA Gruppe is seeking a Senior Software Engineer - AI Inference in Santa Clara, California. This role involves enhancing open-source LLM serving optimizations and implementing high-performance runtime capabilities. Candidates should have 5+ years of experience in building... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $154.4k - $212.3k

     ...one of the largest B2B AI‑native companies—decades‑proven, built‑for‑scale and designed for the enterprise...  ...Overview As a Staff QA Engineer at Uniphore, you’ll...  ...thrives in fast‑paced, distributed environments and is passionate...  ...testing frameworks, LLM workflows, or chatbot... 

    Uniphore Technologies North America Inc

    Palo Alto, CA
    3 days ago
  • $152k - $241.5k

     ...and benchmark GenAI inference on NVIDIA's latest...  ...within TensorRT-LLM, SGLang, and vLLM,...  ...serving performance at scale. This team sits at...  ...GPU performance engineering and public...  ...memory management, and distributed inference across TensorRT...  ...other emerging AI use cases.... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $147.4k - $272.1k

    Senior Software Engineer - Distributed Systems Cupertino, California, United States Machine Learning and AI Our team is on a mission to build innovative infrastructure and tools...  ...performance through algorithm design and testing Scale services to ever-increasing problem sizes... 
    Relocation

    Apple Inc.

    Cupertino, CA
    3 days ago
  • $126.8k - $220.9k

    Software Engineer - Distributed Build Systems Cupertino, California, United States Software and Services...  ...ships to billions of customers — a scale that has few peers in the industry. This...  ...monitoring, or SRE practices Leveraging AI-assisted development tools to improve... 
    Relocation

    Apple Inc.

    Cupertino, CA
    5 days ago
  •  ...Role Are you a software engineer who has honed your...  ...at the cutting edge of AI agents? This may be the...  ...to perform reliably at scale. You will have the opportunity...  ..., agent memory, LLM self-reflection and improvement...  ...We approach our distributed world of work with flexibility... 
    Work at office
    Immediate start
    Remote work
    Flexible hours

    Centaur Labs

    Mountain View, CA
    3 days ago
