Staff Software Engineer, Inference Cloud

Cerebras Systems, Inc.

About the Role

We're hiring a Staff Engineer to own major areas of the architecture of our Inference Cloud Platform. This team owns the cloud layer behind our Inference Service, with responsibility for availability, latency, reliability, and global scale.

This is a hands-on individual contributor role for an engineer who wants to work on the hardest distributed systems problems in the stack: multi-region traffic architecture, graceful degradation under bursty AI workloads, performance at high QPS, and the operating model for a platform that has to stay fast and available under load. You'll write code, lead key architectural decisions in your domain, debug production issues, and help shape technical direction across adjacent teams.

If you're interested in building the next-generation architecture of a globally distributed inference platform, we'd like to talk.

Responsibilities

  • Platform Direction. Help shape the technical direction for the Inference Cloud Platform, including multi-region topology, failure domains, service boundaries, and system evolution over time, and own the roadmap for major technical areas.
  • Core Cloud Systems. Design and build critical platform components such as service discovery, request routing, load balancing, caching, batching, and traffic management for AI inference workloads.
  • Reliability & Performance. Architect active-active systems with rapid failover, graceful degradation, and clear SLOs. Drive system-level improvements in latency, throughput, capacity efficiency, and resilience under unpredictable demand.
  • Traffic Control & Service Tiers. Define platform mechanisms for admission control, quota management, rate limiting, and differentiated quality of service across workload types and customer tiers.
  • Execution on Critical Paths. Write and review production code in the most important parts of the platform. Make high-consequence architectural decisions within your area and set the technical bar through design reviews, code reviews, and sound engineering judgment.
  • Production Leadership. Lead on the hardest production issues and cross-system bottlenecks. Drive observability, incident response, capacity planning, and post-incident improvement with a high standard for operational rigor.
  • Technical Influence. Partner with ML, Product, Infrastructure, and Platform teams to translate product and business requirements into scalable system designs, and drive alignment on shared technical decisions within your domain and adjacent platform surfaces.
  • Mentorship. Raise the effectiveness of senior engineers through design feedback, pairing, and clear technical standards.

Skills & Qualifications

  • 8+ years of experience in software engineering, with substantial individual contributor experience building and operating large-scale distributed systems or cloud infrastructure.
  • Deep expertise in distributed systems architecture in cloud environments, including networking, compute orchestration, container platforms, and multi-region production services.
  • Strong track record of making sound architectural decisions for highly available, latency-sensitive systems at scale.
  • Experience optimizing latency, throughput, and efficiency in high-QPS systems. Experience with TTFT and tail-latency reduction is a strong plus.
  • Strong proficiency in backend or systems languages such as Go, C++, or Python, with the expectation that you can contribute production code directly.
  • Experience designing observability and reliability practices, including metrics, logging, tracing, alerting, incident response, and SLO-driven operations.
  • Ability to influence senior engineers and cross-functional partners through technical credibility, communication, and judgment, especially within your domain and adjacent systems.

Preferred Skills & Qualifications

  • Experience with ML inference infrastructure, model serving systems, or GPU-accelerated workloads.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open source their cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Our simple, non-corporate work culture that respects individual beliefs.

Find out more about what it's like to work at Cerebras here! Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around us.

Vacancy posted 10 hours ago