Staff Software Engineer, Inference Infrastructure

Jaide Health

Location: San Francisco, Toronto, London, New York, Montreal
Employment Type: Full time
Location Type: Hybrid
Department: Inference Model Serving

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Are you energized by building high-performance, scalable and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications? We are looking for Members of Technical Staff to join the Model Serving team at Cohere. The team is responsible for developing, deploying, and operating the AI platform delivering Cohere's large language models through easy to use API endpoints. In this role, you will work closely with many teams to deploy optimized NLP models to production in low latency, high throughput, and high availability environments. You will also get the opportunity to interface with customers and create customized deployments to meet their specific needs.

You may be a good fit if you have:
  • 5+ years of engineering experience running production infrastructure at a large scale
  • Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
  • Experience with Kubernetes dev and production coding and support
  • Experience with GCP, Azure, AWS, OCI, multi-cloud on-prem / hybrid serving
  • Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
  • Experience in compute/storage/network resource and cost management
  • Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
  • Strong understanding or working experience with distributed systems
  • Experience in Golang, C++ or other languages designed for high-performance scalable servers

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit the Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Staff Software Engineer, Inference Infrastructure in San Francisco, CA vacancy
  • $405k

    About the role Anthropic's Inference organization serves Claude...  ...add. We're looking for a Staff Engineer to be a technical lead for...  ...counterpart to Anthropic's central Infrastructure org on the compilers, build...  ...of them Have significant software engineering experience,... 
    Suggested
    Work at office
    Visa sponsorship
    Flexible hours

    jobr.pro

    San Francisco, CA
    3 days ago
  • $200k - $300k

    F2 Staff Software Engineer, Infrastructure Location: San Francisco Employment Type: Full time Location Type: Hybrid Department: Engineering, Product,...  ...vector search infrastructure, and high‑throughput LLM inference paths; balancing latency, throughput, and cost. Design... 
    Suggested
    Full time

    F2

    San Francisco, CA
    2 days ago
  • Software Engineer (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build the systems that enable frontier multimodal AI to operate reliably at production scale. This role exists because modern generative... 
    Suggested

    SpreeAI

    San Francisco, CA
    4 days ago
  • $320k - $405k

     ...group of committed researchers, engineers, policy experts, and...  ...with research, training, and inference to understand workload shapes...  ...qualifications Significant software engineering experience building...  ...deployments) Familiarity with ML infrastructure: GPUs, TPUs, or Trainium;... 
    Suggested

    Menlo Ventures

    San Francisco, CA
    5 days ago
  •  ...innovative GPU marketplace and AI inference service that promise affordability...  ...the Role We're seeking a Platform Engineer to design and build the control plane...  ..., developer platforms, or infrastructure services Expert-level software engineering skills in Go (Golang)... 
    Suggested
    Worldwide

    Hyperbolic Labs

    San Francisco, CA
    3 days ago
  • $208k - $250k

     ...that operate across Ripple's polyrepo engineering environment. Define and advance Ripple...  ...Collaborate closely with engineering, infrastructure, product, and business partners to...  ...hybrid environments, including managed inference endpoints and GPU‑based workloads. Excellent... 
    Full time
    Local area

    TryApplyNow

    San Francisco, CA
    3 days ago
  • An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming skills and a solid foundation in data structures and algorithms. The ideal candidate... 

    SpreeAI

    San Francisco, CA
    4 days ago
  •  ...involves designing large-scale deployment architectures, solving AI inference challenges, and collaborating closely with customers' DevOps teams. Ideal candidates will have 3+ years in cloud infrastructure or DevOps, strong skills in Kubernetes, Docker, Terraform, and a... 
    Flexible hours

    FriendliAI

    San Francisco, CA
    4 days ago
  • Qualifications CUDA + GPU inference optimization vLLM, SGLang, or TensorRT-LLM experience KV caching, paged attention, batching, token streaming, etc. Distributed compute (with GPUs is a super plus) No degree required Company Luminal (YC S25) builds an AI compiler and serving... 

    SupportFinity™

    San Francisco, CA
    4 days ago
  • $200k - $400k

    Inferact Inc. seeks a cloud orchestration engineer to build the operational backbone ensuring reliable performance at scale for vLLM, the company's AI inference engine. This role involves designing systems for cluster management, deployment automation, and production monitoring... 
    Remote work

    Inferact Inc.

    San Francisco, CA
    18 hours ago
  • $192k - $260k

     ...the world's best data and AI infrastructure platform so our customers...  ...and serving frontier AI model inference for open source models like...  ...necessary. We’re looking for engineers who have owned high scale operational...  ...runtimes at scale. As a Staff Engineer, you’ll play a... 
    Local area
    Worldwide

    Menlo Ventures

    San Francisco, CA
    1 day ago
  • $405k

     ...growing group of committed researchers, engineers, policy experts, and business...  ...is seeking an exceptional Senior Staff Software Engineer to join the Claude Developer...  ...partnering closely with Research, Inference, Platform, Infrastructure, and Safeguards to ensure the... 
    Work at office
    Remote work
    Visa sponsorship
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    1 day ago
  • $170k - $220k

     ...supply chain and enterprise software investors. We're live with manufacturers...  ...with design and backend/infrastructure to shape APIs and UX for...  ...of professional software engineering experience with a strong...  ...discriminated unions, type inference). • Next.js mastery: Production... 

    Tenkara Labs, Inc.

    San Francisco, CA
    4 days ago
  •  ...looking for a Developer Platform Engineer to build and maintain their API platform for inference. This role involves defining...  ...APIs and creating robust infrastructures across cloud providers. Ideal candidates have 5+ years of software engineering experience, are collaborative... 

    TypeSafe AI

    San Francisco, CA
    5 days ago
  •  ...the most persistent challenges in data infrastructure: extracting accurate, structured information...  .... We are a small, fast-growing team of engineers in San Francisco powering Fortune 100...  ...in low-latency, high-throughput inference for OCR and multimodal models. Own profiling... 
    Work at office
    Visa sponsorship
    Relocation package

    Trypulse

    San Francisco, CA
    2 days ago
  • $325k

    About the Team Our Inference team brings OpenAI's most capable research and technology to the world through...  ...model inference. About the Role We're hiring engineers to scale and optimize OpenAI's inference infrastructure across emerging GPU platforms. You'll work across... 

    Centaur Labs

    San Francisco, CA
    3 days ago
  • $205k - $250k

     ...About the Role We are seeking a Backend Engineer to design and scale high-performance...  ...delivering reliable, secure, and scalable infrastructure. Ideally, you’ve worked on services...  ...integrating external AI APIs, managing ML inference pipelines, or supporting data infrastructure... 
    Work experience placement
    Private practice
    Work at office

    3Y Health

    San Francisco, CA
    3 days ago
  • NextGenEnergyJobs is seeking a Staff Software Engineer to develop and enhance datasets and models for autonomous driving technology. This role will involve improving dataset quality, training and inference pipelines, and collaborating with cross-functional teams. Candidates... 

    NextGenEnergyJobs

    San Francisco, CA
    1 day ago
  •  ...Experience building and deploying AI Inference and Generative AI...  ...with foundation models, prompt engineering, fine‑tuning, semantic search...  ...Experience with AI/ML orchestration software KServe, Knative, Kubeflow (...  ...Cloudera is looking for a Staff Software Engineer to join the... 

    Cloudera

    San Francisco, CA
    3 days ago
  • $190.9k - $232.8k

    About This Role As a staff software engineer for GenAI Performance and Kernel, you will own the...  ...performance GPU kernels powering our GenAI inference stack. You will lead development of...  ...best practices Collaborate with infrastructure, tooling, and ML teams to roll out... 
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    4 days ago
  • $189k - $303k

     ...more efficient and accessible for all. We’re searching for a Staff Software Engineer on the Autonomy Data: Continuous Learning team. The ideal...  ...interesting events to millions of miles Own model training and inference pipelines for all core Autonomy models Collaborate across... 
    Local area

    I did my part and supported the Regular Toilet

    San Francisco, CA
    1 day ago
  • $189k - $303k

    Staff Software Engineer, Continuous Learning The role involves developing and improving datasets and models for autonomous driving technology,...  ...reinforcement learning techniques, as well as managing training and inference pipelines to enhance the Aurora Driver system. Key... 
    Work at office
    3 days per week

    NextGenEnergyJobs

    San Francisco, CA
    2 days ago
  • $150k - $230k

    fal is building the fastest and most scalable infrastructure for AI inference. Fal Serverless powers 1,300+ endpoints on the fal Marketplace and handles...  ...product. About this role As a Forward Deployed Engineer on Serverless, you will work directly with enterprise customers... 
    Currently hiring
    Relocation
    Visa sponsorship

    Fal

    San Francisco, CA
    1 day ago
  •  ...: Join Onton as a Founding Engineer and set the strategic foundation...  ...stack — web app, container, infrastructure, etc. Stay up-to-date on...  ...a performant and scalable inference engine to support more powerful...  ...is passionate about making software tools accessible to all, we... 
    Full time
    Work at office
    Local area
    Remote work
    Relocation
    3 days per week

    Onton

    San Francisco, CA
    4 days ago
  • Maven in San Francisco is seeking an innovative engineer to join their Acceleration team. This role emphasizes leveraging AI tools to improve product delivery and enhance user experience. The ideal candidate will have substantial industry experience and a strong proficiency... 

    Maven

    San Francisco, CA
    1 day ago
  • $207k - $345k

    Senior Staff Software Engineer, Infrastructure About this Position Rippling gives businesses one place to run HR, IT, and Finance. It brings together all of the workforce systems that are normally scattered across a company, like payroll, expenses, benefits, and computers... 
    Work at office
    Local area
    3 days per week

    Rippling

    San Francisco, CA
    2 days ago
  • $180k - $250k

     ...generation of AI products. We build the infrastructure, tools, and model access that teams...  ...unified platform where high-performance inference, orchestration, and observability come...  ...architecture on top of our in‑house inference engine, focusing on maximizing throughput... 
    Currently hiring
    Relocation package

    fal

    San Francisco, CA
    2 days ago
  •  ...growing business with billions in revenue About the Role As a Staff Software Engineer on the Consumer Experience team, you'll build the products...  ...graphs, including entity resolution and real‑time inference Experience building AI‑powered systems, including LLM‑based... 
    Full time
    Freelance
    Internship
    Work at office
    Remote work
    Flexible hours

    Handshake

    San Francisco, CA
    5 days ago
  • $197.3k - $313.7k

    Staff Software Engineer, Electron & Browser Infrastructure - Slack Desktop. Remote type: Office Tech-Flexible. Locations: Georgia - Atlanta; Washington - Seattle Metro - Remote; Washington - Seattle; California - Remote; California - San Francisco. Time type: Full time. Posted... 
    Work at office
    Remote work

    Slack Enterprise

    San Francisco, CA
    2 days ago
  •  ...general intelligence benefits all of humanity. The Identity Infrastructure Engineering team sits at the core of this effort, designing and...  ...innovative AI research. About the Role We’re looking for a Staff+ Software Engineer to help build and evolve the identity... 
    Work at office
    Relocation package

    Aimling

    San Francisco, CA
    3 days ago
