
Staff Software Engineer, Inference Infrastructure

Jaide Health

Location: San Francisco, Toronto, London, New York, Montreal
Employment Type: Full time
Location Type: Hybrid
Department: Inference Model Serving

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Are you energized by building high-performance, scalable and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications? We are looking for Members of Technical Staff to join the Model Serving team at Cohere. The team is responsible for developing, deploying, and operating the AI platform delivering Cohere's large language models through easy to use API endpoints. In this role, you will work closely with many teams to deploy optimized NLP models to production in low latency, high throughput, and high availability environments. You will also get the opportunity to interface with customers and create customized deployments to meet their specific needs.

You may be a good fit if you have:
  • 5+ years of engineering experience running production infrastructure at a large scale
  • Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
  • Experience with Kubernetes dev and production coding and support
  • Experience with GCP, Azure, AWS, OCI, multi-cloud on-prem / hybrid serving
  • Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
  • Experience in compute/storage/network resource and cost management
  • Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
  • Strong understanding or working experience with distributed systems
  • Experience in Golang, C++ or other languages designed for high-performance scalable servers

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit the Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Vacancy posted 3 days ago
