Staff Software Engineer, Inference Infrastructure
Jaide Health
Location: San Francisco, Toronto, London, New York, Montreal
Employment Type: Full time
Location Type: Hybrid
Department: Inference Model Serving

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for helping increase the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Are you energized by building high-performance, scalable, and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications? We are looking for Members of Technical Staff to join the Model Serving team at Cohere. The team is responsible for developing, deploying, and operating the AI platform that delivers Cohere's large language models through easy-to-use API endpoints. In this role, you will work closely with many teams to deploy optimized NLP models to production in low-latency, high-throughput, high-availability environments. You will also get the opportunity to interface with customers and create customized deployments to meet their specific needs.
You may be a good fit if you have:
- 5+ years of engineering experience running production infrastructure at a large scale
- Experience designing large, highly available distributed systems with Kubernetes, and running GPU workloads on those clusters
- Experience with Kubernetes development, production coding, and support
- Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
- Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
- Experience in compute, storage, and network resource and cost management
- Excellent collaboration and troubleshooting skills to build mission-critical systems and ensure smooth operations and efficient teamwork
- The grit and adaptability to solve complex technical challenges that evolve day to day
- Familiarity with the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence inference latency and throughput
- Strong understanding of, or working experience with, distributed systems
- Experience in Golang, C++, or other languages designed for high-performance, scalable servers

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit the Accommodations Request Form, and we will work together to meet your needs.
Full-Time Employees at Cohere enjoy these Perks:
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
- 6 weeks of vacation (30 working days!)
$405k
...of committed researchers, engineers, policy experts, and business... ...role We are seeking a Staff Software Engineer to build and... ...organization and the Cloud Inference team: taking classifiers, detection... ...inside a CSP partner's infrastructure at serving-path latency and...
Work at office · Visa sponsorship · Flexible hours
$190.9k - $232.8k
...P-1285 About This Role As a staff software engineer for GenAI inference, you will lead the architecture, development, and optimization of the... ...Integrating with federated, distributed inference infrastructure - orchestrate across nodes, balance load, handle communication...
Local area · Worldwide
$200k - $400k
...as a team. About the Team The Infrastructure team builds and operates the foundations... ...GPU and model-serving platforms for LLM inference with multi-provider routing and support... ...We're hiring a Senior Infrastructure Engineer to design, build, and operate production...
Full time · Work at office · Local area
$200k - $400k
...team. About the Team The ML Infrastructure team builds the systems that power... ...and the routing layer that manages inference across multiple providers. We work... ...About the Role We're hiring a Staff ML Infrastructure Engineer to own the platforms powering Decagon...
Full time · Work at office · Local area
$252k - $315k
Scale GP is building the infrastructure that makes enterprise AI... ...looking for a Senior or Staff Infrastructure Engineer to act as a primary technical... ...knowledge retrieval and inference engines. You won't just... ...with 5+ years of full-time software engineering experience....
Full time
- ...runs through the perception team. We're hiring a Staff Software Engineer to own ML Infrastructure at Voxel. Our applied ML team is shipping vision models... ...-deploy handoff: export trained models to optimized inference formats (TensorRT, ONNX), quantify accuracy and...
Work at office · Flexible hours
$236k - $290k
...we're just getting started. Role Overview As a Staff Software Engineer on the Core Infrastructure team at Harvey, you'll play a critical role in... ...model proxy architecture that routes millions of daily inference requests while maintaining model API compatibility and...
Relocation package
- ...About the Role We are hiring Software Engineers focused on AI Infrastructure to build the systems that enable frontier multimodal AI to operate reliably... ...- including GPU orchestration, large-scale inference systems, performance optimization, and developer platforms...
Internship · Immediate start
$300k - $430k
...as a team. About the Team The ML Infrastructure team builds the systems that power... ...and the routing layer that manages inference across multiple providers. We work... ...experiences. About the Role We're hiring a Staff ML Infrastructure Engineer to own the platforms powering...
Work at office
$170k - $216k
...states. The Simulation Infrastructure team creates reliable, scalable... ...evaluate the Waymo Driver's software stack at a massive scale. We... ...range of customers Software Engineers, Product, Data Science,... ...will: Build and evolve ML inference infrastructure for...
Full time · Remote work
$208k - $250k
...Are you an ambitious engineer looking to make an outstanding... ...are seeking a GenAI Platform Staff Software Engineer to join our team in... ...closely with engineering, infrastructure, product, and business partners... ..., including managed inference endpoints and GPU-based workloads...
Full time · Work at office · Local area
$320k - $405k
...committed researchers, engineers, policy experts, and... ...research, training, and inference to understand workload... ...Significant software engineering experience... ...~ Familiarity with ML infrastructure: GPUs, TPUs, or Trainium... ...Currently, we expect all staff to be in one of our offices...
Work at office · Visa sponsorship · Flexible hours
$192k - $260k
...running the world's best data and AI infrastructure platform so our customers can use deep... ...models. It offers real-time, low-latency inference, governance, monitoring, and lineage.... ...SLAs and cost efficiency. As a Staff Engineer, you'll play a critical role in shaping...
Local area · Worldwide
- Staff Software Engineer - Machine Learning Platform (San Francisco) Replicate makes it easy for... ...need something custom. We handle the infrastructure, so you can focus on building. Our team... ...our Models team to speed up model inference through techniques like caching,...
Full time · Work at office · Shift work · 3 days per week
$320k
...Staff + Sr. Software Engineer, Cloud Inference San Francisco, CA | Seattle, WA About Anthropic Anthropic's mission is to create reliable, interpretable... ...Design, build, and own backend services and infrastructure that serve Claude across multiple CSPs, accounting...
Work at office · Visa sponsorship · Flexible hours
- An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming skills and a solid foundation in data structures and algorithms. The ideal candidate...
- ...involves designing large-scale deployment architectures, solving AI inference challenges, and collaborating closely with customers' DevOps teams. Ideal candidates will have 3+ years in cloud infrastructure or DevOps, strong skills in Kubernetes, Docker, Terraform, and a...
Flexible hours
- Qualifications CUDA + GPU inference optimization vLLM, SGLang, or TensorRT-LLM experience KV caching, paged attention, batching, token streaming, etc. Distributed compute (with GPUs is a super plus) No degree required Company Luminal (YC S25) builds an AI compiler and serving...
$187.5k - $395k
...Software Engineer, Inference Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality... ...not going to reach our goal with reliable & scalable infrastructure, which is going to become the differentiating factor...
- ...We are seeking a highly technical Inference Engine Engineer to optimize the performance and... ...optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic... ...performance bottlenecks across the software and hardware stack, and implement targeted...
Worldwide · Flexible hours
$160k - $250k
...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI... ...strong plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies... ...in building the next generation AI infrastructure. Compensation We offer...
Full time · Local area
$165k
...Fluidstack At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner... ...what's next. About the Role Inference is now the defining cost and latency... ...Qualifications 5+ years of professional software engineering experience with a track record of...
Local area
- A technology infrastructure company in San Francisco is seeking an experienced engineer for its Inference Platform team. This role involves managing end-to-end inference deployments... ...Candidates should have deep experience in software engineering, particularly with Python or...
- ...Join Onton as a Founding Engineer and set the strategic foundation... ...stack - web app, container, infrastructure, etc. Stay up-to-date on... ...out a performant and scalable inference engine to support more... ...is passionate about making software tools accessible to all, we...
Full time · Work at office · Local area · Remote work · Relocation · 3 days per week
$252k - $315k
...platform that provides APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a strong engineer to join our team and help us build and scale... ...candidate will have a strong understanding of software engineering principles and practices, as well...
Full time
$160k - $200k
...Staff Software Engineer, Payments fal is the generative media ecosystem powering the next generation... ...of AI products. We build the infrastructure, tools, and model access that teams... ...unified platform where high-performance inference, orchestration, and observability...
Currently hiring · Flexible hours
$207k - $385k
...the Team Join the engineering teams that bring... ...Role We're seeking Software Engineers who can solve... ...new environments and infrastructure that power critical... ...optimizing how we serve inference in unique, high-... ...Member of Technical Staff. We use Senior Staff...
$190.9k - $232.8k
...P-1285 About This Role As a staff software engineer for GenAI Performance and Kernel, you... ...GPU kernels powering our GenAI inference stack. You will lead development of... ...best practices Collaborate with infrastructure, tooling, and ML teams to roll out kernel...
Local area · Worldwide
$189k - $303k
...more efficient and accessible for all. We're searching for a Staff Software Engineer on the Autonomy Data: Continuous Learning team. The... ...interesting events to millions of miles Own model training and inference pipelines for all core Autonomy models Collaborate...
Work at office · Local area · 3 days per week
- ...smarter, safer, and more productive. As a staff software engineer on this team, you'll lead the... ...specifically at the intersection of ML infrastructure and product engineering Technical... ...feature stores, model registries, online inference platforms) Prior work on trust &...
Work experience placement · Casual work · Live in · Work at office · Remote work

