Software Engineer - BIS (Baseten Inference Stack)

Baseten

Software Engineer - Inference Stack

Baseten powers mission-critical inference for the world's most dynamic AI companies. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. Join us and help build the platform engineers turn to when shipping AI products.

Baseten's Inference Stack team builds the distributed runtime that powers large-scale LLM inference across our platform. We operate at the intersection of distributed systems, model performance, infrastructure, and developer experience. We enable customers to deploy and operate cutting-edge LLMs with industry-leading performance, scalability, reliability, and ease of use.

As a Software Engineer on the Inference Stack team, you'll work across the stack: from the developer experience customers use to deploy models, through the libraries used for features like tool calling and reasoning, down to the systems we use to orchestrate deployments in Kubernetes and route traffic efficiently. This is an ideal role for engineers who enjoy owning systems in production, solving hard integration problems, and making complex infrastructure simple and reliable for users.

Responsibilities:

Develop infrastructure and orchestration systems for deploying and managing large-scale distributed LLM inference
Work across the stack, from customer-facing features to low-level infrastructure components
Build platform capabilities related to routing, autoscaling, scheduling, observability, and runtime management
Improve the reliability, scalability, and usability of our inference stack
Collaborate closely with Model Performance engineers to make new inference optimizations broadly available to customers and easy to configure
Help define best practices around testing, release automation, benchmarking, and operational excellence
Debug complex production systems spanning Kubernetes, distributed runtimes, networking, and GPU workloads
Make thoughtful engineering tradeoffs balancing performance, reliability, operational simplicity, and developer experience
Own projects end-to-end: from architecture and implementation through deployment, monitoring, and iteration based on customer feedback

Requirements:

Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or a related field
Strong background in distributed systems, backend infrastructure, or platform engineering
Experience building and operating production systems where reliability, latency, and scale are first-class concerns
Strong sense of developer experience: you think about how systems are used, not just how they work
Motivated and willing to learn new languages, frameworks, and systems as needed
Ability to debug complex systems across multiple layers of the stack
Genuine interest in inference engineering. You don't need hands-on experience, but you should be willing to learn
Excellent communication and collaboration skills

Bonus:

Experience with Kubernetes, including concepts like operators and custom resources
Prior work on Dynamo, vLLM, SGLang, TensorRT-LLM, or similar inference frameworks
Experience with distributed scheduling, autoscaling, or service orchestration
Experience operating GPU workloads in production
Familiarity with observability tooling, CI/CD systems, or release automation
Experience contributing to open-source infrastructure or ML systems

Benefits:

Competitive compensation, including meaningful equity
100% coverage of medical, dental, and vision insurance for employees and dependents
Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
Paid parental leave
Fertility and family-building stipend through Carrot
Company-facilitated 401(k)
Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law.

Vacancy posted 1 day ago

