Software Engineer - BIS (Baseten Inference Stack)

Baseten

Software Engineer - Inference Stack

Baseten powers mission-critical inference for the world's most dynamic AI companies. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. Join us and help build the platform that engineers turn to when shipping AI products.

Baseten's Inference Stack team builds the distributed runtime that powers large-scale LLM inference across our platform. We operate at the intersection of distributed systems, model performance, infrastructure, and developer experience. We enable customers to deploy and operate cutting-edge LLMs with industry-leading performance, scalability, reliability, and ease of use.

As a Software Engineer on the Inference Stack team, you'll work across the stack - from the developer experience customers use to deploy models, the libraries used for features like tool calling and reasoning, all the way down to the systems we use to orchestrate deployments in Kubernetes and route traffic efficiently. This is an ideal role for engineers who enjoy owning systems in production, solving hard integration problems, and making complex infrastructure simple and reliable for users.

Example Initiatives:

  • Blog Posts

Responsibilities:

  • Develop infrastructure and orchestration systems for deploying and managing large-scale distributed LLM inference
  • Work across the stack, from customer-facing features to low-level infrastructure components
  • Build platform capabilities related to routing, autoscaling, scheduling, observability, and runtime management
  • Improve the reliability, scalability, and usability of our inference stack
  • Collaborate closely with Model Performance engineers to make new inference optimizations broadly available to customers and easy to configure
  • Help define best practices around testing, release automation, benchmarking, and operational excellence
  • Debug complex production systems spanning Kubernetes, distributed runtimes, networking, and GPU workloads
  • Make thoughtful engineering tradeoffs balancing performance, reliability, operational simplicity, and developer experience
  • Own projects end-to-end: from architecture and implementation through deployment, monitoring, and iteration based on customer feedback

Requirements:

  • Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or a related field
  • Strong background in distributed systems, backend infrastructure, or platform engineering
  • Experience building and operating production systems where reliability, latency, and scale are first-class concerns
  • Strong sense of developer experience: you think about how systems are used, not just how they work
  • Motivated and willing to learn new languages, frameworks, and systems as needed
  • Ability to debug complex systems across multiple layers of the stack
  • Genuine interest in inference engineering. You don't need hands-on experience, but you should be willing to learn
  • Excellent communication and collaboration skills

Bonus:

  • Experience with Kubernetes, including concepts like operators and custom resources
  • Prior work on Dynamo, vLLM, SGLang, TensorRT-LLM, or similar inference frameworks
  • Experience with distributed scheduling, autoscaling, or service orchestration
  • Experience operating GPU workloads in production
  • Familiarity with observability tooling, CI/CD systems, or release automation
  • Experience contributing to open-source infrastructure or ML systems

Benefits:

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents
  • Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
  • Paid parental leave
  • Fertility and family-building stipend through Carrot
  • Company-facilitated 401(k)
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law.

Vacancy posted 1 day ago