Software Engineer - Training Infrastructure
Baseten
Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers rely on to ship AI products.
The Role
As a Software Engineer on the Training Infrastructure team, you'll architect and lead development of our training platform, supporting top-tier research engineers and model developers. You'll make key technical decisions for the infrastructure that enables developers to deploy, scale, and monitor their workloads with high performance and reliability. You'll own scheduling, storage, networking, reliability, and observability across the training stack.
Example Initiatives
Take a look at what we've built so far:
- Overview of the product so far
- Training docs overview
- Story of the Training product
- Research we've done
Responsibilities
- Design and architect scalable infrastructure systems for our ML training platform (e.g. scheduling, storage, and networking)
- Partner closely with developers and research engineers to translate complex training requirements into technical solutions
- Design and architect a global training scheduler
- Design and architect reinforcement learning systems and continuous learning pipelines
- Drive long-term improvements to system reliability and development velocity
- Partner closely with SRE and Capacity teams to unlock state-of-the-art training infrastructure
- Make critical architectural decisions balancing performance with system reliability
- Lead technical discussions and mentor junior engineers on infrastructure best practices
- Contribute to long-term technical strategy and infrastructure roadmap
Requirements
- Bachelor's degree or higher in Computer Science or related field
- Proficiency in Go, with Python experience a plus
- Deep expertise with Kubernetes in production environments
- Extensive experience with major cloud providers (AWS, GCP), with experience on neo-cloud providers (Crusoe, DigitalOcean, Nebius) a plus
- Advanced understanding of distributed systems concepts and performance tuning
- Proven experience designing observability systems
- Experience with ML/AI workloads and MLOps platforms highly valued
Nice To Have
- Experience with distributed storage systems
- Experience with workload orchestration platforms like Temporal or Airflow
- Familiarity with the open-source training stack and frameworks (NCCL, PyTorch, Megatron, NemoRL, VeRL, Axolotl, HF Trainer) and distributed training techniques (FSDP, DeepSpeed)
- Experience developing AI products, tooling, or agents
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employee and dependents
- Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities
Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.
At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).