Infrastructure Ops Engineer
Baseten
Baseten powers mission-critical inference for the world's most dynamic AI companies. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. Join us and help build the platform that engineers turn to when shipping AI products.
As an Infrastructure Ops Engineer at Baseten, you are the operational engine of our global infrastructure. You will sit at the intersection of technical customer success and infrastructure engineering, partnering closely with our SRE and FDE teams to execute the complex hardware lifecycles that power our fleet.
This role is designed for high-energy, detail-oriented individuals who thrive on technical coordination and operational excellence. You aren't just managing clusters; you are acting as the technical glue between our customers' needs and our hardware reality, ensuring that high-level capacity strategies are translated into boots-on-the-ground results.
While you will be hands-on with Kubernetes and cloud-native tools, your success is measured by your ability to project-manage the resolution of capacity puzzles, ensuring our platform remains reliable, observable, and ready for the next massive AI deployment.
Example Initiatives
- The "Lost Node" Investigation: Debugging cluster-level blockers to solve why pods aren't scheduling despite available capacity
- Regional Compliance Guard: Auditing and correcting scheduling policies to ensure customer data stays within specified geographical constraints (e.g., EU-only vs US-only)
- High-Stakes Maintenance Orchestration: Coordinating critical maintenance cycles both externally (with vendors) and internally (with Baseten SREs) to evacuate workloads from unhealthy nodes and integrate replacement hardware with zero customer disruption
Responsibilities
- Fleet Maintenance: Manage daily node operations including tainting/untainting, node draining, and PVC repairs to ensure GPU fleet health and operational cost control
- GTM & Capacity Fulfillment: Partner with Sales and account teams to scope and fulfill customer capacity requests, translating complex timelines into concrete infrastructure actions and clear ETAs
- Process & Observability Engineering: Identify recurring gaps in the capacity lifecycle (intake, triage, comms) and drive fixes by defining lightweight processes and improving system observability
- Technical Orchestration: Act as the operational bridge between SRE and Infra teams, executing discrete changes and verifying system status during high-stakes maintenance windows
- Technical Documentation: Contribute to the internal knowledge base for GPU-specific issues (H100/A100/B200) to accelerate future incident resolution
- Automation & Tooling: Identify repetitive workflows and partner with engineering to build scripts, dashboards, and internal tools that reduce manual intervention and shorten time-to-mitigation
- Knowledge Excellence: Maintain a living database of GPU-specific intelligence (H100/B200) and market moves to accelerate incident resolution and support strategic briefings for leadership
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 2+ years of professional work experience, ideally in a customer-facing technical role or as a junior SRE/Cloud Engineer
- Strong familiarity with Kubernetes and the lifecycle of cloud-based container orchestration
- Strong ownership mindset and attention to detail, demonstrated through fast detection, clear communication, and reliable follow-through
- Demonstrated ability to communicate complex technical blockers clearly to both internal engineering teams and external vendors
- Preference for SF or NYC-based candidates to foster a close-knit "family" atmosphere in the office
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employee and dependents
- Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities
Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.
At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).