Senior Site Reliability Engineer, Fleet Management
$127k - $249k
MongoDB HQ
The Team
Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization. Among these are our multi-cloud-provider Kubernetes infrastructure, networking, load balancing (including our public-facing edge and internal service mesh), and observability and alerting systems.
The Fleet Management team provides the core runtime environment that empowers our developers to build and ship products to delight our customers. We manage the end-to-end lifecycle of our Kubernetes fleet, alongside the critical components that ensure cluster reliability and security (e.g., CoreDNS, cert-manager, and Gatekeeper). As our infrastructure scales to support new use cases and products, we are spearheading a migration from Terraform-based Infrastructure as Code (IaC) to an Operator-driven lifecycle management model.
This role can be based out of our Austin, Boston, Los Angeles, New York City, Raleigh, or San Francisco offices, remotely within the United States, or out of our European office in Dublin.
Responsibilities
Contribute to developing and maintaining a scalable and secure runtime environment on top of Kubernetes that supports product needs across MongoDB
Provide internal support for our Kubernetes ecosystem, partnering with engineering teams to help them solve domain-specific problems
Participate in a 24/7 on-call rotation to resolve critical issues
Prioritize blameless post-mortems and dedicate engineering time to systemic fixes, ensuring you aren't paged for the same issue twice
You may be a good fit if you
Have 6+ years of experience in software development and operating distributed systems
Are proficient in Go, Python, or a similar language, with a strong commitment to code quality and testing practices (writing unit, integration, and E2E tests)
Have deep experience using and extending containerization technologies, preferably Kubernetes
Have a solid understanding of Linux operating system internals and networking concepts (e.g., filesystems, TCP/IP, DNS, TLS)
Possess a customer-focused mindset, treating internal developers as your primary users
Have strong operational ownership, including a track record of debugging complex production issues and driving them to resolution
Prefer automation over manual processes ("allergic to ops work")
We are a small team of software engineers with a strong bias toward building software solutions to eliminate toil
Strong candidates may also have experience with
Designing and implementing secure, multi-tenant runtime environments from first principles
Kubernetes ecosystem tools and interfaces such as Helm, Kustomize, Gatekeeper, Kyverno, CRDs/Operators, CRI, and CSI
Expertise in cloud infrastructure platforms, including AWS, GCP, or Azure
Proficiency in provisioning infrastructure using tools like Terraform, Crossplane, and AWS Controllers for Kubernetes (ACK)
Advanced Linux systems internals and networking concepts specifically relevant to containers, such as namespaces and cgroups
About MongoDB
MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We have redefined the data platform for the AI era, enabling builders to create, transform, and disrupt industries with software. MongoDB's unified data platform, the most widely available, globally distributed data platform on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Our cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud data platform and is available across AWS, Google Cloud, and Microsoft Azure.
With offices worldwide and over 67,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we're powering the next era of software.
Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It's what makes us MongoDB.
To drive the personal growth and business impact of our employees, we're committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees' wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it's like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
Req ID: 426182
MongoDB's base salary range for this role is posted below. Compensation at the time of offer is unique to each candidate and based on a variety of factors such as skill set, experience, qualifications, and work location. Salary is one part of MongoDB's total compensation and benefits package. Other benefits for eligible employees may include: equity, participation in the employee stock purchase program, flexible paid time off, 20 weeks fully-paid gender-neutral parental leave, fertility and adoption assistance, 401(k) plan, mental health counseling, access to transgender-inclusive health insurance coverage, and health benefits offerings. Please note, the base salary range listed below and the benefits in this paragraph are only applicable to U.S.-based candidates.
MongoDB's base salary range for this role in the U.S. is:
$127,000-$249,000 USD

