Senior Site Reliability Engineer

Mango

Mango, Inc.
Senior Site Reliability Engineer
Los Angeles, CA · Full time

About Mango, Inc.

Mango is a new type of microscope for rapid bioburden testing.

Description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on‑premise instruments, data systems, and machine learning pipelines. This role combines systems‑level engineering with software craftsmanship, requiring deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go‑to expert for diagnosing performance issues in our on‑prem system, from kernel‑level I/O bottlenecks to distributed service latency. You will also build robust automation that keeps our systems consistent and observable.

Key Responsibilities

Infrastructure Design & Reliability
Design, deploy, and maintain our on‑premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high‑throughput data processing clusters. Implement fault‑tolerant systems with reproducible deployments and clear observability.

Performance & Systems Analysis
Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in‑house application‑level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.

Automation & Tooling
Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.

Work closely with our software and hardware teams to co‑design systems that meet the needs of high‑resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.

Observability & Incident Response
Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post‑mortem efforts with a focus on learning and prevention.

Documentation & Communication
Produce clear documentation and communicate findings effectively to the broader team, from network topology diagrams to kernel tuning rationales.

General Qualifications
  • Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
  • Hands‑on experience designing and managing on‑premise servers, storage arrays, or HPC clusters.
  • Comfort with automation and software development (Python, Go, Bash, or similar).
  • Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.
  • Proven track record of improving system reliability, throughput, and maintainability in a fast‑paced environment.
  • Excellent written and verbal communication skills for cross‑disciplinary collaboration.
  • Self‑driven, curious, and motivated by understanding systems deeply rather than just maintaining them.

Bonus Qualities (Not Required)
  • 5–10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
  • Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).
  • Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
  • Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
  • Experience supporting data‑heavy scientific or ML workloads.
  • Demonstrated technical leadership: mentoring others in debugging, reliability, or performance analysis.

Vacancy posted 23 hours ago
Similar jobs that could be interesting for you
Based on the Senior Site Reliability Engineer in Los Angeles, CA vacancy
  • $150k - $200k

     ...gamifying everyday life, you’ll thrive in our fast‑moving, collaborative environment. About the Role We are looking for a Senior Site Reliability Engineer to help ensure the reliability, scalability, and performance of the infrastructure that powers favorited’s real‑time... 
    Senior
    Full time

    Favorited

    Santa Monica, CA
    5 days ago
  •  ...research company in Los Angeles is seeking a Senior Infrastructure Engineer to build and manage critical hybrid environments...  ...over 10 years of experience operating high reliability infrastructure and a strong background in Site Reliability Engineering. Responsibilities... 
    Senior

    OpenAI

    Los Angeles, CA
    2 days ago
  • A leading aerospace company in Hawthorne, CA is seeking a Senior Site Reliability Engineer to enhance Starlink’s satellite internet infrastructure. The role involves upgrading systems for geo-redundancy, collaborating closely with engineers, and ensuring optimal performance... 
    Senior

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    5 days ago
  • $183k - $235k

     ...organization with a real path to advance your career, this is the place. POSITION OVERVIEW HiveWatch is seeking a Senior Staff Site Reliability Engineer to join our Platform Team, where you'll architect and maintain mission‑critical edge infrastructure that connects our... 
    Senior
    Flexible hours

    Saasventurecapital

    El Segundo, CA
    1 day ago
  • SPACE EXPLORATION TECHNOLOGIES CORP in Hawthorne, CA is seeking a Senior Site Reliability Engineer to develop automation for deploying and managing resources in cloud and on-premises environments. This role requires significant experience with Linux and Kubernetes, as... 
    Senior

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    3 days ago
  • $160k - $220k

    Sr. Site Reliability Engineer (Starlink) Hawthorne, CA SpaceX is developing Starlink, the world’s largest satellite constellation and is providing...  ...COMPENSATION AND BENEFITS: Pay Range: Software Engineer/Senior: $160,000.00 - $220,000.00 per year Your actual level and base... 
    Senior
    Temporary work
    Worldwide
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    5 days ago
  • $165k - $230k

    SR. SITE RELIABILITY ENGINEER (STARSHIELD) SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with... 
    Senior
    Permanent employment
    Temporary work
    Immediate start
    Weekend work

    Latent AI

    Hawthorne, CA
    2 days ago
  • $165k - $230k

    Sr. Site Reliability Engineer (Starshield) Hawthorne, CA SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this... 
    Senior
    Permanent employment
    Temporary work
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    3 days ago
  •  ...cybersecurity will depend on you Learn how Illumio approaches AI with integrity — view our Transparency Statement. Senior Backend Software Engineer (Python (Golang a plus)) Hybrid: 2 days in office/week in Sunnyvale, CA In this role, you will focus on the Azure... 
    Work at office
    2 days per week

    Illumio

    Los Angeles, CA
    5 days ago
  • KBR, Inc is seeking a Release Train Engineer (RTE) in El Segundo, California. This role will coordinate Agile Program Increment execution across multiple development teams for the U.S. Space Force, focusing on stakeholder engagement and alignment. Ideal candidates will... 
    Senior

    KBR, Inc

    El Segundo, CA
    4 days ago
  • SPACE EXPLORATION TECHNOLOGIES CORP is seeking a Site Reliability Engineer in Hawthorne, California, to manage mission-critical products for Guidance, Navigation, and Control (GNC) teams. The ideal candidate possesses a degree in a relevant field or equivalent experience... 

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    2 days ago
  • $125k - $175k

    Software Engineer, Site Reliability Engineering (application Software) Design, deploy, and scale SpaceX mission‑critical software infrastructure for vehicle operations. Location: Hawthorne, California, United States Compensation: $125,000 - 175,000 USD / year Job Tags... 
    Permanent employment
    Temporary work
    Weekend work

    jobs.frontdoordefense.com - Jobboard

    Hawthorne, CA
    3 days ago
  • $125k - $175k

    SpaceX is seeking a Site Reliability Engineer in Hawthorne, California. The role involves deploying, maintaining, and scaling mission-critical software infrastructure for vehicle operations, ensuring software quality and reliability. The ideal candidate will have a strong... 

    jobs.frontdoordefense.com - Jobboard

    Hawthorne, CA
    3 days ago
  • $125k - $145k

     ...SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars. Site Reliability Engineer, GNC SpaceX’s mission is to make humanity multiplanetary by developing fully and rapidly reusable launch systems capable... 
    Permanent employment
    Temporary work
    Flexible hours
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    2 days ago
  • $125k - $145k

    Site Reliability Engineer - Top Secret Clearance Hawthorne, CA SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to... 
    Permanent employment
    Temporary work
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    3 days ago
  • Site Reliability Engineer, Frontier Systems Infrastructure The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world that OpenAI uses for its most cutting edge model training. We take data center designs, turn them into... 

    OpenAI

    Los Angeles, CA
    4 days ago
  • $164k - $270k

     ...the 21st century and beyond. The Role What You’ll Do Own the reliability of our robotics systems, from PLCs through ROS2/middleware...  ...remediation. Partner with controls, robotics, and platform engineering teams to bake reliability in early. Review designs, develop... 
    Permanent employment
    Full time
    Local area
    Relocation package
    Flexible hours

    Hadrian Automation

    Los Angeles, CA
    4 days ago
  • DevOps / Site Reliability Engineer ID70127 Full time | AgileEngine | United States Posted On 06/17/2026 Job Information City Los Angeles State/Province California 90001 IT Services Job Description AgileEngine is an Inc. 5000 company that creates award-winning software... 
    Full time
    Work at office
    Remote work
    Visa sponsorship
    Work visa
    Flexible hours

    AgileEngine, LLC.

    Los Angeles, CA
    4 days ago
  • $170k - $220k

     ...monitoring and more, all enhancing operational efficiency and reducing observability spend by up to 70%. We are looking for a Site Reliability Engineer to work as part of our Cloud Infrastructure Team. Focusing on Enterprise FedRal Cloud Infrastructure. In This Role, You... 

    CTERA Networks

    Los Angeles, CA
    2 days ago
  •  ...A leading aerospace technology firm in Los Angeles seeks a Ground Software Engineer to design and implement satellite communication software. The role requires extensive programming skills in Python, C++, and Rust, along with experience in microservices and cloud solutions... 
    Senior

    Apex Technology

    Los Angeles, CA
    5 days ago
  • A leading AI research accelerator is looking for an entry-level software engineer to refine AI-generated code and develop verification solutions. The ideal candidate will have over 5 years of software engineering experience, including 2 years at a top-tier company. This... 
    Senior
    Contract work
    Remote work
    10 hours per week
    Flexible hours

    Turing

    Los Angeles, CA
    1 day ago
  • $125k - $150k

    Site Reliability Engineer, Kubernetes Platform (Starshield) Hawthorne, CA SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies... 
    Permanent employment
    Temporary work
    Immediate start
    Weekend work

    SPACE EXPLORATION TECHNOLOGIES CORP

    Hawthorne, CA
    3 days ago
  • $150k - $175k

     ...Kixie is seeking a Senior Software Engineer to take ownership of critical product subdomains, such as Telephony and CRM integrations, within their innovative sales engagement platform. The ideal candidate should have over 5 years of software engineering experience, proficient... 
    Senior

    Medium

    Santa Monica, CA
    6 hours ago
  • $67k - $136.8k

     ...Ernst & Young Oman is seeking an FSO DevOps Engineer Senior Analyst to enhance the Web3 Platform. The role demands proficiency in Infrastructure as Code and DevSecOps, along with experience in cloud platforms like Azure. Applicants should possess a bachelor’s degree in... 
    Senior

    Ernst & Young Oman

    Los Angeles, CA
    5 days ago
  • $108.5k - $135.6k

     ...Position Summary We are seeking a Senior Reliability Engineer for an onsite position in El Segundo, CA, to evaluate, predict, and enhance product reliability through statistical analysis, reliability modeling, and testing. The role involves life data analysis, reliability... 
    Senior
    Work experience placement

    EVgo

    Los Angeles, CA
    5 days ago
  • $230k - $385k

     ...A leading AI research firm in San Francisco is seeking a staff-level Software Engineer specializing in infrastructure for their Analytics Platform. The ideal candidate will have extensive experience in Rust or C++ and a strong background in distributed systems. This role... 
    Senior

    OpenAI

    Los Angeles, CA
    1 day ago
  • $168k - $252k

    Anduril Industries is looking for a Senior Software Engineer in Los Angeles, California. In this role, you will build cutting-edge infrastructure to enable rapid development and deployment of autonomous systems. With compensation ranging from $168,000 to $252,000, the... 
    Senior

    jobs.frontdoordefense.com - Jobboard

    Los Angeles, CA
    3 days ago
  • $121k - $151k

     ...Senior Software Systems Engineer. Locations: El Segundo,...  ...recommendations to improve integration reliability, repeatability, and sustainment...  ...Work Environment: Location: On-site. Travel Requirements: Minimal... 
    Senior
    For contractors
    Work at office

    KBR

    El Segundo, CA
    5 days ago
  • $108.5k - $135.6k

    EVgo Services LLC is seeking a Senior Reliability Engineer in El Segundo, CA to enhance product reliability through statistical analysis and reliability modeling. The role includes analyzing failure behaviors, collaborating with cross-functional teams, and developing reliability... 
    Senior

    EVgo Services LLC

    El Segundo, CA
    2 days ago
  •  ...A leading technology firm in Los Angeles is seeking a Senior Backend Engineer to develop scalable backend services using Rust. In this role, you will design data pipelines for high-frequency sensor data and collaborate closely with product teams to define system architecture... 
    Senior

    Revel

    Los Angeles, CA
    5 days ago
