Senior Site Reliability Engineer (SRE)

$158k - $225k

Instrumental Inc

Manufacturing advanced electronics requires understanding millions of signals generated across complex assembly processes. Instrumental builds systems that capture and analyze those signals (images, test results, and process data), enabling engineers to discover failures, identify root causes, and deploy production controls that improve yield and product maturity. Leading companies such as NVIDIA, Cisco, and Meta rely on Instrumental to accelerate new product development and scale manufacturing across global factories. Instrumental has become mission-critical for manufacturers building and scaling the next generation of AI infrastructure hardware. The Instrumental platform collects, intelligently transforms, and contextually presents manufacturing data to technical end users, enabling them to optimize their manufacturing process in real time. Our core technology is proprietary ML algorithms packaged in an accessible, user-centric interface. We believe we must have both the best technology and the best access to that technology to win.

Requirements:
  • 5 or more years of DevOps or SRE experience deploying and operating commercial SaaS platforms on public cloud infrastructure, AWS preferred.
  • Expert knowledge of Linux, shell, containerization, Kubernetes, IaC (Terraform preferred), monitoring, logging, and APM tools.
  • Proven ability to take initiative and drive impactful projects to completion efficiently and independently.
  • Comfort with ambiguity, pace, and frequent pivots inherent in a startup environment, with a track record of creating clarity for teams.
  • Experience introducing and integrating AI tools/processes into development and operation workflows.
  • Demonstrated skill in setting, iterating on, and measuring KPIs to ensure ongoing performance, reliability and efficiency.
  • Network/application security and compliance experience is a plus.
Who You Are:
  • Dead serious about performance, scalability, and reliability (PSR): You care deeply about how systems behave in the real world and sweat the details around latency, uptime, and scale.
  • Systems engineering & infrastructure expertise: You've spent real time building and running distributed systems and know your way around cloud infrastructure, networks, and operating systems.
  • Automation, automation, automation: If something is repetitive or error-prone, your first instinct is to automate it and make it disappear.
  • Operating in ambiguity & high-growth environments: You're comfortable making good calls without perfect information and adapting as the system and company grow fast.
  • Dependable, trustworthy: People trust you to own problems, show up when things are broken, and follow through.

This position requires access to items and data that are developed under U.S. government contracts and subject to dissemination controls that limit access to U.S. citizens only.

We're a growing team that works collaboratively, is supportive of each other, and is highly energized by the opportunity for a large impact. We actively work to promote an inclusive environment, valuing passion and the ability to learn. You're encouraged to apply even if your experience doesn't precisely match the job description!

The following is a representative annual base salary range for this position within the Bay Area: $158k-$225k. We consider candidates at multiple levels for this role. Job level and salary opportunities are evaluated through our interview process: we review the experience, knowledge, skills, and abilities of each applicant. Instrumental is proud to offer a highly-rated variety of benefits, including health, vision, dental, commuter plans, and parental leave.

At Instrumental, protecting company and customer information is a shared responsibility. Employees are expected to comply with company engineering, security, access control, and privacy policies, and promptly report suspected security incidents or policy violations.

Vacancy posted 2 days ago