Staff Site Reliability Engineer

$227k - $290k

Moveworks

Who We Are 

Moveworks is the universal AI copilot for search and automation across all your business applications. We give employees one place to go to find information and get support while reducing costs for your business. The Moveworks Copilot is powered by an industry-leading Reasoning Engine that uses a combination of public and proprietary language models to understand employee queries, then build and execute multi-step plans that achieve them. It does this by linking into systems (like ITSM, HRIS, ERP, identity management, and more) with native and custom-built integrations that turn natural language into powerful automations for employees.  

The world’s most innovative brands like Databricks, Broadcom, Hearst, and Palo Alto Networks trust Moveworks to eliminate repetitive support issues, deliver instant knowledge, and empower employees to work faster across applications.

Founded in 2016, Moveworks has raised $315 million in funding, at a valuation of $2.1 billion, thanks to our award-winning product and team. In 2023, we were included in the Forbes Cloud 100 list as well as the Forbes AI 50 for the fifth consecutive year. We were also recognized by the 2023 Edison Awards for AI Optimized Productivity, and were included on Fast Company's Most Innovative Companies list for 2024!

Moveworks has over 500 employees in six offices around the world, and is backed by some of the world's most prominent investors, including Kleiner Perkins, Lightspeed, Bain Capital Ventures, Sapphire Ventures, Iconiq, and more.

Come join one of the most innovative teams on the planet!

What You Will Do

As a site reliability engineer, you will own and be responsible for the overall health, performance, and capacity of the Moveworks AI infrastructure and services. In addition to helping engineering teams resolve operational issues, you will design and implement solutions, tools, and practices that help us improve operational efficiency and product SLA. This role is a blend of SRE, infrastructure, and software development.

We’re building a team that indexes on moving fast, solving challenging product/engineering problems and providing value to our customers. To be successful, you'll be partnering with and enabling machine learning, search, product, data, and full stack teams to design and build fault tolerant and scalable infrastructure, services and features. This is an opportunity to play an integral role at the fastest-growing AI startup in its space.

  • Design, develop, and evolve site reliability and chaos engineering for Moveworks infrastructure and services.
  • Work closely with machine learning, search, product, infrastructure, data, and frontend teams to understand their infrastructure and operational needs and build solutions that are optimal, fault tolerant, and scalable.
  • Author and advocate for reliability through best distributed system design patterns (error handling, retries, rate limiting, circuit breaking, etc.). Participate in design discussions and ensure operational readiness of infrastructure, services, and features.
  • Design and build tools, libraries, and frameworks that allow engineering teams to rapidly deploy and scale Moveworks infrastructure and applications.
  • Review and participate in application performance analysis / tuning and capacity planning.
  • Set up and maintain monitoring, metrics, and reporting systems for observability and actionable alerting.
  • Define key internal and customer-facing SLA metrics, and implement solutions and practices with different teams to improve those metrics.
  • Own the engineering on-call process and setup. Drive discussions for outages, root cause analysis, and action items.
  • Participate in the on-call rotation for second-tier escalation (at Moveworks, each engineer participates in the team-specific first-tier on-call rotation). Help diagnose and resolve complex operational issues.

What You Bring To The Table

  • 7+ years of experience in authoring and operating complex distributed infrastructure and applications
  • Strong experience with container orchestration platforms like Kubernetes and cloud infrastructure like AWS / GCP / Azure
  • Very high proficiency with Unix/Linux, TCP/IP, DNS, load balancers, autoscaling, file systems, and different types of data stores
  • Software development proficiency with Python, Golang, Java, or C++
  • Experience working across teams and implementing solutions, tools, and practices to improve observability, reliability, and scalability
  • Desire to work at a startup pace in a small company with a high degree of ownership 
  • Strong motivation, gumption, and an appetite for continuous, incremental changes and completing challenging projects fast
  • High level of curiosity about engineering outside of your immediate discipline and an incessant desire to learn
  • BS+ in computer science or a related field

Compensation Range: $227,000 - $290,000

*Our compensation package includes a market competitive salary, equity for all full time roles, exceptional benefits, and, for applicable roles, commissions or bonus plans. 
Ultimately, in determining pay, final offers may vary from the amount listed based on geography, the role’s scope and complexity, the candidate’s experience and expertise, and other factors.

Moveworks Is An Equal Opportunity Employer
*Moveworks is proud to be an equal opportunity employer. We provide employment opportunities without regard to age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, veteran status, or any other characteristics protected by law.

Vacancy posted more than 2 months ago