Staff Site Reliability Engineer

$227k - $290k

Moveworks

Who We Are 

Moveworks is the universal AI copilot for search and automation across all your business applications. We give employees one place to go to find information and get support while reducing costs for your business. The Moveworks Copilot is powered by an industry-leading Reasoning Engine that uses a combination of public and proprietary language models to understand employee queries, then build and execute multi-step plans that fulfill them. It does this by linking into systems (like ITSM, HRIS, ERP, identity management, and more) with native and custom-built integrations that turn natural language into powerful automations for employees.

The world’s most innovative brands like Databricks, Broadcom, Hearst, and Palo Alto Networks trust Moveworks to eliminate repetitive support issues, deliver instant knowledge, and empower employees to work faster across applications.

Founded in 2016, Moveworks has raised $315 million in funding, at a valuation of $2.1 billion, thanks to our award-winning product and team. In 2023, we were included in the Forbes Cloud 100 list as well as the Forbes AI 50 for the fifth consecutive year. We were also recognized by the 2023 Edison Awards for AI Optimized Productivity, and were included on Fast Company's Most Innovative Companies list for 2024!

Moveworks has over 500 employees in six offices around the world, and is backed by some of the world's most prominent investors, including Kleiner Perkins, Lightspeed, Bain Capital Ventures, Sapphire Ventures, Iconiq, and more.

Come join one of the most innovative teams on the planet!

What You Will Do

As a site reliability engineer, you will own the overall health, performance, and capacity of the Moveworks AI infrastructure and services. In addition to helping engineering teams resolve operational issues, you will design and implement solutions, tools, and practices that help us improve operational efficiency and product SLA. This role is a blend of SRE, infrastructure, and software development.

We’re building a team that indexes on moving fast, solving challenging product/engineering problems and providing value to our customers. To be successful, you'll be partnering with and enabling machine learning, search, product, data, and full stack teams to design and build fault tolerant and scalable infrastructure, services and features. This is an opportunity to play an integral role at the fastest-growing AI startup in its space.

  • Design, develop, and evolve site reliability and chaos engineering for Moveworks infrastructure and services.
  • Work closely with machine learning, search, product, infrastructure, data, and frontend teams to understand their infrastructure and operational needs and build solutions that are optimal, fault-tolerant, and scalable.
  • Author and advocate for reliability through best distributed system design patterns (error handling, retries, rate limiting, circuit breaking, etc.). Participate in design discussions and ensure operational readiness of infrastructure, services, and features.
  • Design and build tools, libraries, and frameworks that allow engineering teams to rapidly deploy and scale Moveworks infrastructure and applications.
  • Review and participate in application performance analysis / tuning and capacity planning.
  • Set up and maintain monitoring, metrics, and reporting systems for observability and actionable alerting.
  • Define key internal and customer-facing SLA metrics, and implement solutions and practices with different teams to improve those metrics.
  • Own the engineering on-call process and setup. Drive discussions for outages, root cause analysis, and action items.
  • Participate in the on-call rotation for second-tier escalation (at Moveworks, each engineer participates in the team-specific first-tier on-call rotation). Help diagnose and resolve complex operational issues.

What You Bring To The Table

  • 7+ years of experience in authoring and operating complex distributed infrastructure and applications
  • Strong experience with a container orchestration platform such as Kubernetes and cloud infrastructure such as AWS / GCP / Azure
  • Very high proficiency with Unix/Linux, TCP/IP, DNS, load balancers, autoscaling, file systems, and different types of data stores
  • Software development proficiency with Python, Golang, Java, or C++
  • Experience working across teams and implementing solutions, tools, and practices to improve observability, reliability, and scalability
  • Desire to work at a startup pace in a small company with a high degree of ownership 
  • Strong motivation, gumption, and an appetite for continuous, incremental changes and completing challenging projects fast
  • High level of curiosity about engineering outside of your immediate discipline and an incessant desire to learn
  • BS+ in computer science or a related field

Compensation Range: $227,000 - $290,000

*Our compensation package includes a market-competitive salary, equity for all full-time roles, exceptional benefits, and, for applicable roles, commissions or bonus plans.
Ultimately, in determining pay, final offers may vary from the amount listed based on geography, the role’s scope and complexity, the candidate’s experience and expertise, and other factors.

Moveworks Is An Equal Opportunity Employer
*Moveworks is proud to be an equal opportunity employer. We provide employment opportunities without regard to age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, veteran status, or any other characteristics protected by law.

Vacancy posted more than 2 months ago