Staff Site Reliability Engineer

$227k - $290k

Moveworks

Who We Are

Moveworks is the universal AI copilot for search and automation across all your business applications. We give employees one place to go to find information and get support while reducing costs for your business. The Moveworks Copilot is powered by an industry-leading Reasoning Engine that uses a combination of public and proprietary language models to understand employee queries, then build and execute multi-step plans that fulfill them. It does this by linking into systems (like ITSM, HRIS, ERP, identity management, and more) with native and custom-built integrations that turn natural language into powerful automations for employees.

The world’s most innovative brands like Databricks, Broadcom, Hearst, and Palo Alto Networks trust Moveworks to eliminate repetitive support issues, deliver instant knowledge, and empower employees to work faster across applications.

Founded in 2016, Moveworks has raised $315 million in funding, at a valuation of $2.1 billion, thanks to our award-winning product and team. In 2023, we were included in the Forbes Cloud 100 list as well as the Forbes AI 50 for the fifth consecutive year. We were also recognized by the 2023 Edison Awards for AI Optimized Productivity, and were included on Fast Company's Most Innovative Companies list for 2024!

Moveworks has over 500 employees in six offices around the world, and is backed by some of the world's most prominent investors, including Kleiner Perkins, Lightspeed, Bain Capital Ventures, Sapphire Ventures, Iconiq, and more.

Come join one of the most innovative teams on the planet!

What You Will Do

As a site reliability engineer, you will own the overall health, performance, and capacity of the Moveworks AI infrastructure and services. In addition to helping engineering teams resolve operational issues, you will design and implement solutions, tools, and practices that help us improve operational efficiency and product SLA. This role is a blend of SRE, infrastructure, and software development.

We’re building a team that indexes on moving fast, solving challenging product/engineering problems and providing value to our customers. To be successful, you'll be partnering with and enabling machine learning, search, product, data, and full stack teams to design and build fault tolerant and scalable infrastructure, services and features. This is an opportunity to play an integral role at the fastest-growing AI startup in its space.

Design, develop, and evolve site reliability and chaos engineering for Moveworks infrastructure and services.
Work closely with machine learning, search, product, infrastructure, data, and frontend teams to understand their infrastructure and operational needs and build solutions that are optimal, fault tolerant, and scalable.
Author and advocate for reliability through best distributed system design patterns (error handling, retries, rate limiting, circuit breaking, etc.). Participate in design discussions and ensure operational readiness of infrastructure, services, and features.
Design and build tools, libraries, and frameworks that allow engineering teams to rapidly deploy and scale Moveworks infrastructure and applications.
Review and participate in application performance analysis / tuning and capacity planning.
Set up and maintain monitoring, metrics, and reporting systems for observability and actionable alerting.
Define internal and customer-facing key SLA metrics, and work with different teams to implement solutions and practices that improve those metrics.
Own the engineering on-call process and setup. Drive discussions for outages, root cause analysis, and action items.
Participate in the on-call rotation for second-tier escalation (at Moveworks, each engineer participates in the team-specific first-tier on-call rotation). Help diagnose and resolve complex operational issues.

What You Bring To The Table

7+ years of experience in authoring and operating complex distributed infrastructure and applications
Strong experience with a container orchestration platform such as Kubernetes and cloud infrastructure such as AWS / GCP / Azure
Very high proficiency with Unix/Linux, TCP/IP, DNS, load balancers, autoscaling, file systems and different types of data stores.
Software development proficiency with Python, Golang, Java, or C++
Experience working across teams and implementing solutions, tools, and practices to improve observability, reliability, and scalability
Desire to work at a startup pace in a small company with a high degree of ownership
Strong motivation, gumption, and an appetite for continuous, incremental changes and completing challenging projects fast
High level of curiosity about engineering outside of your immediate discipline and an incessant desire to learn
BS+ in computer science or a related field

Compensation Range: $227,000 - $290,000

*Our compensation package includes a market-competitive salary, equity for all full-time roles, exceptional benefits, and, for applicable roles, commissions or bonus plans.
Ultimately, in determining pay, final offers may vary from the amount listed based on geography, the role’s scope and complexity, the candidate’s experience and expertise, and other factors.

Moveworks Is An Equal Opportunity Employer
*Moveworks is proud to be an equal opportunity employer. We provide employment opportunities without regard to age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, veteran status, or any other characteristics protected by law.

Vacancy posted more than 2 months ago
