Site Reliability Engineer, Enterprise Technology Services

Apple

At Apple, groundbreaking ideas quickly transform into extraordinary products and services that delight millions worldwide. If you're passionate about engineering and operating robust, large-scale systems, imagine the impact you could make. The Identity Management Services (IdMS) SRE team is seeking a Service Reliability Engineer (SRE) to design, build tools for, and support our critical platform services. We're looking for someone with strong software development skills, deep systems expertise, and a solid understanding of SRE principles, ready to ensure operational precision at Apple's immense scale. Your work will be pivotal in powering services across Apple, partnering with engineering teams to deliver seamless experiences.

This role involves managing one of the largest Identity Management Platform services, serving a vast customer base across many devices and services. Key responsibilities include overseeing critical services such as device provisioning, authentication, token management, and security. A primary objective is ensuring the high availability and reliability of the system so that it can support critical authentication and authorization transactions, user provisioning, purchases, subscriptions, and account lifecycle management (creation, management, and recovery). The role also entails maintaining platform security by blocking and rate-limiting fraudulent traffic at the perimeter, and ensuring high data consistency and replication across multiple data centers through custom mechanisms.

The role covers infrastructure management, capacity planning, disaster recovery, and auto-failover mechanisms. It also involves monitoring infrastructure and application services, driving incident management for internal and external stakeholders, and defining system and functional observability. In addition, this position helps teams overcome system bottlenecks and architectural challenges to improve efficiency, ensures that systems comply with industry standards and pass critical audits, and drives automation solutions for large-scale platform needs. Advanced responsibilities include alert engineering, anomaly detection with machine learning tools, and adapting to generative AI enhancements. The role also includes investigating device-related issues by debugging relevant logs and managing the full system lifecycle, including configuration and code deployment in user acceptance testing and production environments.

The responsibilities include:

Drive Platform Reliability & SRE Standards: Lead the optimization of a large-scale Identity Management Platform, ensuring ultra-high availability, reliability, and performance for critical authentication, authorization, and provisioning services. Define and implement robust Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) to guide engineering teams toward ambitious reliability and observability goals.
Architect & Engineer Resilient Systems: Design, build, and manage robust, distributed systems across cloud and on-premises infrastructure. Develop advanced capacity planning, disaster recovery, auto-failover, and data consistency mechanisms. Innovate by creating reusable tooling, automation frameworks, and advanced reliability platforms covering observability, alerting, chaos testing, auto-scaling, and failover strategies.
Lead Operational Excellence & Incident Management: Drive comprehensive operational excellence through advanced observability (tracing, logging, metrics, alerting) and next-generation telemetry, leveraging Machine Learning for anomaly detection and exploring GenAI for alert engineering. Lead technical response during major incidents, conducting deep post-mortems, driving systemic improvements, and embedding preventive architectural controls.
Champion Automation & Resilience Engineering: Develop and implement large-scale automation solutions, internal tooling, and frameworks to enhance reliability, cost-efficiency, and operational visibility. Advance resilience engineering by integrating automation pipelines, CI/CD, canary releases, and chaos engineering principles into core development and deployment workflows. Drive initiatives to eliminate toil and contribute to multi-cloud strategy.
Ensure Security & Compliance: Maintain the highest security posture by implementing fraud prevention at the perimeter and ensuring strict adherence to industry compliance standards (e.g., ISO/IEC 27001, PCI). Ensure that all architectural and operational practices rigorously meet security standards, compliance requirements, and audit-readiness protocols.
Foster Cross-Functional Collaboration: Partner extensively with engineering, production support, and QA teams to ensure seamless service delivery. Promote a strong DevOps culture and provide technical insights through log analysis and system debugging.

Minimum qualifications include:

5+ years of experience in Site Reliability Engineering, with a strong focus on building, scaling, and operating large-scale distributed platform services, plus experience with Java.
BS degree in computer science or an equivalent field with 7+ years of experience, or MS degree in computer science or an equivalent field with 5+ years of experience.
Strong technical grasp of, and experience working with, open source technologies designed for large-scale data processing.
Experience designing, analyzing, and troubleshooting distributed systems.
Proficiency in at least one programming or scripting language (Python, Java, Go, Bash, Ansible, or similar).
Experience designing observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, ELK, etc.).
Excellent troubleshooting and problem-solving skills.

Preferred qualifications include:

Observability & SRE Principles: Experience with monitoring and logging tools (e.g., Prometheus, Splunk, Grafana, OpenTelemetry) and a strong understanding of SRE principles, including observability, error budgeting, and service reliability metrics (SLA, SLO, SLI).
CI/CD & Automation: Proficiency with CI/CD, Release Engineering, DevOps practices, and source control (Git). Experience designing and implementing CI/CD pipelines and Infrastructure as Code (Helm, CRD).
Programming & Data Systems: Strong programming skills in languages like Java, Python, Go, etc. Experience with various databases (Relational, NoSQL, OLAP) and event-driven architectures (Kafka, RabbitMQ).
Reliability & Operations: Experience with on-call, including incident/problem management (PIR, RCA) and a strong sense of ownership for system reliability.
Security & Compliance: Understanding of security standards, policies, cryptography, and authentication (OAuth, SAML, SSO). Knowledge of Governance and Compliance.
Innovation & Collaboration: Passion for designing reliable systems, advocating for automation, and a desire to collaborate effectively. Experience leveraging ML/GenAI for operational efficiency is a plus.
Certification: A cybersecurity certification is an added advantage.
Education: Bachelor's or Master's degree in Computer Science or equivalent practical experience.

Pay and benefits at Apple include comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition, for formal education related to advancing your career at Apple. Additionally, this role might be eligible for discretionary bonuses or commission payments, as well as relocation.

Vacancy posted 5 hours ago

Similar jobs that could be interesting for you, based on the Site Reliability Engineer, Enterprise Technology Services vacancy in Sunnyvale, CA

Site Reliability Engineer, Enterprise Technology Services
$147.4k - $272.1k
Site Reliability Engineer, Enterprise Technology Services Sunnyvale, California, United States Software and Services Imagine what we could do together. At Apple, new ideas have a way of becoming excellent products, services, and customer experiences very quickly. Bring...
Suggested
Relocation
Apple Inc.
Sunnyvale, CA
4 days ago
SRE & DevOps Engineer, Enterprise Tech Services
Apple Inc. is seeking a Site Reliability Engineer for its Enterprise Technology Services in Sunnyvale, California. In this role, you will collaborate with application teams to automate operations, optimize infrastructure, and ensure systems are reliable and high-performing...
Suggested
Apple Inc.
Sunnyvale, CA
3 days ago
Director, Site Reliability Engineering, Sunnyvale, CA, USA
$250k
...management solutions for enterprises. As organizations... ...truth—explainable, reliable, and maintainable—... ..., with customer service as the primary point... ...As Director of Site Reliability Engineering, you will ensure that... ...with modern cloud technologies and solve complex reliability...
Suggested
Work at office
eGain Corporation
Sunnyvale, CA
4 days ago
Site Reliability Engineering Manager, Storage - Apple Services Engineering
$228.1k - $393.8k
Site Reliability Engineering Manager, Storage - Apple Services Engineering Cupertino, California, United States Software and Services Are you a talented Engineering... ...and lively team bringing distributed storage technologies to Apple's infrastructure? At Apple, scale is...
Suggested
Relocation
Apple Inc.
Cupertino, CA
3 days ago
Principal Engineer Software (Strata Logging Service)
$147k - $237.5k
...solving real‑world problems with cutting‑edge technology and bold thinking. Here, everyone has a... ...Summary Your Career: Strata Logging Service (SLS) powers advanced cybersecurity... ...normalize and correlate data across the entire enterprise. SLS can: Radically simplify customer...
Suggested
Full time
Work at office
Local area
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
5 days ago
Site Reliability Engineering Manager, eBusiness Services
$198.3k - $342.8k
Site Reliability Engineering Manager, eBusiness Services Sunnyvale, California, United States Software and Services Imagine... ...everything we do, from amazing technology to industry‑leading... ...managing complex SRE projects, enterprise services at a large scale. Bachelor...
Relocation
Apple Inc.
Sunnyvale, CA
3 days ago
Site Reliability Engineer II
...running. Location: 5 on-site days a week in... ...Team's Vision: Our Engineering team is driven by a culture... ...with a cutting-edge technology stack that spans... ...responsible for designing new services and applications in... ...on enhancing system reliability and scalability of Illumio...
Work experience placement
Immediate start
Illumio
Sunnyvale, CA
5 days ago
Sr. Site Reliability Engineer
...running. Location: 5 on-site days a week in Sunnyvale... ...Team's Vision: Our Engineering team is shaping the... ...a highly scalable SaaS service built using cloud-native technologies while simultaneously shipping... ...experienced Senior Site Reliability Engineer (SRE) with a...
Work experience placement
Immediate start
Illumio
Sunnyvale, CA
5 days ago
Senior Site Reliability Engineer
$148k - $235.75k
...innovation that’s fueled by great technology—and amazing people. Today,... ...be working as a Senior SRE Engineer. The position will be part... .... Maintain uptime, reliability and readiness of on-prem engineering... ...multiple data centers. Guard service level agreements (SLAs) for...
Remote work
NVIDIA
Santa Clara, CA
4 days ago
Engineering Program Manager, Private Cloud Compute - SRE, Apple Services Engineering
$172.1k - $305.6k
...United States Software and Services The Apple Services Engineering team is one of the most... ...passion for combining art and technology. These are the people who... ...solutions. The Service Reliability Engineering (SRE) team is... ...project management for Site Reliability Engineering /...
Relocation
Apple Inc.
Cupertino, CA
4 days ago
SRE Devops Engineer
...Greetings from Rootshell Rootshell Enterprise Technologies Inc. is a recognized provider of professional IT Consulting services in the US. We are actively seeking an SRE Devops Engineer for a full-time role with one of our direct clients. Role: SRE Devops Engineer...
Full time
Local area
Remote work
Rootshell Enterprise Technologies
Santa Clara, CA
3 days ago
Site Reliability Engineer (Edge Services), Infrastructure Services
...: 200663929-3956 Summary We are seeking a proactive Site Reliability Engineer to champion the evolution of our production ecosystems. In... ...reliability framework. You will play a pivotal role in ensuring our services are resilient, scalable, and observable, bridging the gap...
Shift work
Apple
Sunnyvale, CA
3 days ago
Senior Staff Site Reliability Engineer
$126k - $204.5k
...problems with cutting-edge technology and bold thinking. Here, everyone... ...closely with our engineering teams to develop innovative... ...teams to ensure that all of our services have the right monitoring and... ...the product and ensure the reliability and availability of our services...
Full time
Work at office
Palo Alto Networks
Santa Clara, CA
2 days ago
Principal Site Reliability Engineer
...real-world problems with cutting-edge technology and bold thinking. Here, everyone has... ...XSOAR, and XPANSE. As a Principal Site Reliability Engineer within the Cortex DevOps team, you... ...build innovative solutions that improve service availability, performance, and...
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks
Santa Clara, CA
2 days ago
Lead Site Reliability Engineer
$200k - $260k
...industry's most advanced enterprise search has evolved into a... ...: Glean is seeking a Site Reliability Engineering Lead to foster a culture of... ...is pivotal in ensuring our services meet stringent Service Level... ...with containerization technologies, including Docker and Kubernetes...
Work at office
Home office
Flexible hours
Glean.info
Mountain View, CA
5 days ago
Sr. Site Reliability Engineer
...running. Location: 5 on-site days a week in Sunnyvale... ...Our Team's Vision: Our Engineering team is shaping the future... ...a highly scalable SaaS service built using cloud‑native technologies while simultaneously... ...experienced Senior Site Reliability Engineer (SRE) with a strong...
Work experience placement
Illumio
Sunnyvale, CA
5 days ago
Senior Site Reliability Engineer - HPC
$152k - $241.5k
...innovation that’s fueled by great technology—and amazing people. NVIDIA... ...heart of our products and services. Our work opens up new... ...lifecycle management, fleet reliability/auto‑healing, E2E observability... ...Perl, or Ruby. Mentored other engineers and influenced technical...
NVIDIA Group
Santa Clara, CA
4 days ago
Senior Site Reliability Engineer, AIOPs
...innovation that’s fueled by great technology—and amazing people. Today,... ...high‑volume telemetry into reliable, job‑centric insights and... ...Join our team of innovative engineers who are building this platform... ...hands‑on ops for Kubernetes/services/streaming stacks are ideal;...
NVIDIA Group
Santa Clara, CA
1 day ago
Principal Site Reliability Engineer (U.S. Citizenship Required)
$151.6k - $245.3k
...real-world problems with cutting-edge technology and bold thinking. Here, everyone has... ...GCP customers. As a Principal Site Reliability Engineer for the ADEM (Autonomous Digital Experience... ...be part of a team supporting the services that provide end-to-end visibility and...
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
4 days ago
Principal SRE Engineer (US Citizen)
...world problems with cutting‑edge technology and bold thinking. Here,... ...delivering secure infrastructure for our Enterprise, SaaS, and Public Cloud security services. With your mad SRE/DevSecOps... ...Citizen. BS/MS in Computer Science/Engineering or equivalent training,...
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
1 day ago
Principal Engineer, Enterprise Agentic AI Platform
A leading AI technology firm in California seeks a Principal or Distinguished Engineer to develop enterprise-grade agentic AI systems. The role requires extensive experience with large-scale distributed systems, proficiency in Python/Go, and expertise in Kubernetes. The...
NVIDIA Corporation
Santa Clara, CA
4 days ago
Staff Site Reliability Engineer
$150k - $180k
...scale BESS) through our proprietary technology. This allows Verrus to be a flexible... ...resource to the grid, and provide grid services that facilitate interconnection and the... ...to serve as software-focused Senior Site Reliability Engineer at Verrus. This is a full‑time...
Full time
Work at office
Local area
Flexible hours
Verrus, LLC
Mountain View, CA
5 days ago
Site Reliability Engineer: Platform & Observability
A leading technology company is seeking a Site Reliability Engineer in Cupertino, California. The role involves owning the reliability of AWS and Kubernetes services, designing systems, and collaborating with engineering teams for observability and automation. Candidates...
Apple Inc.
Cupertino, CA
1 day ago
Enterprise Cloud Solutions Engineer - Client-Facing
...Tech Solutions LLC is seeking an experienced Technical Services / Cloud Solutions Engineer to provide customer-facing technical support and cloud... ...architecture guidance. This role involves working closely with enterprise customers to ensure successful cloud adoption and...
Glint Tech Solutions LLC
Sunnyvale, CA
19 hours ago
SFDC - Service Cloud Solutions Architect
...minimally invasive care, our technologies—like the da Vinci surgical... ...worldwide. We’re a team of engineers, clinicians, and innovators... ...highly experienced Salesforce Service Cloud Architect to lead the... ...Service Cloud architecture, enterprise CRM transformation, and customer...
Full time
Local area
Worldwide
Flexible hours
Shift work
Intuitive
Sunnyvale, CA
4 days ago
Senior Site Reliability Engineer - Apple Services Engineering (ASE) / iCloud at Apple Cupertino, CA
$175.8k - $264.2k
Senior Site Reliability Engineer - Apple Services Engineering (ASE) / iCloud Cupertino, CA People at Apple don't just build products - they craft experiences... ...and process enhancements. Evaluate and integrate new technologies to improve system reliability, security, and...
Hong Kong Study Skills Research Institute
Cupertino, CA
4 days ago
Sr Site Reliability Engineer (Internet Security Platform)
$120.3k - $194.53k
...real-world problems with cutting‑edge technology and bold thinking. Here, everyone has... ...across multiple public clouds. As a Site Reliability Engineer on the Internet Security Platform team... ...supporting Advanced DNS Security services. This includes automation, architecture...
Full time
Work at office
Visa sponsorship
Work visa
Palo Alto Networks, Inc.
Santa Clara, CA
2 days ago
Senior GenAI Software Engineer - Enterprise AI
$174k - $252k
Google Inc. is seeking a Senior Software ML Engineer in Sunnyvale, CA to lead the development of agentic features that enhance enterprise productivity using Generative AI technologies. The candidate must possess a Bachelor's degree in Computer Science or a related field...
Full time
Google Inc.
Sunnyvale, CA
4 days ago
Site Reliability Engineering Tech Lead
...adopted by over 3,000 enterprises, including Apple,... ..., ensure AI system reliability, and implement... ...seeking an experienced Site Reliability Engineering (SRE) Tech Lead to... ...in containerization technologies (Docker, Kubernetes... ...Experience with service mesh technologies and...
Remote work
Home office
Flexible hours
DataHub Inc
Palo Alto, CA
2 days ago
SAP CS Consultant - S/4HANA Service, Onsite w/ Travel
Danta Technologies is seeking a candidate with strong expertise in SAP CS (Customer Service module) for a position based in Santa Clara, California. The role requires at least 3 full lifecycle implementations of SAP CS and knowledge in S/4HANA Service / Service Management...
Casual work
Remote work
Danta Technologies
Santa Clara, CA
5 days ago