Site Reliability Engineer, Enterprise Technology Services

Apple

At Apple, groundbreaking ideas quickly transform into extraordinary products and services that delight millions worldwide. If you're passionate about engineering and operating robust, large-scale systems, imagine the impact you could make. The Identity Management Services (IdMS) SRE team is seeking a Service Reliability Engineer (SRE) to design, build tools for, and support our critical platform services. We're looking for someone with strong software development skills, deep systems expertise, and a solid understanding of SRE principles, ready to ensure operational precision at Apple's immense scale. Your work will be pivotal in powering services across Apple, partnering with engineering teams to deliver seamless experiences.

This role involves managing one of the largest Identity Management Platform services for a vast customer base across various devices and services. Key responsibilities include overseeing critical services such as device provisioning, authentication, token management, and security. A primary objective is ensuring the high availability and reliability of the system to facilitate critical authentication and authorization transactions, user provisioning, purchases, subscriptions, and account lifecycle management (creation, management, and recovery). This also entails maintaining platform security by blocking and rate-limiting fraud traffic at the perimeter, and ensuring high data consistency and replication across multiple data centers through custom mechanisms. The role covers managing infrastructure, capacity planning, disaster recovery, and auto-failover mechanisms. It also involves monitoring infrastructure and application services, driving incident management for internal and external stakeholders, and defining system and functional observability. Furthermore, this position helps teams overcome system bottlenecks and architectural challenges for efficiency improvements, ensures systems are compliant with industry standards and pass critical audits, and drives automation solutions for large-scale platform service needs. Advanced responsibilities include alert engineering, anomaly detection with Machine Learning tools, and adapting to Generative AI enhancements. Investigating device-related issues by debugging relevant logs is also part of the role, alongside managing the full system lifecycle, including configuration and code deployment in user acceptance test and production environments.

The responsibilities include:

  • Drive Platform Reliability & SRE Standards: Lead the optimization of a large-scale Identity Management Platform, ensuring ultra-high availability, reliability, and performance for critical authentication, authorization, and provisioning services. Define and implement robust Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) to guide engineering teams toward ambitious reliability and observability goals.
  • Architect & Engineer Resilient Systems: Design, build, and manage robust, distributed systems across cloud and on-premise infrastructure. Develop advanced capacity planning, disaster recovery, auto-failover, and data consistency mechanisms. Innovate by creating reusable tooling, automation frameworks, and advanced reliability platforms covering observability, alerting, chaos testing, auto-scaling, and failover strategies.
  • Lead Operational Excellence & Incident Management: Drive comprehensive operational excellence through advanced observability (tracing, logging, metrics, alerting) and next-generation telemetry, leveraging Machine Learning for anomaly detection and exploring GenAI for alert engineering. Lead technical response during major incidents, conducting deep post-mortems, driving systemic improvements, and embedding preventive architectural controls.
  • Champion Automation & Resilience Engineering: Develop and implement large-scale automation solutions, internal tooling, and frameworks to enhance reliability, cost-efficiency, and operational visibility. Advance resilience engineering by integrating automation pipelines, CI/CD, canary releases, and chaos engineering principles into core development and deployment workflows. Drive initiatives to eliminate toil and contribute to multi-cloud strategy.
  • Ensure Security & Compliance: Maintain the highest security posture, implementing fraud prevention at the perimeter, and ensuring strict adherence to industry compliance standards (e.g., ISO-27001, PCI). Uphold all architectural and operational practices to rigorously meet security standards, compliance requirements, and audit readiness protocols.
  • Foster Cross-Functional Collaboration: Partner extensively with engineering, production support, and QA teams to ensure seamless service delivery. Promote a strong DevOps culture and provide technical insights through log analysis and system debugging.

Minimum qualifications include:

  • 5+ years of experience in Site Reliability Engineering, with a strong focus on building, scaling, and operating large-scale distributed platform services, and experience with Java.
  • BS degree in Computer Science or an equivalent field with 7+ years of experience, or MS degree in Computer Science or an equivalent field with 5+ years of experience.
  • Strong technical grasp and experience working on Open Source technologies designed for large-scale data processing.
  • Experience designing, analyzing, and troubleshooting distributed systems.
  • Proficiency in at least one programming or scripting language (Python, Java, Go, Bash, Ansible, or similar).
  • Experience designing observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, ELK, etc.).
  • Excellent troubleshooting and problem-solving skills.

Preferred qualifications include:

  • Observability & SRE Principles: Experience with monitoring and logging tools (e.g., Prometheus, Splunk, Grafana, OpenTelemetry) and a strong understanding of SRE principles, including observability, error budgeting, and service reliability metrics (SLA, SLO, SLI).
  • CI/CD & Automation: Proficiency with CI/CD, Release Engineering, DevOps practices, and source control (Git). Experience designing and implementing CI/CD pipelines and Infrastructure as Code (Helm, CRD).
  • Programming & Data Systems: Strong programming skills in languages like Java, Python, Go, etc. Experience with various databases (Relational, NoSQL, OLAP) and event-driven architectures (Kafka, RabbitMQ).
  • Reliability & Operations: Experience with on-call, including incident/problem management (PIR, RCA) and a strong sense of ownership for system reliability.
  • Security & Compliance: Understanding of security standards, policies, cryptography, and authentication (OAuth, SAML, SSO). Knowledge of Governance and Compliance.
  • Innovation & Collaboration: Passion for designing reliable systems, advocating for automation, and a desire to collaborate effectively. Experience leveraging ML/GenAI for operational efficiency is a plus.
  • Certification: Cybersecurity certification will be an added advantage.
  • Education: Bachelor's or Master's degree in Computer Science or equivalent practical experience.

Pay and benefits at Apple include comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition, for formal education related to advancing your career at Apple. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Vacancy posted 5 hours ago