Site Reliability Engineer II

$102.1k - $202.2k

Microsoft Corporation

Overview

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky's-the-limit thinking in a cloud-enabled world.

Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the databases team builds and maintains Microsoft's operational database systems. We store and manage data in a structured way to enable a multitude of applications across various industries. We are on a journey to enable developer-friendly, mission-critical, AI-enabled operational databases across relational, non-relational, and OSS offerings.

We believe in making the day in the life of the On-Call Engineer boring while living up to the expectations of a massive cloud service with stringent Service Level Objectives (SLOs). We do this by thinking differently, stretching ourselves to go all the way to the root of the problem, keeping data front and center in all our decisions, and taking a systems approach to generating outcomes that far exceed expectations. Helping attain aspirational SLOs through pragmatic innovation is what sets the SREs in Cosmos DB apart. If you share the same purpose, cause, and belief and have the passion to follow this pursuit, please read the rest of the job description to learn what we do. We would love to have you join us!

Azure Cosmos DB is Microsoft's next generation of globally distributed, massively scalable, multi-model cloud database service. It is designed to enable developers to build planet-scale applications. Azure Cosmos DB is one of the fastest growing Azure services. Joining the Azure Cosmos DB team is a fantastic opportunity to work with incredibly talented engineers operating like a startup, to be at the forefront of building and shaping the Livesite Automation and AI Ops stack in Cosmos DB, and to lead the path for broader adoption across Microsoft Azure.
Cosmos DB is a database of choice across the spectrum, from the hobbyist developer to the largest Fortune 500 companies. The database provides the data backbone of many critical systems in Health Care, Retail, Telecommunications, IoT, and many more, where Service Availability and Latency are paramount. Cosmos DB provides financially backed SLAs (service level agreements) around 99.99 Availability and


We are looking for a self-driven Site Reliability Engineer (SRE) who likes taking a data-driven and systems-based approach to solving Service Reliability problems. You will be responsible for building and optimizing solutions that can analyze massive amounts of telemetry and other Service Health indicators in near real time and perform automated root cause analysis and the necessary mitigations to restore SLOs.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities
  • Collaborate closely with engineering teams to build and enhance tooling and automation solutions that resolve issues impacting SLOs faster and avert incidents altogether when possible.
  • Collaborate with customers to understand their pain points around Supportability and SLO attainment, and formulate strategies for addressing recurring issues in a sustainable way.
  • Communicate at a technical level and serve as the single point of contact for enterprise customers, handling service escalations and driving issues to resolution.
  • Design and implement changes to service telemetry for the automation to consume when the required telemetry is not already available.
  • Enhance the customer-facing experience through proactive alerting based on utilization, trends, resource health, etc.
  • Analyze data and provide operational insights into customer experience to design and product teams, so that features are designed with Supportability in mind.
  • Embody our culture and values.
Qualifications

Required/Minimum Qualifications:
  • 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field.
Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
  • 2+ years of operational experience in improving Service Reliability, Availability and Performance.
  • 3+ years of experience running large scale cloud services.
  • Experience in Logic Apps and authoring Jupyter Notebooks.
  • Understanding of Observability and MELT implementation patterns for large-scale services.
  • Experience in analyzing, troubleshooting, and automating root cause analysis and mitigation of incidents impacting large-scale distributed systems.
  • Systematic problem-solving approach, coupled with communication skills and a sense of curiosity.
  • Ability to deal with the ambiguity associated with working in a fast-paced environment.
  • Ability to influence the product architecture and roadmap so that customer-experienced supportability is always a key consideration when evolving the product.

#azdat #azuredata #SRE

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $102,100 - $202,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $133,800 - $219,200 per year.


Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Vacancy posted 10 hours ago