Senior SRE / Senior Site Reliability Engineer (SRE)

$100k - $150k

skandasols

Orlando, FL

Hi Ninad,

Please upload the job below and assign it to Surya through the ATS.

High-Priority!

Candidates may be submitted from any of these locations and must work onsite. We have one position for this role at the moment.

243352

Site Reliability Engineer - Observability & Resilience
Local to specific hub locations (Glendale, Orlando, Seattle)

RECRUITER ADDITIONAL REQUIREMENT NOTES :

Orlando, FL - Recruiter Focus: Target senior SRE candidates with strong experience in reliability engineering, incident management, SLO/SLI implementation using Nobl9, Kubernetes, observability (OpenTelemetry, Grafana Cloud, AppDynamics), and AWS Well-Architected Framework reviews. Prioritize candidates who have led automation, chaos engineering, RCA-driven reliability improvements, and large-scale production resilience initiatives.

JOB TITLE : 

Senior SRE

SKILL CATEGORY :

Cloud: AWS

REQUIRED SKILLS :

Site Reliability Engineering (SRE) & Kubernetes Operations

WORK LOCATION :

Orlando, FL 

ONSITE / REMOTE :

Hybrid

SALARY :

$100000 - $150000 Yearly 
**It is expected that our partners will come in at market rate to ensure we can always be competitive.**

Contract / Direct Hire :

DURATION :

Full Time

MUST BE INCLUDED WITH SUBMITTAL :

  1. Full Legal Name
  2. Phone
  3. Email
  4. Current Location
  5. Rate
  6. Work Authorization
  7. Willing to relocate
  8. Confirm this candidate is on or will be on your W2

This opportunity is competitive and the required turnaround time for quality talent is rather slim. With that, please confirm whether or not you’ll have talent available for our review over the next 24-72 hours.

Please feel free to reach out if you need me to clarify the qualification criteria or the scope of responsibilities.

JOB DESCRIPTION : 

Job Title: Senior Site Reliability Engineer (SRE)

Overview / Summary

We are seeking a Site Reliability Engineer (SRE) with 8-10 years of experience to drive reliability, observability, and resilience improvements across critical systems. This is a high-impact, front-line operations role focused on real-time incident response, proactive prevention, continuous automation, and reliability engineering for Tier-1 business-critical applications.

Key Responsibilities

• Drive automation initiatives to improve system performance and operational efficiency.
• Improve application reliability and availability by proactively identifying and mitigating risks.
• Analyze production incidents and root cause analyses (RCAs) to eliminate recurring issues and reduce outages.
• Define and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets using Nobl9.
• Conduct reliability assessments across applications, infrastructure, Kubernetes, databases, networks, caching platforms, and cloud environments.
• Drive observability improvements using OpenTelemetry, Grafana Cloud, AppDynamics, Splunk, and monitoring best practices.
• Perform performance and scalability reviews to support current and future demand.
• Lead chaos engineering exercises using Gremlin or Harness Chaos Engineering.
• Review cloud architectures against AWS Well-Architected Framework standards and drive remediation of reliability gaps.
• Automate operational tasks and implement self-healing solutions.
• Identify and eliminate single points of failure (SPOFs) and strengthen disaster recovery and failover capabilities.
• Collaborate with Development, Infrastructure, Performance Engineering, and Operations teams to improve system resilience.
• Establish reliability governance, dashboards, runbooks, and continuous improvement processes.

Reliability Assessment & Engineering

• Conduct application reliability assessments using established reliability frameworks.
• Review historical incidents, Sev-1/Sev-2 RCAs, and recurring failure patterns.
• Identify reliability debt and drive remediation initiatives.
• Evaluate application readiness for SRE engagement.
• Perform end-to-end reliability reviews across application, infrastructure, network, and platform layers.
• Define reliability roadmaps and track improvement initiatives.

Incident Management & RCA

• Analyze incident trends using CSI or equivalent incident management platforms.
• Participate in Major Incident Management and Problem Management processes.
• Drive RCA reviews and corrective actions.
• Track reliability improvement initiatives resulting from postmortems.
• Reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR).

Service Level Management

• Define and implement SLIs.
• Establish SLOs and Error Budgets using Nobl9.
• Partner with Product and Engineering teams to define business-focused reliability targets.
• Build SLO dashboards and reliability scorecards.
• Monitor error budget consumption and enforce governance policies.
• Conduct reliability reviews based on SLO compliance.

Cloud & Platform Reliability

• Review cloud architectures against AWS Well-Architected Framework principles.
• Conduct reliability, performance, cost optimization, security, and operational excellence assessments.
• Identify High Risk Issues (HRIs) and drive remediation.
• Validate high availability, disaster recovery, backup, and failover capabilities.
• Ensure multi-AZ and multi-region deployment strategies are implemented where required.

Kubernetes & Infrastructure Reliability

• Review Kubernetes cluster health and workload configurations.
• Validate resource requests, limits, autoscaling, and resiliency patterns.
• Assess readiness, liveness, and startup probes.
• Review service mesh configurations, network policies, and traffic routing.
• Validate database high availability, caching strategies, and scaling configurations.
• Identify and eliminate single points of failure.

Observability & Monitoring

• Design and improve enterprise observability strategies.
• Implement OpenTelemetry-based telemetry collection.
• Manage metrics, events, logs, and traces (MELT).
• Integrate telemetry into Grafana Cloud, Splunk Observability, or equivalent platforms.
• Utilize AI-driven observability capabilities for anomaly detection and root cause analysis.
• Improve alert quality, reduce alert fatigue, and increase actionable monitoring coverage.
• Ensure every alert has an owner, runbook, and customer impact justification.

Application Performance Engineering

• Conduct dependency mapping and architecture reviews.
• Analyze latency, throughput, and scalability bottlenecks.
• Review timeout, retry, circuit breaker, and resilience patterns.
• Collaborate with Performance Engineering teams on load and stress testing.
• Validate system capacity against current and future traffic demands.
• Review Akamai CDN configurations, traffic routing, caching, and failover strategies.
• Ensure applications can sustain significant traffic spikes and peak loads.

Chaos Engineering & Resilience Testing

• Design and execute chaos engineering experiments using Gremlin or Harness Chaos Engineering.
• Simulate infrastructure, network, application, and dependency failures.
• Validate system behavior during failure scenarios.
• Establish reliability score baselines and improvement goals.
• Measure resilience against real-world production conditions.
• Document findings and implement corrective improvements.

Automation & Self-Healing

• Identify repetitive operational tasks suitable for automation.
• Develop self-healing workflows for common infrastructure and application failures.
• Automate alert remediation, scaling, recovery, and operational activities.
• Reduce manual intervention and operational toil.
• Improve platform efficiency through engineering-driven automation.

Required Qualifications

• 8-10 years of experience in Site Reliability Engineering.
• Experience with CSI for incident and RCA tracking.
• Experience with Nobl9 for SLO management.
• Experience with AppDynamics for application performance monitoring.
• Experience with OpenTelemetry and Grafana Cloud for telemetry and observability.
• Experience with Gremlin or Harness Chaos Engineering.
• Experience with Akamai CDN.
• Knowledge of AWS Well-Architected Framework.
• Experience with Kubernetes reliability, observability, incident management, automation, and resilience engineering.

Best Regards,

Swathi Goutham

✉️ swathi @skandasols.com

Skanda Solutions LLC

105 Raider Boulevard, Suite 205, Hillsborough, NJ 08844

This email is not subject to a legally binding commitment. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.

Vacancy posted 1 day ago