SRE Support Engineer - Observability

Gigster

Role Overview

The Observability & Tools Support Engineer provides high-impact technical support for customers of a large technology company’s internal IaaS platform, with a focus on monitoring, alerting, telemetry, and operational tooling. This role spans a wide range of support, from white-glove onboarding and end-to-end customer enablement to deep technical troubleshooting across Linux, networking, and observability systems (especially Prometheus and AlertManager). You will also contribute to improving the support function itself: strengthening tooling, documentation, workflows, and feedback loops so the service scales. Success depends on excellent troubleshooting, strong written communication, comfort working with highly technical customers, and the maturity to identify patterns and drive operational improvements beyond individual ticket resolution.

Business Outcome

Become a trusted frontline expert for the customer’s observability ecosystem and operational tooling - delivering fast, accurate support across Slack and tickets, improving monitoring reliability, and reducing incident impact through better triage, troubleshooting, onboarding, and knowledge capture.

Success Measures
  • Healthy volume of threads and tickets handled with high-quality outcomes
  • Consistent achievement of time-based SLAs
  • High customer satisfaction through surveys
  • Accurate classification of issue type, severity, and recurring patterns
  • Reduced repeat issues through better docs, tooling, and scalable onboarding

What Will Be True When You Succeed
  • Customers can onboard smoothly to monitoring/alerting with minimal friction
  • Monitoring and alerting issues are resolved quickly, with fewer escalations
  • Linux and networking-related incidents reach resolution faster due to strong troubleshooting and clean handoffs
  • Engineering and SRE teams receive clear, actionable feedback based on real customer trends
  • Knowledge base content prevents tickets and accelerates self-service

Core Work Units

Frontline Support for Observability & Tooling
  • Manage Slack threads and tickets (roughly 50/50)
  • Handle a broad range of customer support: simple issue resolution through end-to-end onboarding
  • Provide clear, structured guidance to highly technical customers
  • Maintain strong attention to detail while managing multiple interactions in parallel

Deep-Dive Troubleshooting & Incident Support
  • Troubleshoot, isolate, and resolve monitoring and alerting issues (especially Prometheus + AlertManager)
  • Troubleshoot complex Linux and networking issues (TCP/IP fundamentals required)
  • Support OpenTelemetry, tracing, and telemetry pipelines, including investigation of gaps in signals and instrumentation
  • Drive incidents to resolution in partnership with Engineering/SRE teams

Documentation & Knowledge Development
  • Build and maintain customer-facing and internal knowledge base articles
  • Create informational posts for the community support platform
  • Turn repeated issues into reusable guides, checklists, and onboarding playbooks

Trend Analysis & Feedback to Engineering
  • Analyze and categorize customer interaction trends
  • Provide accurate, meaningful feedback to Engineering and SRE orgs to improve product/tooling
  • Identify “top offenders” and propose practical fixes (tooling, docs, process, product)

Operational Excellence & Continuous Improvement
  • Participate in post-mortem reviews and drive follow-through on improvements
  • Contribute meaningfully to team objectives and goals (process, tooling, and service scaling)
  • Bring creativity and discretion to resolve highly complex issues “outside the box”

High-Quality Work - what top performance looks like

Frontline Support
  • Moves smoothly from triage to deeper analysis without losing the customer
  • Communicates clearly and confidently with technical users
  • Maintains clean follow-ups and thread hygiene even with high context switching

Troubleshooting
  • Rapidly isolates issues across monitoring/alerting configs, Linux runtime behavior, and network connectivity
  • Uses structured approaches to incident handling: hypothesis → test → evidence → resolution
  • Produces high-signal writeups that accelerate downstream resolution

Documentation & Enablement
  • Documentation is clear enough that customers avoid opening tickets
  • Onboarding flows reduce time-to-value and prevent common misconfigurations
  • Captures “tribal knowledge” quickly and makes it reusable

Operational Excellence
  • Obsessing over details: correct severity, accurate tagging, clean timelines, strong handoffs
  • Spots patterns early and proactively proposes improvements that scale support

Typical Day / Work Patterns
  • ~50% Slack support, ~50% ticket handling
  • Deep-dive investigations during lower ticket volume periods
  • Documentation writing and lightweight tooling/process improvements when patterns emerge
  • Weekly team review of escalations, themes, and operational improvements
  • High rate of context switching and parallel issue management

Required Skills & Experience (Non-Negotiable)
  • Several years supporting highly scalable applications and web services
  • Hands-on experience with open-source observability and cloud-native tooling, including:
      • Kubernetes (and container fundamentals)
      • Prometheus and AlertManager troubleshooting
      • OpenTelemetry and distributed tracing concepts
  • Strong understanding of the Linux operating system (command line, process/network debugging, logs)
  • Good understanding of infrastructure observability principles (signals, alerting strategy, SLO thinking, noise reduction)
  • Good understanding of the TCP/IP suite and practical networking troubleshooting
  • Strong experience troubleshooting ambiguous, multi-layer issues
  • Excellent analytical capability and strong attention to detail
  • Strong written and verbal communication (clear, structured, customer-friendly)
  • Comfortable working with a very technical customer base
  • Passion for Technical Support and a service mindset

Nice-to-Haves
  • Experience improving or supporting internal support tooling or workflows (automation, templates, runbooks)
  • Experience operating at scale in a services environment (pattern detection, KPI/SLA awareness, operational process maturity)
  • Familiarity with Grafana, log aggregation, incident tooling, and production support practices
  • Prior SRE or platform support experience

Minimum Qualifications
  • 3–7+ years in Technical Support Engineering, SRE support, DevOps, Platform Support, or similar
  • Demonstrated experience supporting distributed systems, IaaS, or cloud platforms
  • Strong Linux, troubleshooting, and customer-facing communication background
  • Evidence of documentation, knowledge-base contributions, and process improvement mindset

Disqualifiers: weak Linux fundamentals, inability to troubleshoot systematically, poor written communication, or discomfort supporting highly technical users.

What You’ll Love
  • Real technical problem solving with tangible customer impact
  • A role that blends deep troubleshooting with scaling support via docs, tooling, and process
  • High autonomy in a remote-first environment

What May Be Challenging
  • High context switching and managing multiple threads in parallel
  • Repeated patterns that require discipline to convert pain into scalable improvements
  • Supporting high-visibility systems where speed and accuracy matter

Differentiation
  • Industry: Remote-first, trust-based culture; global team; autonomy; modern systems; meaningful technical challenges
  • Internal: High-impact, customer-facing observability support; direct influence on tooling and process maturity; opportunity to shape scalable support practices

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the SRE Support Engineer - Observability in Austin, TX vacancy
  • A leading technology company is looking for an Observability & Tools Support Engineer to provide high-impact technical support for its internal IaaS platform. This role involves monitoring, alerting, and troubleshooting, with a focus on helping customers seamlessly onboard... 
    Suggested
    Remote job

    Gigster

    Austin, TX
    23 hours ago
  • About the Role We are looking for a Senior SRE to join our Platform Engineering team as the operations owner of our observability platforms. You’ll be responsible for the reliability...  ...on steady‑state operations and platform support, and the other half on engineering projects... 
    Suggested

    Dimensional Fund Advisors

    Austin, TX
    1 day ago
  •  ...requires exceptional tooling and infrastructure. As a Software Engineer - Observability & Debugging, you will strengthen our team’s ability to...  ...Build and maintain observability tooling that supports debugging and root-cause analysis of robot performance in real... 
    Suggested

    Sunroom Rentals

    Austin, TX
    1 day ago
  • Teza Technologies is looking for a DevOps Engineer to own and evolve our infrastructure platform. The role demands strong proficiency...  ...orchestration, with responsibilities including defining observability approaches and managing compute orchestration. The successful... 
    Suggested
    Flexible hours

    Teza Technologies

    Austin, TX
    1 day ago
  • Upstart is seeking a Senior Software Engineer focused on Site Reliability Tooling. This role involves enhancing the reliability and observability of our production systems while working closely with other engineers at Upstart. Qualifications include a minimum of 6 years... 
    Suggested
    Remote job

    Upstart

    Austin, TX
    1 day ago
  • Mission Support Software Engineer Provide on-site and remote software support to keep deployed autonomous systems mission-ready. Location: Austin...  ...customer feedback, operational issues, and mission observations. Develop structured internal updates and reports based on... 
    Permanent employment
    Remote work

    jobs.frontdoordefense.com - Jobboard

    Austin, TX
    23 hours ago
  • $47.85 - $57.85 per hour

     ...to be considered for employment opportunities with Accenture and have accommodation needs such as for a disability or religious observance, please call us toll free at (***) ***-**** or send us an email or speak with your recruiter. Equal Employment Opportunity Statement... 
    Hourly pay
    Work experience placement
    Live in
    Work at office
    Local area
    Flexible hours

    Accenture

    Austin, TX
    3 days ago
  •  ...location(s). Workplace Services Engineering (WSE) is an organization...  ...on a major transformation. We support Workplace Services, and we’re...  ...Cloud Foundry environments. Observability: Use Splunk and Grafana for log...  ..., and recovery automation. SRE Practices & Observability:... 
    Work at office

    Charles Schwab

    Austin, TX
    3 days ago
  • A technology consulting firm is searching for an experienced Observability Engineer specializing in Splunk IT Service Intelligence (ITSI). The successful candidate will be responsible for designing, implementing, and optimizing enterprise observability solutions, with... 
    Remote job

    Conducive

    Austin, TX
    23 hours ago
  • $130k - $195k

    About the Role We’re hiring a Technical Support Engineer to lead our customer support experience...  ...applications and agents, improve AI observability and evaluations, and unblock critical issues...  ...support, solutions engineering, or SRE roles in deeply technical B2B environments... 
    Remote work

    LangChain

    Austin, TX
    4 days ago
  • $152k - $241.5k

    Senior Site Reliability Engineer - HPC...  ...intelligence. We’re looking for a Senior SRE to join our Compute Farm team...  ...experience building and supporting critical services...  ...reliability/auto-healing, E2E observability or data-driven operations (AIOps... 

    NVIDIA Corporation

    Austin, TX
    3 days ago
  •  ...globe, and we’re honored to support first responders. And this is...  ...seeking a Senior Site Reliability Engineer who can own our data tier at...  ...to the broader platform, observability with Prometheus, Loki, and Tempo...  ...what that looks like for an SRE and excited to help shape it.... 
    Permanent employment
    Local area
    Flexible hours

    Zello

    Austin, TX
    4 days ago
  • Sr. Software Engineer - Site Reliability About ShipperHQ: ShipperHQ is a trusted leader...  ...AWS, DevOps practices, and automation to support and improve our complex cloud...  ...available systems in AWS Build and maintain observability, monitoring, and logging systems Support... 
    Full time
    Work at office

    Zowta, LLC

    Austin, TX
    23 hours ago
  • Site Reliability Engineer, Enterprise Technology Services Austin, Texas...  ...play a crucial role in supporting the Apple ecosystem by offering...  ...Reliability and Operations Engineer (SRE), you’ll be part of the...  ...of SRE principles, including observability, error budgeting, service... 

    Apple Inc.

    Austin, TX
    2 days ago
  • Site Reliability Engineer, Teamcenter, Enterprise Technology Services Austin, Texas, United...  ...and Services Description As an SRE, you will play a key role in ensuring the...  ...best practices. Responsibilities System Observability: Implement and maintain robust observability... 

    Apple Inc.

    Austin, TX
    23 hours ago
  •  ...looking for a Senior Site Reliability Engineer to join our SRE team in the Platform Engineering...  ...users. The role focuses on automation, observability, and ensuring the quality and...  ...as a scalable cloud service in AWS, supporting millions of user endpoints. Location... 
    Permanent employment
    Remote work
    Work from home
    Flexible hours

    NinjaOne

    Austin, TX
    4 days ago
  • $98.58k - $138.02k

     ...TX; Irvine, CA; or Akron, OH. Role Site Reliability Engineer II will be responsible for supporting, enhancing, and maintaining Restaurant365’s cloud infrastructure...  ...evolve monitoring tools and platforms to improve observability. Promote and apply best practices for reliability,... 
    Work at office

    Restaurant365

    Austin, TX
    3 days ago
  •  ...a passionate Site Reliability Engineer to join our team in Dallas, TX or Austin, TX. As an SRE you will design and develop tooling...  ...will design infrastructure to support our massive growth and work...  ...toil, and improve system observability. Defining and driving the adoption... 
    Local area

    Traveltechessentialist

    Austin, TX
    4 days ago
  •  ...match. The role We’re looking for a Senior SRE to own the reliability, scalability, and...  ...Build and maintain CI/CD pipelines, observability stacks, and incident response workflows...  ...development workflows Partner closely with engineering on reliability reviews and architecture... 

    Satsuma

    Austin, TX
    23 hours ago
  •  ...world. We’re looking for a Senior Site Reliability Engineer to help build and scale a high-impact SRE function. You’ll be a technical leader on a team...  ...to guide engineering priorities Design and develop observability systems (metrics, logging, tracing, alerting) that... 

    Elea Ecuador

    Austin, TX
    2 days ago
  •  ...performance. Implement best practices for observability, automation, and incident response. Ensure...  ...k) Get notified about new Site Reliability Engineer jobs in Austin, Texas Metropolitan Area . Site Reliability Engineer (SRE) - Platform Infrastructure team (100% Remote... 
    Full time
    Remote work

    Altimetrik

    Austin, TX
    1 day ago
  • $106.61k - $284.28k

     ...CVS Health as a Sr. Manager, Frontline Support Engineering to lead our organization's efforts to...  ...Salesforce Service Cloud). Experience with Observability & Monitoring Tools such as AppDynamics...  ...Qualifications Experience in IT, SRE, DevOps, or Software Engineering.... 
    Hourly pay
    Full time
    Temporary work
    Work experience placement
    Local area

    Hispanic Alliance for Career Enhancement

    Austin, TX
    4 days ago
  •  ...like you to help us! THE ROLE: At FloSports, SRE is the team that acts as a force multiplier for our engineering organization. Our mission is to be the "wind in...  ...secure IaC frameworks, not just consumed them. Observability Architect: You have designed and implemented observability... 
    Temporary work
    Immediate start
    Flexible hours

    FloSports, Inc.

    Austin, TX
    23 hours ago
  • $70k - $130k

     ...automation tooling. You will be part of a very creative team working directly with our organization’s software and quality assurance engineers to enable high quality software delivery and improve quality of service. We are seeking a versatile and adaptable professional who... 
    Shift work

    Tata Consultancy Services Limited

    Austin, TX
    1 day ago
  • Site Reliability Engineer (Edge Services), Infrastructure Services Austin, Texas, United States...  ...services are resilient, scalable, and observable, bridging the gap between complex...  ...experiences. Description As a key member of the SRE team, your mission is to treat operations... 
    Shift work

    Apple Inc.

    Austin, TX
    1 day ago
  • Teza Capital Management LLC is seeking a DevOps Engineer in Austin, Texas, to enhance our infrastructure platform for systematic trading. You will drive the evolution of tools and systems that our quant and trading teams rely on daily. The ideal candidate has a strong... 
    Flexible hours

    Teza Capital Management LLC

    Austin, TX
    1 day ago
  •  ...respond for operational issues. You will help lead chaos engineering efforts in a production-alike environment, exposing...  ...and to measure and improve the reliability, scalability, observability, supportability, and performance of Teradata software. You will become... 
    Permanent employment
    Flexible hours

    Teradata

    Austin, TX
    17 days ago
  •  ...-time position in Austin, Texas. The ideal candidate will have strong experience in Linux administration and Site Reliability Engineering (SRE), along with proficiency in GitHub, Kubernetes, CI/CD methodologies, and AWS. This role is vital to the Information Technology... 
    Full time

    Programmers.io

    Austin, TX
    1 day ago
  •  ...Technical Support Engineer Technical Support Engineers (TSE) partner with engineers and scientists to ensure their success through deep technical knowledge of Products, Platforms and Systems. TSEs respond to and anticipate technical needs to help maintain or accelerate... 
    Full time
    Temporary work
    Work experience placement
    Flexible hours

    Emerson Electric

    Austin, TX
    23 hours ago
  • Saronic Technologies is seeking a Mission Support Software Engineer to provide on-site and remote software support for autonomous systems. This role is crucial in ensuring operational success by assisting with system stability, troubleshooting, and customer operations.... 
    Remote job

    jobs.frontdoordefense.com - Jobboard

    Austin, TX
    1 day ago
