Engineering - SRE Platforms - Site Reliability Engineer - Vice President - Dallas

Goldman Sachs

Job Description

Site Reliability Engineer - Vice President

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run scalable, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for improving the availability and reliability of the firm's most critical platform services and for ensuring they meet the requirements of our internal and external users. It is also responsible for firmwide policies and standards focused on the firm's digital resilience. We are looking for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment.

The SRE team develops and maintains platforms and tools that help other Engineering teams in Goldman Sachs build and operate reliable and resilient systems. These systems span on-premises datacenters and multiple public cloud environments. The platforms we offer include central logging, monitoring, agents, and alerting. We also provide tools to drive adoption of, and improvements to, capacity planning, operational readiness assessments, production incident postmortems, SLIs / SLOs, and deployment automation, including canary releases.

The products and services we provide to our internal customers are used by thousands of engineers every day. We believe that reliability is the most important feature of any system, and we are devoted to giving our engineers the platforms and tools they need to build and operate reliable products.

Role Overview
As a Site Reliability Engineer (SRE) at Goldman Sachs, you will be a pivotal leader in ensuring the availability, reliability, and scalability of the firm's most critical platform applications and services. You will combine deep software and systems engineering expertise to architect, build, and run large-scale, massively distributed, fault-tolerant systems. This role involves providing technical leadership, mentoring senior engineers, and collaborating closely with internal teams and executive stakeholders to build and operate sustainable production systems that can adapt to our dynamic global business environment. You will drive a culture of continuous improvement, championing the adoption of advanced SRE principles and best practices across the organization.

Responsibilities
  • Strategic Reliability & Performance: Drive the strategic direction for availability, scalability, and performance of mission-critical applications and platform services, ensuring alignment with firm-wide objectives.
  • Architectural Leadership: Lead the design, build, and implementation of highly available, resilient, and scalable infrastructure and application architectures.
  • Advanced Automation & Tooling: Architect and develop sophisticated platforms, tools, and automation solutions to eliminate toil, optimize operational workflows, and enhance deployment processes across the enterprise.
  • Complex Incident Management & Post-Mortem Analysis: Lead critical incident response, conduct in-depth root cause analysis for systemic issues, and implement long-term preventative measures to significantly enhance system stability and resilience.
  • System Design & Capacity Planning: Partner with development teams to embed reliability into application design from inception, provide expert system design consulting, and lead comprehensive capacity planning initiatives for future growth.
  • Observability & Insights: Define and implement advanced monitoring, high volume logging with multi-user query capabilities, and tracing strategies to provide deep, actionable insights into application performance, infrastructure health, and user experience.
  • Technical Vision & Mentorship: Provide technical vision, lead complex technical projects, conduct rigorous code reviews, enforce SDLC best practices, and actively mentor and develop senior and staff-level engineers.
  • Technology Evaluation & Adoption: Stay at the forefront of industry trends and advancements, evaluating and integrating cutting-edge tools and frameworks to significantly improve operational efficiency and reliability.
  • On-Call Leadership: Participate in and lead on-call rotations, providing expert guidance and hands-on support for critical system incidents.
Qualifications
  • Experience: Minimum of 6 years of hands-on experience in Site Reliability Engineering, with a proven track record in architecting, designing, building, and maintaining highly available, scalable, and fault-tolerant systems at an enterprise level.
  • Technical Proficiency:
    • Exceptional programming skills in one or more major languages such as Java, Python, or Go, with a focus on building robust, scalable software.
    • Extensive hands-on experience with cloud platforms (e.g., AWS, GCP) and deep expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes).
    • Mastery of Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and configuration management tools (e.g., Puppet, Chef, Ansible).
    • Advanced proficiency in Prompt Engineering and Retrieval-Augmented Generation (RAG) architectures to automate complex SRE workflows, such as the generation of Infrastructure as Code (IaC), dynamic runbooks, and incident response summaries.
    • Profound understanding of Linux internals, networking, distributed systems, and advanced system performance tuning.
    • Expertise in designing and implementing comprehensive monitoring, alerting, logging and tracing solutions (e.g., Prometheus, Grafana, ELK stack, Datadog, PagerDuty).
    • Deep experience with CI/CD tools and practices (e.g., Jenkins, GitLab, Maven).
    • Strong foundation in databases and distributed systems.
    • Exceptional problem-solving abilities and analytical skills, with a track record of resolving complex technical challenges.
  • Preferred Experience:
    • Experience with distributed databases such as Elasticsearch
    • Experience working with GCP BigQuery
    • Experience with messaging systems such as Kafka
  • Education: Advanced degree (Bachelor's, Master's, or PhD) in Computer Science or a related technical field involving coding and/or systems engineering, or equivalent practical experience.
  • Soft Skills: Superior communication, collaboration, and interpersonal skills, with the ability to influence technical direction, lead cross-functional initiatives, and effectively engage with global teams and executive leadership. Proven ability to work independently, manage multiple complex stakeholders, and drive significant organizational change.
Vacancy posted 2 days ago