Principal Site Reliability Engineer (SRE)

INFINITE CHOICE LLC

Job Description

About the Role

We're seeking an exceptional Principal Site Reliability Engineer to architect, design, and build our SRE foundation from the ground up at InfiniteChoice. This is a rare greenfield opportunity to establish SRE practices, develop custom tooling, and create the reliability culture that will support our platform serving millions of users and billions in transaction volume.

As our Principal SRE, you'll combine deep technical expertise with strategic vision to build world-class monitoring, observability, and automation systems. You'll have the autonomy to define our SRE processes, select technologies, and create the framework that ensures our systems are reliable, scalable, and performant.

Location: Remote - US based

What You Will Do

SRE Foundation & Process Development
  • Build SRE practices from scratch - define SLIs, SLOs, error budgets, and reliability metrics

  • Establish incident response procedures, on-call rotations, and post-mortem processes

  • Create reliability engineering standards and best practices across all engineering teams

  • Develop disaster recovery and business continuity strategies

  • Design and implement capacity planning and performance optimization frameworks

Architecture & Tool Development
  • Drive architecture decisions for comprehensive application and infrastructure monitoring solutions

  • Design and develop custom SRE tools for automated monitoring, alerting, and remediation

  • Build observability platforms that provide deep insights into system performance and user experience

  • Create automation frameworks for deployment, scaling, and incident response

  • Architect logging, metrics, and tracing systems for distributed microservices environments

Google Cloud Infrastructure Excellence
  • Leverage Google Cloud Platform services to build resilient, scalable infrastructure

  • Implement cloud-native monitoring using Cloud Monitoring and Cloud Logging (formerly Stackdriver)

  • Design auto-scaling and self-healing systems using GKE, Cloud Functions, and managed services

  • Optimize cloud costs while maintaining high availability and performance standards

  • Establish security and compliance frameworks within GCP environments

Innovation & Continuous Improvement
  • Research and implement cutting-edge SRE tools and methodologies

  • Leverage AI and machine learning for predictive analytics, anomaly detection, and automated remediation

  • Create dashboards and reporting systems that provide actionable insights to engineering and business teams

  • Establish feedback loops for continuous improvement of reliability and performance

  • Stay current with industry best practices and emerging technologies in the SRE space

What You Must Have

SRE & Infrastructure Expertise
  • 12+ years of experience in Site Reliability Engineering or Infrastructure Engineering

  • 5+ years in lead SRE roles building and scaling SRE teams and processes

  • Proven track record designing and implementing monitoring and observability solutions at scale

  • Deep understanding of distributed systems, microservices architectures, and cloud-native patterns

  • Experience with infrastructure as code, configuration management, and deployment automation

Google Cloud Platform Proficiency
  • Hands-on experience with Google Cloud Platform is required

  • Expertise with GCP monitoring and observability stack (Cloud Monitoring, Cloud Logging, Cloud Trace)

  • Experience with GKE, Compute Engine, Cloud Functions, and other core GCP services

  • Knowledge of GCP networking, security, and compliance capabilities

  • Understanding of GCP cost optimization and resource management

Technical Skills
  • Strong programming skills in Python, Go, Java, or similar languages

  • Experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic, or similar)

  • Proficiency with containerization (Docker, Kubernetes) and orchestration platforms

  • Knowledge of CI/CD pipelines, automated testing, and deployment strategies

  • Understanding of database performance tuning and optimization (SQL and NoSQL)

AI & Automation
  • Familiarity with AI-driven development tools and methodologies is a huge plus

  • Experience with machine learning for operations (AIOps), anomaly detection, or predictive analytics

  • Knowledge of automated incident response and self-healing systems

  • Understanding of AI/ML tools for log analysis, pattern recognition, and intelligent alerting

Problem-Solving & Mindset
  • Strong analytical and troubleshooting skills for complex distributed systems

  • Experience with high-pressure incident response and crisis management

  • Detail-oriented with commitment to operational excellence and continuous improvement

  • Comfortable with ambiguity and building processes in a fast-growing environment

  • Passion for reliability, automation, and engineering best practices

  • Demonstrated experience building SRE programs and processes from the ground up is a HUGE plus

Education
  • Bachelor's degree in Computer Science, Engineering, or equivalent professional experience

  • Industry certifications preferred (Google Cloud Professional, SRE, or related)

What We Offer
  • Ground-floor opportunity to build SRE practices and culture from scratch

  • Full autonomy to define processes, select technologies, and establish best practices

  • Direct impact on platform reliability serving millions of users

  • Opportunity to create lasting engineering culture and operational excellence

  • Remote-first culture with in-person meetings in Dallas, TX, as needed

  • Collaborative environment with smart, passionate engineers and cross-functional teams

  • Access to cutting-edge technologies and AI-driven development tools

  • Competitive compensation, equity participation, and comprehensive benefits

Ready to Build World-Class Reliability?

Join us in creating the SRE foundation that will power InfiniteChoice's next phase of growth. If you're passionate about reliability engineering, love building systems from scratch, and want to establish the operational excellence that scales with our business, we'd love to hear from you.

About InfiniteChoice

InfiniteChoice was founded to help people find the experiences they want simply and effortlessly. We leverage a new type of business model and platform that uniquely applies automation and technology to solve the challenges of scale and complexity in experience discovery.

Existing business and marketing technologies can no longer handle the demands of connecting millions of consumers with vast inventories of experiences across a fragmented, global marketplace of people, partners, and providers.

Our mission is to disrupt this status quo by creating seamless connections between consumers and experiences. We're just at the beginning of this journey, but our approach is working: we've helped over 275 million visitors connect to millions of experiences, generating over $2 billion in revenue for our brands and partners.

Vacancy posted a month ago