Site Reliability Engineer - xAI Technical Operations

$180k - $400k

xAI

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

xAI is building at a furious pace with the latest hardware to help people understand the universe, and we need Site Reliability Engineers (SREs) with at least 8 years of experience in distributed, internet-scale environments, including on-prem and cloud-based infrastructure.

You will own the availability and reliability of xAI's infrastructure and core services, including issue detection, problem management, incident management, and root cause analyses (RCAs). Engineers will own the availability of xAI infrastructure and its operations processes, applying concepts like failure domains, blast radii, and canary testing. You will be expected to participate in a team on-call rotation and to help usher xAI into the next generation of infrastructure management across multiple data centers and cloud environments.

Responsibilities Will Include

  • Setting technical strategy and roadmap for infrastructure availability.
  • Automating monitoring, alerting, and troubleshooting for high-availability services, while working with legacy systems to scale, improve, or deprecate.
  • Owning incident response, problem management, and conducting thorough RCAs to prevent recurrence and drive continuous improvement.
  • Analyzing performance metrics and service health to identify, resolve, and mitigate bottlenecks or failures in distributed environments.
  • Ensuring security, scalability, and resilience of production infrastructure supporting AI workloads.

Location

Work will be in-office, based in either Palo Alto, California or Dublin, Ireland.

Required Qualifications

  • A minimum of 8 years of software, systems or reliability engineering experience.
  • Experience managing services in distributed, internet-scale *nix environments, including on-prem and cloud (e.g., AWS, GCP).
  • Development experience in Python, Scala, Java, C, or C++.
  • Demonstrable knowledge of TCP/IP, networking, and systems programming (e.g., bash and shell tools).
  • Familiarity with containerization and orchestration tools (e.g., Kubernetes, Docker, Mesos) and systems management (e.g., Puppet, Chef, Ansible).
  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).

Preferred Experiences

  • Experience in on-call rotations and incident response in high-stakes environments.
  • Experience with AI/ML infrastructure and large-scale GPU clusters.
  • Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting.
  • Comfortable with deployment, support, monitoring, administration, and troubleshooting across on-prem, cloud and hybrid infrastructures.
  • Proven understanding of systems and application design, including operational trade-offs.

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to an initial interview (45 minutes - 1 hour) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four interviews:

  1. Coding assessment in a language of your choice.
  2. Site reliability and operations technologies.
  3. Manager Interview.
  4. Meet and greet with the team, including a presentation of a large-scale solution or problem you owned from start to finish.

Our goal is to finish the main process within one week. We don’t rely on recruiters for assessments. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

Annual Salary Range

$180,000 - $400,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer.

Vacancy posted more than 2 months ago