Site Reliability Engineer - xAI Technical Operations

$180k - $400k

xAI

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

xAI is building at a furious pace with the latest hardware to help people understand the universe, and we need Site Reliability Engineers (SREs) with at least 8 years of experience in distributed, internet-scale environments, including on-prem and cloud-based infrastructure.

You will own the availability and reliability of xAI's infrastructure and core services, including issue detection, problem management, incident management, and root cause analyses (RCAs). Engineers will own the availability of xAI infrastructure and its operations processes, applying concepts such as failure domains, blast radii, and canary testing. You will be expected to participate in a team on-call rotation and to help usher xAI into the next generation of infrastructure management across multiple data centers and cloud environments.

Responsibilities Will Include

  • Setting technical strategy and roadmap for infrastructure availability.
  • Automating monitoring, alerting, and troubleshooting for high-availability services, while working with legacy systems to scale, improve, or deprecate.
  • Owning incident response, problem management, and conducting thorough RCAs to prevent recurrence and drive continuous improvement.
  • Analyzing performance metrics and service health to identify, resolve, and mitigate bottlenecks or failures in distributed environments.
  • Ensuring security, scalability, and resilience of production infrastructure supporting AI workloads.

Location

Work will be in-office based out of either Palo Alto, California or Dublin, Ireland. 

Required Qualifications

  • A minimum of 8 years of software, systems or reliability engineering experience.
  • Experience managing services in distributed, internet-scale *nix environments, including on-prem and cloud (e.g., AWS, GCP).
  • Development experience in Python, Scala, Java, C, or C++.
  • Demonstrable knowledge of TCP/IP, networking, and systems programming (e.g., bash and shell tools).
  • Familiarity with containerization and orchestration tools (e.g., Kubernetes, Docker, Mesos) and systems management (e.g., Puppet, Chef, Ansible).
  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).

Preferred Experiences

  • Experience in on-call rotations and incident response in high-stakes environments.
  • Experience with AI/ML infrastructure and large-scale GPU clusters.
  • Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting.
  • Comfortable with deployment, support, monitoring, administration, and troubleshooting across on-prem, cloud and hybrid infrastructures.
  • Proven understanding of systems and application design, including operational trade-offs.

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to an initial interview (45 minutes - 1 hour) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four interviews:

  1. Coding assessment in a language of your choice.
  2. Site reliability and operations technologies.
  3. Manager Interview.
  4. Meet and greet with the team, including a presentation of a large-scale solution or problem you owned from start to finish.

Our goal is to finish the main process within one week. We don’t rely on recruiters for assessments. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

Annual Salary Range

$180,000 - $400,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer.

Vacancy posted more than 2 months ago