Site Reliability Engineer

Mistral AI

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our platform and customer facing applications. You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers' expectations.

Location: Remote - Europe

Reporting line: Team Lead, Site Reliability Engineer

As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.

Operations

• Design, build, and maintain scalable, highly available and fault-tolerant infrastructures to support our web services and ML workloads

• Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters

• Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, users admin, data extraction, infrastructure scaling, etc.)

• Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime

• Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs

• Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences

Development

• Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform

• Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments

• Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure

• Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, refactoring, new API-based features, web apps, dashboards, etc.)

• Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements

• Document processes and procedures to ensure consistency and knowledge sharing across the team

• Contribute to open-source projects, research publications, blog articles and conferences

About you

• Master's degree in Computer Science, Engineering or a related field

• 7+ years of experience in a DevOps/SRE role

• Strong experience with cloud computing and highly available distributed systems

• Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)

• Experience working against reliability KPIs (observability, alerting, SLAs)

• Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)

• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)

• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation

• Proficiency in scripting languages (Python, Go, Bash...) and knowledge of software development best practices

• Strong understanding of networking, security, and system administration concepts

• Excellent problem-solving and communication skills

• Self-motivated and able to work well in a fast-paced startup environment

Your application will be all the more interesting if you also have:

• experience in an AI/ML environment

• experience of high-performance computing (HPC) systems and workload managers (Slurm)

• experience with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)

This role is primarily based at one of our European offices (Paris, France and London, UK). We will prioritize candidates who either reside there or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team.

Depending on their background, we may also consider remote candidates based in one of the countries listed in this job posting — currently France, UK, Germany, Belgium, Netherlands, Spain and Italy. In that case, we ask all new hires to visit our Paris HQ office:

• for the first 2 weeks of their onboarding

• then at least 2 days every month

What we offer

Competitive salary and equity

Health insurance

Sport allowance

Meal vouchers

Generous parental leave policy

Visa sponsorship

Vacancy posted 4 days ago