Site Reliability Engineer
Mistral AI
At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.
We are seeking highly experienced Site Reliability Engineers (SREs) to shape the reliability, scalability and performance of our platform and customer-facing applications. You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers' expectations.
Location: Remote - Europe
Reporting line: Team Lead, Site Reliability Engineer
As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.
Operations
• Design, build, and maintain scalable, highly available and fault-tolerant infrastructure to support our web services and ML workloads
• Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters
• Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, user administration, data extraction, infrastructure scaling, etc.)
• Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime
• Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs
• Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences
Development
• Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools such as Kubernetes, Flux, and Terraform
• Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments
• Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure
• Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, refactoring, new API-based features, web apps, dashboards, etc.)
• Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements
• Document processes and procedures to ensure consistency and knowledge sharing across the team
• Contribute to open-source projects, research publications, blog articles and conferences
About you
• Master's degree in Computer Science, Engineering or a related field
• 7+ years of experience in a DevOps/SRE role
• Strong experience with cloud computing and highly available distributed systems
• Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)
• Experience working against reliability KPIs (observability, alerting, SLAs)
• Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)
• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)
• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation
• Proficiency in scripting languages (Python, Go, Bash...) and knowledge of software development best practices
• Strong understanding of networking, security, and system administration concepts
• Excellent problem-solving and communication skills
• Self-motivated and able to work well in a fast-paced startup environment
Your application will be all the more interesting if you also have:
• experience in an AI/ML environment
• experience with high-performance computing (HPC) systems and workload managers (Slurm)
• experience with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)
This role is primarily based at one of our European offices (Paris, France and London, UK). We will prioritize candidates who either reside there or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team.
Depending on their background, we may also consider remote candidates based in one of the countries listed in this job posting — currently France, UK, Germany, Belgium, Netherlands, Spain and Italy. In that case, we ask all new hires to visit our Paris HQ office:
• for the first 2 weeks of their onboarding
• then at least 2 days every month
What we offer
• Competitive salary and equity
• Health insurance
• Sport allowance
• Meal vouchers
• Generous parental leave policy
• Visa sponsorship



