Site Reliability Engineer - Platforms
Toyota Deutschland GmbH
Overview

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world’s most admired brands, Toyota is growing and leading the future of mobility through innovative, high‑quality solutions designed to enhance lives and delight those we serve. We’re looking for talented team members who want to Dream. Do. Grow. with us.

An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world‑changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create a best‑in‑class customer experience in an innovative, collaborative environment.

Toyota does not offer support or sponsorship of job applicants for employment‑based visas or any other work authorization for this role, now or in the future. You must have the right to work in the United States and not require Toyota support or sponsorship for immigration‑related employment (e.g., H‑1B, O‑1, E‑3, TN, etc.) now or in the future. You should not apply for this role if you will require Toyota to assist with immigration support or sponsorship now or in the future.

Who we’re looking for

The Toyota Financial Services Technology Operations Center is looking for a passionate and highly motivated Site Reliability Engineer (SRE) – Platforms. The SRE – Platforms reports to the Manager of the SRE Department. In this role, you will apply software engineering principles to ensure the availability, performance, and stability of TFS’s enterprise platforms and infrastructure services. You will play a key role in maintaining and modernizing our infrastructure platforms, including the AWS cloud platform and core operating platforms such as Linux and Windows.
What you’ll be doing

- Manage and maintain operating systems across Red Hat Enterprise Linux (RHEL), Amazon Linux, and Windows Server environments.
- Perform OS‑level configuration, hardening, and lifecycle management following industry best practices and organizational security standards.
- Manage user access, permissions, file systems, storage, networking, and core OS services across platforms.
- Coordinate with relevant teams for maintenance and change management processes as needed.
- Build, update, own, and maintain the end‑to‑end patch management lifecycle across all supported operating systems.
- Maintain tooling and workflows for automated patch scheduling, compliance reporting, and remediation tracking.
- Ensure patch compliance targets are consistently met and documented.
- Work with tools such as Red Hat Satellite, AWS Systems Manager (SSM), WSUS, Ansible, or similar patch management platforms.
- Design and maintain observability setups, including metrics, logging, and alerting, for all managed systems.
- Ensure all systems are instrumented with appropriate monitoring agents and are integrated into centralized observability platforms.
- Define and maintain meaningful alerting thresholds, dashboards, and runbooks to provide operational visibility.
- Proactively identify gaps in monitoring coverage and address them before they impact reliability.
- Participate in incident triage and use observability data to drive faster resolution.
- Manage and maintain backup and restore solutions, such as Cohesity and AWS Backup, for operating systems and critical data.
- Regularly test and validate restore procedures to ensure reliability and meet defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
- Document backup policies, schedules, and recovery procedures.
- Identify and remediate failures in backup jobs and ensure alerts are in place for backup health monitoring.
- Write and maintain scripts and automation workflows to reduce manual toil and streamline operational tasks (e.g., provisioning, configuration management, log rotation, disk cleanup, service restarts).
- Develop and implement self‑healing mechanisms for common, well‑understood system issues such as service crashes, disk space alerts, memory pressure, and connectivity failures.
- Use tools such as Bash, Python, PowerShell, Ansible, or Terraform to automate repeatable operational workflows.
- Contribute to internal automation libraries and maintain version‑controlled infrastructure code.
- Troubleshoot complex production issues and implement permanent fixes to improve reliability.
- Build and maintain components required to automate operational workflows and reduce toil, using Python or an equivalent scripting language.
- Participate in capacity planning, disaster recovery, and business continuity exercises.
- Define and manage SLIs/SLOs, health checks, and automated remediation processes.
- Collaborate across teams to ensure service reliability, deployment hygiene, and operational readiness.
- Contribute to incident postmortems and coordinate the fixes needed to prevent repeat incidents.
- Participate in on‑call rotations and major incident restoration.

What you bring

- Bachelor’s degree in information technology or a related field.
- Solid understanding of SRE concepts: SLIs, SLOs, error budgets, incident response.
- Hands‑on experience managing RHEL, Amazon Linux, and/or Windows Server in production environments.
- Solid understanding of Linux/Windows system administration fundamentals (file systems, networking, processes, services, permissions).
- Experience with patch management tools and processes (e.g., Red Hat Satellite, AWS SSM Patch Manager, WSUS, Ansible).
- Familiarity with monitoring and observability tools such as Dynatrace and CloudWatch.
- Experience with backup solutions such as Cohesity and AWS Backup, and with restore testing practices.
- Scripting proficiency in one or more of: Bash, Python, PowerShell.
- Understanding of automation frameworks such as Ansible or similar configuration management tools.
- Good troubleshooting and root cause analysis skills.
- Ability to write clear technical documentation and runbooks.
- Strong understanding of SRE principles (SLIs/SLOs, error budgets, observability, toil reduction).

What we’ll bring

- A work environment built on teamwork, flexibility, and respect.
- Professional growth and development programs to help advance your career, as well as tuition reimbursement.
- Team Member Vehicle Purchase Discount.
- Toyota Team Member Lease Vehicle Program (if applicable).
- Comprehensive health care and wellness plans for your entire family.
- Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
- Paid holidays and paid time off.
- Referral services related to prenatal services, adoption, childcare, schools, and more.
- Tax‑advantaged accounts (Health Savings Account, Health Care FSA, Dependent Care FSA).
- Relocation assistance (if applicable).

Belonging at Toyota

Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members’ efforts to dream, do, and grow without questioning that they belong.

Applicants for our positions are considered without regard to race, ethnicity, national origin, sex, sexual orientation, gender identity or expression, age, disability, religion, military or veteran status, or any other characteristics protected by law.
