Resiliency and Recovery Engineer
Trans.eu
Overview: The Resiliency & Recovery Engineer (Contractor) is a senior, hands-on engineering role focused on improving production resiliency and recovery outcomes across critical services and payment rails. The role is responsible for driving measurable improvements: faster recovery (reduced time to restore service), stronger and more actionable alert coverage, increased automation to reduce manual toil, and safer releases with repeatable rollback/cutback readiness. The engineer will partner closely with application teams, DevOps, Infrastructure, Database teams, and operational stakeholders to identify resiliency gaps, prioritize remediation, and implement durable solutions that improve stability and reduce customer impact.
• Work across all MMC payment rails to develop faster, more repeatable resiliency and recovery processes that benefit every platform, ensuring these enhancements are adopted broadly across the organization rather than siloed on any single platform.
• Identify resiliency gaps based on incident patterns and recurring failures; turn findings into prioritized remediation work.
• Build and strengthen monitoring, alerting, and dashboards that are actually used by engineers and leadership.
• Create runbooks and automate recovery actions to reduce manual toil and human error during incidents.
• Improve release safety and rollback/fallback readiness (clear, repeatable cutback procedures).
• Support SQL reliability efforts (SQL Server 2022 focus) in partnership with DB/infrastructure teams.
• Own the backlog, prioritization, design reviews, and cross-team coordination (Ops/Product/Tech).
• Run the weekly standup and prepare the bi-weekly executive readout.
• Integrate resilience testing into CI/CD pipelines and DevOps workflows to catch issues early and ensure robust, automated releases.
• Conduct chaos engineering experiments (failure injections, game days) to proactively uncover system weaknesses and validate recovery processes under real-world failure scenarios.
• Document and share resiliency best practices; mentor and train engineering teams to foster a culture of reliability and continuous improvement across the organization.
• Ensure a seamless handoff of all newly created resiliency and recovery practices (once mature and repeatable) to the MMC Engineering team by thoroughly documenting the improvements and conducting knowledge transfer, so that the permanent team can sustain and build upon these enhancements after the contract period.

Must-Have Qualifications:
• Proven experience in high-availability, high-transaction environments (preferably payments or financial services).
• Strong background in production resiliency and recovery (recovery execution, runbooks/playbooks, RCA mindset).
• Experience with incident pattern analysis, MTTR baselines (P2 Major/Minor), and recurring-failure taxonomy (by rail/service).
• Senior-level observability expertise: dashboards, monitors, and alerts (Datadog preferred; similar tools considered).
• Tools: Splunk, Datadog, SQL, JQL (Jira Query Language), GitLab.
• Experience with CI/CD metrics and with generating executive reports on code quality, changes, and test automation from GitLab.
• Understanding of story quality, metrics, and monitoring; able to gather data that showcases deficiencies.
• Senior CI/CD experience: pipeline design/operation, release safety patterns, and rollback readiness.
• Experience using metrics and monitoring data to identify and communicate deficiencies.
• Automation skills: Python and/or PowerShell (or equivalent) for building repeatable recovery workflows and operational tooling.
• Kubernetes/container platform production troubleshooting (deployments, pods, config drift, safe restarts, and "why did this change break prod" investigations).
• Experience with identity/credentials/certificate and secret-rotation resilience (preventing outages during password rotations, certificate upgrades, and secret propagation; implementing guardrails and monitoring for these events).
• Batch/scheduler/job-execution reliability (detecting/preventing silent job failures, validating multi-DC scenarios, and building controls to ensure scheduled processing does not impact customers).
• Distributed integration failure-handling (timeouts, retries, backpressure, idempotency, duplicate prevention, and reconciliation, especially across vendor/downstream dependencies).

Nice-to-have (differentiators):
• Experience with SRE-style reliability practices (SLO/SLI thinking, error budgets, operational metrics).
• Experience with failover / DC flip / active-active or active-passive recovery concepts and scenario-based runbooks.
• Cloud engineering (Azure, AWS).
• DevOps tools expertise (Jenkins, Terraform, SonarQube, Helm charts).
• Network and traffic-management incident triage (load balancers/firewalls/VLAN changes, DC traffic flips, and rapid isolation of "app vs infra vs network" to stabilize service).

Skills: IT Disaster Recovery And Automation
Vacancy posted 20 hours ago


