Average salary: $75,000 per year
- ...infrastructure by monitoring system performance, availability, and operational health. Analyze system performance issues, identify root causes, and recommend corrective actions to improve reliability and efficiency. Support virtualization platforms, including... [Suggested, For contractors]
$27 - $35 per hour
...That Helps Kids Thrive at Home This season is all about growth, renewal, and new beginnings, and there’s no better time to plant roots in a career that truly makes a difference. At Pediatric Home Service, our RNs and LPNs provide 1:1 pediatric home health care... [Suggested, Full time, Part time, Local area, Relocation package, Flexible hours, Night shift, Weekend work]
- ...and service restoration activities Ensure compliance with operational procedures and policies Facilitate incident reviews, root cause analysis, and continuous improvement Collaborate with infrastructure, network, security, and application teams... [Suggested, Contract work, Work at office, Night shift]
- ...authoritative source of truth Establish and enforce runbooks, SOPs, and escalation procedures tailored to CDP's criticality Drive root cause analysis (RCA) and implement preventive measures to reduce recurring issues and protect data trust Reliability... [Suggested, Contract work, Temporary work, Fixed term contract, Work experience placement]
- ...Version-control network definitions. Incident Response & Troubleshooting Lead network-related incident response. Perform deep root-cause analysis for: Packet drops Routing issues DNS failures Load balancer degradation Participate in on-call rotation and post-incident... [Suggested, Permanent employment, Shift work]
- ...Virtualization clusters in production Lead incident response across Kubernetes, KubeVirt, storage backend, and network layers; perform root cause analysis Build and tune observability - metrics (Prometheus), alerting (Alertmanager), dashboards (Grafana), logs (Loki/... [Suggested, Remote work]
- ...persistent failure modes to appropriate engineering teams Collaborates with appropriate engineering teams on the failure investigation and root cause analysis. Clearly documents test results and observations in test report. Test Fixture Development Designs test fixtures and... [Suggested]
- ...Incident Management: Serve as a primary responder for production incidents, ensuring rapid triage, mitigation, and resolution. Lead Root Cause Analysis (RCA) efforts and drive long-term corrective actions. Maintain and improve incident response processes, runbooks... [Suggested]
- ...frameworks Define and implement SRE best practices and reliability metrics Support production environments, incident response, and root cause analysis Collaborate with cross-functional teams to improve platform reliability and scalability Nice to Have:... [Suggested, Contract work, Remote work]
- ...Administration Experience implementing monitoring, observability, and alerting solutions. Knowledge of incident management, root cause analysis, and operational excellence practices. Experience with Linux/Unix environments. Scripting and automation... [Suggested]
- ...for critical incidents. Serve as the primary liaison to Engineering for confirmed product defects. Own and deliver complex Root Cause Analysis (RCA) and post-incident review documents. Drive post-incident improvement actions, including permanent code or... [Suggested, Permanent employment, Night shift]
$20 - $22 per hour
...every product meets our "zero-defect" standard. Troubleshooting: Use your mechanical aptitude to detect malfunctions and perform root-cause adjustments rather than "band-aid" fixes. Collaboration: Work with your team to meet production targets while maintaining... [Suggested, Local area, Worldwide, Shift work, Afternoon shift]
- ...infrastructure groups Perform capacity planning and ensure database availability and scalability Support incident management, root cause analysis, and production stability initiatives Required Skills: Strong experience as a DB2 System Programmer in... [Suggested, Contract work]
- ...requests, and provide support to L1/L2 teams. Troubleshoot provisioning failures, connector issues, job failures, and perform root cause analysis (RCA). Manage user lifecycle processes (Joiner, Mover, Leaver) and access governance activities. Configure,... [Suggested, Remote work]
- ...challenges into structured, actionable technical designs Lead discovery and requirements definition to ensure solutions address root problems Execution & Delivery Lead design and development of scalable cloud-based data platforms (AWS, Azure, Google Cloud... [Suggested]
- ...real-time and batch processing pipelines. AIOps & Advanced Analytics: Lead implementation of: AIOps, Predictive analytics, Root cause analysis, Anomaly detection, Event correlation. Integrate observability datasets with AI/ML platforms. Develop...
- ...kernels, drivers, and low-level system components ~ Advanced Python and Shell scripting skills ~ Strong system debugging and root-cause analysis experience ~ Experience supporting build, test, release, and automation pipelines ~ Ability to troubleshoot...
- ...to impacted services and customers Enable AIOps-driven alert correlation and anomaly detection to reduce noise and accelerate root-cause analysis Drive automation and orchestration for incident creation, enrichment, and remediation workflows Collaborate... [Contract work, Remote work, Night shift]
$74 - $79 per hour
...architectures. Support the selection and implementation of data management tools, standards, and best practices. Conduct root cause analysis of data quality issues and recommend remediation strategies. Serve as a subject matter expert on Medicaid data assets... [Hourly pay, Contract work, Temporary work, Work experience placement]
- ...during incident response by acquiring, preserving, and analyzing endpoint artifacts (e.g., memory, disk, registry, logs); assist with root cause analysis and ensure forensic evidence in accordance with legal and procedural requirements. Provide engineering-focused... [Remote work, Shift work, Night shift, Afternoon shift]
- ...ServiceNow. Validate service impact and dependency information during major incidents and outages to support effective triage and root cause analysis. Maintain and validate service criticality, tiering, and impact attributes to support SLAs, SLOs, and... [Local area, 3 days per week]
- ...Management & Production Support Hands-on experience managing production releases, supporting deployments, handling incidents, performing root cause analysis (RCA), and coordinating remediation while minimizing downtime and risk. Requirements Leadership, Estimation &... [Contract work, 3 days per week]
- ...Availability while expanding into multi-cloud. You'll build, harden, and operate real production infrastructure. You'll own incidents, run root cause analysis, and improve the platform so the same failure doesn't happen twice. What you'll do Own Terraform-based infrastructure... [Remote work]
- ...optimization Troubleshoot manufacturing issues, welding defects, and tolerance stack-ups Participate in design reviews, DFMEA, and root cause analysis Ensure compliance with industry standards, safety regulations, and durability requirements. Position... [Full time]
- ...protocol behavior and QoS policies in an IPv6 context. Troubleshoot complex network issues across dual-stack environments and provide root-cause analysis. Develop production migration procedures, including change management documentation and rollback procedures....
- ...network solutions Complex Problem Resolution - Provide architectural-level support for escalated operational issues, driving structured root cause analysis and durable remediation strategies Innovation & Automation - Champion the adoption of automation, AI-enabled tooling,...
- ...Troubleshoot and optimize call center routing and IVR systems Leadership & Troubleshooting Lead complex troubleshooting and root cause analysis (RCA) Mentor junior engineers and provide technical guidance Collaborate with cross-functional teams on...
- ..., and in identifying and nominating sources as systems of record for data usage in various applications. Assists in performing root cause analyses related to information issues and non-conformance to published data standards based on review of technical metadata of... [Work at office, Local area]
- ...infrastructure production engineering leadership, this includes delivering reliable and responsive systems, and discipline to continually root out issues at the core ~ Experience with system and platform integrations, including service to service communication and... [Work experience placement]
- ...Partnering with development and security teams to enhance CI/CD, platform reliability, and compliance. Supporting incident response, root cause analysis, and operational readiness activities. Driving automation for platform operations, environment creation,...
