Average salary: $90,000 per year

  •  ...persistent failure modes to appropriate engineering teams. Collaborates with appropriate engineering teams on the failure investigation and root cause analysis. Clearly documents test results and observations in test report. Test Fixture Development: Designs test fixtures and... 
    Suggested

    Openkyber

    Alaska
    1 day ago
  •  ...Management Serve as a primary responder for production incidents, ensuring rapid triage, mitigation, and resolution. Lead Root Cause Analysis (RCA) efforts and drive long-term corrective actions. Maintain and improve incident response processes, runbooks... 
    Suggested

    Openkyber

    Alaska
    3 days ago
  •  ...Administration. Experience implementing monitoring, observability, and alerting solutions. Knowledge of incident management, root cause analysis, and operational excellence practices. Experience with Linux/Unix environments. Scripting and automation... 
    Suggested

    Openkyber

    Alaska
    3 days ago
  •  ...and service restoration activities. Ensure compliance with operational procedures and policies. Facilitate incident reviews, root cause analysis, and continuous improvement. Collaborate with infrastructure, network, security, and application teams... 
    Suggested
    Contract work
    Work at office
    Night shift

    Openkyber

    Alaska
    1 day ago
  •  ...for critical incidents. Serve as the primary liaison to Engineering for confirmed product defects. Own and deliver complex Root Cause Analysis (RCA) and post-incident review documents. Drive post-incident improvement actions, including permanent code or... 
    Suggested
    Permanent employment
    Night shift

    Openkyber

    Alaska
    1 day ago
  •  ...authoritative source of truth. Establish and enforce runbooks, SOPs, and escalation procedures tailored to CDP's criticality. Drive root cause analysis (RCA) and implement preventive measures to reduce recurring issues and protect data trust. Reliability... 
    Suggested
    Contract work
    Temporary work
    Fixed term contract
    Work experience placement

    Openkyber

    Alaska
    1 day ago
  •  ...frameworks. Define and implement SRE best practices and reliability metrics. Support production environments, incident response, and root cause analysis. Collaborate with cross-functional teams to improve platform reliability and scalability. Nice to Have:... 
    Suggested
    Contract work
    Remote work

    Openkyber

    Alaska
    1 day ago
  •  ...Version-control network definitions. Incident Response & Troubleshooting: Lead network-related incident response. Perform deep root-cause analysis for: packet drops, routing issues, DNS failures, load balancer degradation. Participate in on-call rotation and post-incident... 
    Suggested
    Permanent employment
    Shift work

    Openkyber

    Alaska
    1 day ago
  •  ...Virtualization clusters in production. Lead incident response across Kubernetes, KubeVirt, storage backend, and network layers; perform root cause analysis. Build and tune observability - metrics (Prometheus), alerting (Alertmanager), dashboards (Grafana), logs (Loki/... 
    Suggested
    Remote work

    Openkyber

    Alaska
    1 day ago
  •  ...Partnering with development and security teams to enhance CI/CD, platform reliability, and compliance. Supporting incident response, root cause analysis, and operational readiness activities. Driving automation for platform operations, environment creation,... 
    Suggested

    Openkyber

    Alaska
    1 day ago
  •  ...real-time and batch processing pipelines. AIOps & Advanced Analytics: Lead implementation of: AIOps, Predictive analytics, Root cause analysis, Anomaly detection, Event correlation. Integrate observability datasets with AI/ML platforms. Develop... 
    Suggested

    Openkyber

    Alaska
    1 day ago
  •  ...requests, and provide support to L1/L2 teams. Troubleshoot provisioning failures, connector issues, job failures, and perform root cause analysis (RCA). Manage user lifecycle processes (Joiner, Mover, Leaver) and access governance activities. Configure,... 
    Suggested
    Remote work

    Openkyber

    Alaska
    17 hours ago (new)
  •  ...infrastructure groups. Perform capacity planning and ensure database availability and scalability. Support incident management, root cause analysis, and production stability initiatives. Required Skills: Strong experience as a DB2 System Programmer in... 
    Suggested
    Contract work

    Openkyber

    Alaska
    17 hours ago (new)
  •  ...challenges into structured, actionable technical designs. Lead discovery and requirements definition to ensure solutions address root problems. Execution & Delivery: Lead design and development of scalable cloud-based data platforms (AWS, Azure, Google Cloud... 
    Suggested

    Openkyber

    Alaska
    17 hours ago (new)
  •  ...Hands-on experience with AI/ML production. Handle networking, ingress controllers, storage, and scaling strategies. Perform root cause analysis and resolve production issues. Collaborate with engineering and cloud teams for infrastructure improvements... 
    Suggested
    Contract work

    Openkyber

    Alaska
    1 day ago
  •  ...Monitoring & Optimization Develop monitoring and alerting frameworks. Troubleshoot production platform issues and drive root cause analysis. Optimize Databricks clusters and Azure resources for performance and cost efficiency. Create platform health... 
    Contract work
    Local area

    Openkyber

    Alaska
    1 day ago
  •  ...Engineering teams to support Fabric Lakehouse and Warehouse solutions and Azure-based data pipelines. Provide production support and root cause analysis across legacy and modern BI platforms. Convert end-user reporting solutions (Excel, Access, SAS) into governed,... 
    Long term contract

    Openkyber

    Alaska
    4 days ago
  •  ...management tools and developing standards and procedures. Assists in identifying data sources and systems of record. Assists in root cause analysis related to information issues and non-conformance to data standards. Builds business cases supporting HHS Master... 
    Long term contract
    Work at office
    Remote work

    Openkyber

    Alaska
    4 days ago
  •  ...during incident response by acquiring, preserving, and analyzing endpoint artifacts (e.g., memory, disk, registry, logs); assist with root cause analysis and ensure forensic evidence in accordance with legal and procedural requirements. Provide engineering-focused... 
    Remote work
    Shift work
    Night shift
    Afternoon shift

    Openkyber

    Alaska
    4 days ago
  •  ..., and in identifying and nominating sources as systems of record for data usage in various applications. Assists in performing root cause analyses related to information issues and non-conformance to published data standards based on review of technical metadata of... 
    Work at office

    Openkyber

    Alaska
    4 days ago
  •  ...configuration approaches. Support end-to-end project delivery including sprint planning, testing, deployment, troubleshooting, root cause analysis, and system optimization. Develop and support integrations with enterprise systems (e.g., Azure AD, HR, ERP, monitoring... 
    Contract work
    For contractors
    Remote work

    Openkyber

    Alaska
    4 days ago
  •  ...to impacted services and customers. Enable AIOps-driven alert correlation and anomaly detection to reduce noise and accelerate root-cause analysis. Drive automation and orchestration for incident creation, enrichment, and remediation workflows. Collaborate... 
    Contract work
    Remote work
    Night shift

    Openkyber

    Alaska
    3 days ago
  •  ...technical documentation, design reviews, and version control using GitHub and modern DevOps practices. Lead incident response and root cause analysis for critical production applications and integrations. You Have: 5+ years of hands-on experience in one... 
    For contractors
    Remote work

    Openkyber

    Alaska
    5 days ago
  •  ...Management & Production Support: Hands-on experience managing production releases, supporting deployments, handling incidents, performing root cause analysis (RCA), and coordinating remediation while minimizing downtime and risk. Requirements Leadership, Estimation &... 
    Contract work
    3 days per week

    Openkyber

    Alaska
    3 days ago
  •  ...technologies and IP protocols. Excellent communication skills. Moderate-level knowledge of scripting. Troubleshooting and root-cause-analysis skills. 5+ years of experience in networking and wireless technologies. Regular, consistent and punctual... 
    Contract work

    Openkyber

    Alaska
    5 days ago
  •  ...protocol behavior and QoS policies in an IPv6 context. Troubleshoot complex network issues across dual-stack environments and provide root-cause analysis. Develop production migration procedures, including change management documentation and rollback procedures.... 

    Openkyber

    Alaska
    5 days ago
  • $74 - $79 per hour

     ...architectures. Support the selection and implementation of data management tools, standards, and best practices. Conduct root cause analysis of data quality issues and recommend remediation strategies. Serve as a subject matter expert on Medicaid data assets... 
    Hourly pay
    Contract work
    Temporary work
    Work experience placement

    Openkyber

    Alaska
    4 days ago
  •  ...infrastructure production engineering leadership, this includes delivering reliable and responsive systems, and discipline to continually root out issues at the core. Experience with system and platform integrations, including service to service communication and... 
    Work experience placement

    Openkyber

    Alaska
    4 days ago
  •  ...maintain scheduling and workflow orchestration tools such as Airflow and ESP. Troubleshoot production incidents and participate in root cause analysis and issue resolution. Collaborate with business and technical teams to deliver scalable and reliable data... 
    Contract work
    H1b
    Local area
    Remote work
    Visa sponsorship

    Openkyber

    Alaska
    4 days ago
  •  ...Troubleshoot production issues and support incident resolution. Monitor APIs using platform monitoring and logging tools. Perform root-cause analysis and support continuous improvement. Collaborate with platform admins on upgrades, patches, and runtime changes.... 
    Contract work
    Remote work
    2 days per week
    3 days per week

    Openkyber

    Alaska
    6 days ago