Observability & Operations Engineer

Fullbay

At Fullbay, our mission is simple: to create safer roads for our families and yours. As leaders in the heavy-duty repair industry, we power shops with technology that helps them run smarter and more efficiently. As an AI-First company, we invite artificial intelligence to eliminate friction, spark innovation, and drive efficiencies in every conversation, for our teams and our customers.

Position Overview:

The Observability & Operations Engineer is a key technical contributor who brings an AI-first mindset to maintaining, monitoring, and operating our AWS cloud environment and internal Developer Platform. In this role, you won't just react to incidents — you'll leverage AI-powered tooling, intelligent alerting, and automation to get ahead of problems before they impact users. You'll work deeply across AWS and its PaaS ecosystem, building repeatable, code-first pipelines that treat infrastructure and observability configuration as first-class software. From using AI coding assistants to accelerate runbook development, to applying ML-based anomaly detection across logs and metrics, you'll be expected to ask "how can AI help here?" as a first instinct. Working within a dedicated platform team, you'll build the observability foundations that keep our systems fast, reliable, and self-healing.

Primary Duties & Responsibilities:
  • Design and implement a comprehensive observability strategy (logging, metrics, tracing, alerting) across all AWS environments, leveraging AI-powered tools to detect anomalies and surface insights automatically
  • Build and manage monitoring platforms such as Datadog, Grafana, Prometheus, and AWS CloudWatch — actively exploring AI-native features within these tools to reduce alert fatigue and improve signal quality
  • Use AI coding assistants (e.g. GitHub Copilot, Claude) to accelerate development of dashboards, runbooks, and automation scripts
  • Own the incident management lifecycle — on-call rotations, post-mortems, root cause analysis — and apply AI-assisted log analysis to speed up diagnosis and resolution
  • Instrument Java, Kotlin, and Node.js-based cloud-native applications to emit structured logs, distributed traces, and metrics; identify opportunities to use ML-based anomaly detection in place of static thresholds
  • Build repeatable, code-first observability pipelines that treat dashboards, alerts, and runbooks as first-class software — versioned, tested, and deployed through Harness
  • Leverage AWS PaaS services (Lambda, API Gateway, ECS, RDS, SQS, SNS, and others) to build scalable, automated operational tooling
  • Collaborate with development teams to embed observability and AI-assisted quality checks into CI/CD pipelines via Harness
  • Own the FinOps function for our AWS environment — tracking cloud spend, building cost dashboards, identifying waste, and using AI-powered cost analysis tools to surface optimization opportunities and drive accountability across engineering teams
  • Monitor AWS infrastructure for performance, availability, and cost — partnering with finance and engineering to enforce spend governance
  • Develop and maintain Infrastructure as Code using Terraform, using AI pair programming to improve quality and consistency
  • Contribute to architectural decisions with a focus on resilience, automation, and reducing toil through intelligent systems
  • Adhere to all confidentiality and compliance regulations
  • Perform other duties as assigned
Minimum Education & Work Experience:
  • 7–10 years of experience in Software Engineering, Cloud Operations, or Site Reliability Engineering
  • 5+ years of hands-on experience with AWS infrastructure and AWS PaaS services; certifications are a plus
  • Demonstrated experience building repeatable, code-first pipelines and treating operational configuration as first-class software
  • Experience working with polyglot environments including Java, Kotlin, and Node.js
  • Demonstrated experience using AI tools (coding assistants, AI-powered observability platforms, or similar) in a professional setting — we're an AI-first company and expect this to be part of how you work, not something you're just exploring
Key Skills and Qualifications:
  • Deep experience with enterprise observability platforms — including AWS-native tooling such as CloudWatch, X-Ray, and OpenTelemetry, or comparable platforms such as Datadog, Grafana, or Prometheus
  • Proficiency with distributed tracing frameworks and log management platforms (e.g. ELK Stack, Splunk, Fluent Bit); experience mapping these patterns to AWS-native tooling is a strong plus
  • Strong understanding of SRE principles including SLOs, SLAs, error budgets, and chaos engineering
  • Hands-on FinOps experience — cloud cost allocation, chargeback modeling, rightsizing, and savings plans optimization across AWS
  • Strong working knowledge of AWS PaaS services including Lambda, API Gateway, ECS, RDS, SQS, SNS, and IAM — and how to leverage them to build scalable operational tooling
  • Experience instrumenting polyglot applications (Java, Kotlin, Node.js) and cloud-native microservices for observability
  • Proven ability to build repeatable, code-first pipelines — treating dashboards, alerts, runbooks, and infrastructure configuration as versioned, testable software
  • Experience with CI/CD tooling, specifically Harness
  • Solid understanding of Infrastructure as Code using Terraform
  • Fluency with AI tools in day-to-day work — whether that's AI coding assistants, AI-powered monitoring features, or using LLMs to accelerate problem solving; you default to asking "can AI help here?" before doing things the hard way
  • Ability to lead incident response, facilitate blameless post-mortems, and drive long-term reliability improvements
  • Strong collaboration skills for working across platform and product engineering teams
  • Knowledge of containerization technologies and microservices architecture
Physical Demands and Work Environment:

The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • Regularly required to sit at a desk in front of a computer; use hands to finger, handle, or feel objects, tools, or controls (including a computer keyboard and a telephone); and lift and/or move up to 10 pounds.
  • Frequently requires the use of hands and arms for reaching, as well as the ability to walk and communicate effectively through speaking and listening.
  • Specific vision abilities required by this position include close vision, color vision, and the ability to adjust focus.
  • Noise level in the work environment is usually moderate.
  • Type on a computer keyboard and look at a computer monitor, and operate a cell phone or a computer-based phone
Vacancy posted 1 day ago