HPC Observability Engineer
e-IT Professionals Corp.
Posted 22 hours ago

Role: HPC Observability Engineer (Python, HPC)
Location: Remote (Contract)

Description:
The client has Grafana and InfluxDB services running on Kubernetes (K8s), in-house and on-premises. Telegraf is used to ingest data from a GPU HPC cluster into InfluxDB. This engineer will help collect and visualize data for the “Terra” platform.

The HPC Observability Engineer should have experience in:
- Setting up and maintaining Grafana dashboards for HPC environments
- Creating drill-down dashboards for servers, including metrics like memory, network, and CPU utilization
- Exploring and utilizing out-of-the-box metrics from InfluxDB
- Writing Python scripts for data ingestion into InfluxDB, with examples
- Developing a proof of concept with a simple Python script to monitor load
- Ingesting InfiniBand packet data
- Monitoring LSF jobs in various states
- Visualizing server-specific and cluster-wide metrics in Grafana
- Optional: Integrating third-party plugins like DDN’s Lustre, Mellanox fabric, etc.

Qualifications and Skills:
- B.Tech, MS, or PhD in Computer Science or related field
- 5-8 years of experience with Grafana, InfluxDB, and Telegraf
- Experience in Python and Bash scripting is a plus
- Knowledge of Docker and Google Cloud Platform is advantageous
- HPC operations experience is beneficial
- Strong communication skills and ability to work independently
- Proficiency in requirements analysis and automated testing
- Ability to write efficient, secure, and well-documented Python code
- Experience with Git and pipeline development
- Awareness of modern security and development practices

Responsibilities:
- Develop and leverage Grafana dashboards and Telegraf configurations
- Create dashboards for server and cluster metrics
- Develop Python scripts for data ingestion and documentation
- Visualize non-native resources in Grafana
- Optional: Integrate third-party plugins
- Maintain high-quality code and documentation
- Collaborate with teams to troubleshoot and optimize pipelines

Desired Skills:
- Python (good to have)
- Bash scripting (good to have)
- Docker (must)
- HPC operations and LSF (good to have)
- Experience with DDN Lustre, Mellanox fabric (good to have)
- Google Cloud Platform (good to have)
- Knowledge of Git (must)

Seniority level: Mid-Senior level
Employment type: Contract
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting

This job is active and accepting applications.
- ...Space Executive is seeking a Fullstack Engineer to develop core product experiences for their AI observability platform. This role encompasses frontend engineering, distributed systems, and applied AI. You will work on building fullstack features across TypeScript, React...
  Suggested, Remote work
- ...description Our core mission at Railway is to make software engineers higher leverage. We believe that people should be given... ...users, in real-time, of threshold breaches Craft rich backend observability APIs, working with product to build amazing experiences for instantly...
  Suggested, Monday to Friday
- ...About The Opportunity Are you a passionate cloud engineer that's delivered systems in a fast-paced agile environment? We are looking for... ...applications while ensuring cloud-native best practices Design/deploy HPC clusters with Slurm scheduling and parallel filesystems (Ceph,...
  Suggested, Work at office
$150k - $190k
...currently working on a proposal with HHS and we are looking for a High Performance Computer Engineer. This is for a proposal and will be remote. The High Performance Computing (HPC) Engineer supports and optimizes HPC environments that enable advanced scientific research...
Suggested, Full time, Remote work, Flexible hours
$150k - $190k
...A transformative IT services firm is seeking a High Performance Computer Engineer to support and optimize HPC environments for advanced scientific research remotely. The role involves collaboration with researchers to improve performance and efficiency of computational...
Suggested, Remote work
- A leading technology solutions firm is seeking an Engineer in Richmond, Virginia, to provide system programming and management support for large-scale networking infrastructures. This role involves maintaining system documentation, installing networks, and offering high...
- ...What You Will Do As a Controls Engineer II you will be responsible for the design of control system details, instrumentation configuration... ...per documented processes. Tests the installed system to observe system functionality and ensure that the programs function as...
  Full time, Flexible hours
- ...Contributor to lead AI-native workflows and develop intelligent automation tools. The role requires over 8 years of experience in software engineering, strong expertise in Python, and a deep understanding of AI/ML systems. Successful candidates will architect AI-driven solutions...
$83k - $110k
...individual will join a team of committed engineers working to deploy nodes as fast as they... ...Prometheus, promsql queries or similar observability platforms ~ Data center environments including... ...trays ~ Kubernetes administration ~ HPC - administering GPU-related workloads...
Permanent employment, Temporary work, Casual work, Work at office, Remote work, Flexible hours, Shift work
- ...Building Engineer Job Category: Contract Services Requisition Number: BUILD007347 Posted: May 7, 2026 Full-Time Virginia,... ...and maintenance procedures or assist in their development. Observe and interpret readings on gauges, meters and charts which register...
  Full time, Contract work
$80.9k - $109.3k
...is currently seeking an experienced Electrical Commissioning Engineer I and II to join our Energy Storage Group to provide electrical... ...plans, checklists, and tests • Performing field visits and observing commissioning inspections and tests where prescribed means and...
For contractors, H1b, Work at office, Flexible hours
$107.1k - $160.7k
...Your Opportunity We are looking for a talented Project Engineer who wants to be part of a purpose-driven organization that’s focused... ..., responds to request for information, prepares site observation reports, and performs other contract administration tasks....
Full time, Contract work, Temporary work, Part time, Work experience placement, Casual work, Live in, Work at office, Local area, Flexible hours
- ...Contractor Job Summary We're hiring a VP of Automation Engineering for a senior, hands‑on leadership role at the intersection... ...with a track record of designing for scale, maintainability, observability, and resilience. Strategic technical leadership experience...
  Full time, For contractors, Work experience placement, Remote work
- ...Project Engineer The Project Engineer assists the Project Manager and Superintendent with respect to safety, quality, planning and... ...Make inspections and record erosion and sedimentation control observations in SWPPP. Record Keeping: Assist superintendent with daily...
  Daily paid, Contract work, Temporary work, For subcontractor, Work at office, Night shift
- ...production and internal application issues through structured engineering solutions. Strengthen engineering standards across IT automation through version control, testing, documentation, and observability. Influence teams to adopt software engineering approaches...
- ...East Africa, that provides program management and facilities engineering services worldwide. Planate is a small business provider of planning... ...codes. Prepare detailed inspection reports documenting observations and findings. Collaborate with contractors, project teams,...
  Full time, Temporary work, For contractors, Worldwide
- ...Project Engineer Opportunity Founded in 1919, KJ has always looked to the future. With a talented team of professionals and a culture... ..., respond to requests for information, and prepare site observation reports. Communication and Collaboration: Assist in preparing...
  Contract work, Work at office, Work from home
- ...Railway is seeking a Software Engineer to build scalable services and manage complex distributed systems. This high-impact role offers an environment rich in autonomy and ownership, where engineers can thrive while addressing novel problems. The position emphasizes collaboration...
- ...security, databases, and DevSecOps Hands-on experience with AWS services, Terraform, Kubernetes (EKS), CI/CD pipelines, and observability tools Strong experience with incident management, monitoring, alerting, and root cause analysis Experience with GitLab CI/...
$106.61k - $284.28k
...Summary Join Fortune 7 CVS Health as a Senior Manager-Quality Engineering & Automation to lead our organization’s efforts to develop... ...architecture teams. Influence design decisions to improve testability, observability, and reliability. Communicate testing strategy, progress,...
Full time, Contract work, Work experience placement, Local area, Shift work
$87.88k - $120k
A leading health solutions provider is seeking a Senior Talent Acquisition Partner for a full-time remote position. This role involves leading AWS environment setups, building Infrastructure as Code using Terraform, and troubleshooting infrastructure issues. The successful...
Full time, Remote work
- ...equivalent field experience. Ability to perform triennial maintenance, engine diagnostics, fuel system priming and replacing failed engine... ...data, establish facts, and draw valid conclusions. Must observe all safety standards. Ability to deal with customers where tact...
  Contract work, For subcontractor, Work at office, Flexible hours
- ...We are seeking a QA Engineer with a strong background in API testing and LLM fine-tuning/evaluation. You will be responsible for... ...through agent conversations. Performance Testing: Use Gravitee Observability tools to measure latency in the agent-to-API loop and identify...
$120k - $140k
...Electrical Engineer Electrical Engineer $120,000 - $140,000 + 12% 401K and Quarterly Bonuses [10%-15%] This regional role... ...a high level of safety support, specifically through safety observations, evaluations, and audits. Oversee all corporate Ignition installations...
Temporary work, Remote work
$105.23k
...impact our world? CDM Smith offers employees opportunities to delve into many aspects of electrical engineering, including the design of complex power systems, observation and construction services, and power system analyses, etc. We want to match you up with the...
Full time, H1b, Local area, Relocation package, Flexible hours
- Registration: Professional Engineer (PE) Required Skills: Bluebeam; Microsoft Office Sponsorship: Immigration related employment benefits... ...services, including RFIs, submittal/shop drawing review, site observations, and coordination with contractors. Ability to build and...
  For contractors, Work at office
$65.6k - $95.1k
...transportation. Your Opportunity As a recognized leader in pavement engineering and infrastructure/asset management consulting, our... ...pavement condition assessments using a tablet to document the observed conditions of the pavement Collecting samples of surface...
Full time, Temporary work, Part time, Casual work, Work at office, Local area, Remote work, Flexible hours
- ...Job Description Timmons Group is seeking a Civil Project Engineer II/III - Traffic Analysis and Planning candidate to join our... ...engineering planning schedules and cost estimates Complete field observation, inspection, and data collection duties Communicate with...
  Work at office, Flexible hours
$117.4k - $146.7k
...Platform Automation Engineering Our team is building a modern hybrid cloud platform from the ground up to support the next generation... ...environments, developer tooling, messaging systems, and observability capabilities required to run reliable and scalable applications...
Full time, Work experience placement, Local area, Remote work, Shift work
- ...Description Timmons Group is currently seeking a Civil Project Engineer II/III to join our Roadway Design Transportation Group in... ...open-source GIS data for project planning Conduct field observations and data collection to support design discussions Stay...
  For contractors, Flexible hours


