HPC Data Center Operational Lead

Jump Trading

Location: Chicago or New York (On-site 5 days/week; regular travel to HPC data center sites required)

Jump Trading Group is committed to world-class research. We empower exceptional talents in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incenting collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.

Trading Infrastructure is a global organization of Engineers who architect, build, and maintain our world-class infrastructure. From colo design and implementation, to optimizing our exchange connectivity, to building world-class low-latency wide area networks, we leverage research and automation to consistently adapt and innovate our infrastructure to scale and drive our trading and evolving business.

Jump's HPC infrastructure powers some of the most demanding computational workloads in the industry. As our HPC footprint grows, we need a seasoned operations leader to own the reliability, standards, and day-to-day excellence of these environments. This role leads the teams that keep the lights on across Jump's HPC data centers, ensuring maximum uptime through disciplined operations, proactive maintenance, and deep technical expertise in critical facility systems. Heavy, daily use of AI tools is expected in this role—to accelerate decision-making, automate operational workflows, analyze data center telemetry, and continuously raise the bar on how the team operates.

Team Leadership & Organizational Ownership
- Lead and manage data center site leads and their teams across multiple HPC facilities; site leads report directly to this role.
- Recruit, mentor, and develop team members while conducting performance reviews and building a culture of operational rigor.
- Direct onsite contractors by providing clear scope and validating completed work.

HPC Data Center Standards, Processes & Preventative Maintenance
- Develop, document, and enforce operational standards and procedures for Jump's HPC data centers covering power, cooling, cabling, and hardware lifecycle.
- Design and own the preventative maintenance program, including scheduled inspections, component replacements, and firmware/capacity reviews to minimize unplanned downtime.
- Drive continuous improvement of operational processes and pursue automation, including AI-driven approaches, to reduce manual effort and human error.

Critical Facility Systems Expertise
- Serve as the subject matter authority on HPC data center power distribution, power striping strategies, and failover/redundancy configurations.
- Own expertise across air cooling, liquid cooling (direct-to-chip, rear-door, CDU-based), and hybrid cooling architectures.
- Maintain deep knowledge of environmental monitoring and controls (temperature, humidity, airflow, leak detection) and ensure systems remain within design parameters.

Monitoring & Incident Response
- Own the HPC data center monitoring strategy end-to-end: define what is monitored, set alerting thresholds, and ensure comprehensive visibility into facility and hardware health.
- Leverage AI tools to analyze telemetry data, identify failure patterns, predict potential issues, and accelerate root cause analysis during incidents.
- Lead critical incident response and drive root cause analysis and corrective actions to prevent recurrence.
- Establish and track operational KPIs including availability, mean time to repair, and efficiency metrics.

Server & Switch Hardware Expertise
- Maintain deep, hands-on knowledge of server hardware architectures including multi-socket platforms, GPU/accelerator configurations, memory subsystems, NVMe/storage controllers, BMC/IPMI management, and firmware lifecycle.
- Maintain deep, hands-on knowledge of network switch hardware including line cards, optics/transceivers, switch fabrics, and platform-specific diagnostics for Arista and Cisco platforms.
- Evaluate new hardware platforms, drive hardware qualification and acceptance testing, and provide informed recommendations on hardware selection.

Hardware Break-Fix
- Own the overall hardware break-fix function across all HPC sites, ensuring rapid diagnosis and resolution for servers, GPUs, network equipment, storage, and facility infrastructure.
- Diagnose complex hardware failures at the component level (CPUs, DIMMs, GPUs, NICs, PSUs, fans, drives, switch line cards, and optics) and direct the team to resolve them efficiently.
- Establish escalation paths, SLA targets, and reporting for hardware failures.

Inventory & Spares Management
- Own inventory processes and spares tracking across all HPC facilities, ensuring critical spares are stocked, tracked, and replenished to meet availability targets.
- Maintain accurate asset records for all serialized and consumable inventory.

Planning, Vendor & Budget Management
- Conduct capacity planning for space, power, cooling, and cabling to stay ahead of growth.
- Gather requirements and plan new hardware installations including physical placement, power/cooling needs, and cabling.
- Manage relationships with colocation providers and hardware vendors; negotiate contracts and SLAs.
- Develop and manage operational budgets for equipment, staffing, and facilities.

Networking & Linux
- Possess strong working knowledge of networking concepts including L2/L3 protocols, VLANs, BGP, OSPF, LACP, ECMP, and high-performance fabrics relevant to HPC environments.
- Understand network architectures such as spine-leaf, fat-tree, and high-radix topologies used in HPC clusters.
- Maintain strong Linux systems knowledge: be comfortable navigating and troubleshooting at the OS level, including storage, networking, process management, log analysis, and system diagnostics.

AI-Driven Operations
- Use AI tools daily across all aspects of the role: writing and reviewing documentation, analyzing operational data, drafting procedures, managing communications, and problem-solving.
- Champion AI adoption within the team: set the expectation that every team member integrates AI into their daily workflows.
- Identify and implement opportunities where AI can replace or augment manual operational processes.

Cross-Team Partnership
- Partner with HPC Engineering, Network Engineering, and other teams to align operations with research and business needs.
- Ensure compliance with all safety, security, and regulatory requirements.

Travel
- Travel regularly to Jump's HPC data center sites for operational oversight, project execution, and team engagement. This is a core requirement of the role.

Additional duties as assigned or needed.

Skills You'll Need:

- At least 7 years of data center operations experience, including at least 3 years leading teams in 24/7 critical infrastructure environments. HPC environment experience strongly preferred.
- In-depth knowledge of data center power systems, power distribution/striping, and failover/redundancy architectures.
- In-depth knowledge of cooling technologies including air cooling, liquid cooling (direct-to-chip, rear-door heat exchangers, CDUs), and environmental control systems.
- Proven experience building and maintaining preventative maintenance programs and operational standards/procedures.
- Strong experience with data center monitoring platforms (DCIM, BMS, environmental sensors) and defining monitoring/alerting strategies.
- High energy, results-driven, and able to work under pressure with tight deadlines.

Technical Skills:

- Deep knowledge of server hardware architectures: multi-socket platforms, GPU/accelerator systems, memory subsystems, NVMe storage, BMC/IPMI, and firmware management.
- Deep knowledge of network switch hardware: line cards, optics/transceivers, switch fabrics, and platform diagnostics across Arista and Cisco platforms.
- Proven hardware break-fix experience with the ability to diagnose failures at the component level (CPUs, DIMMs, GPUs, NICs, PSUs, drives, line cards, optics).
- Strong understanding of networking concepts: L2/L3 protocols, VLANs, BGP, OSPF, LACP, ECMP, and HPC network topologies (spine-leaf, fat-tree).
- Strong Linux systems proficiency, well beyond basic CLI usage. Comfortable with OS-level troubleshooting, storage and network configuration, process management, log analysis, and system diagnostics.
- Experience managing inventory and spares programs for critical infrastructure.
- Structured cabling standards expertise.
- Programming/scripting experience (Python preferred) is a plus.
- Demonstrated heavy use of AI tools (e.g., LLM-based assistants, AI coding tools, AI-driven analytics) in a professional setting. You should already be using AI daily and be eager to push its application further across operations.
- Strong project management skills with multi-site infrastructure deployment experience.
- Knowledge of industry standards including ASHRAE and TIA-942.
- Excellent written and verbal communication skills with the ability to communicate effectively across technical and non-technical audiences.
- Meet physical requirements including working on ladders/elevated platforms and lifting up to 50 lbs.
- Extremely high personal standards for work quality and operational discipline.
- Reliable and predictable availability, including ability to work evenings and weekends as required.
- Willingness and ability to travel regularly to data center sites.
- Bachelor's degree preferred.

Benefits
- Discretionary bonus eligibility
- Medical, dental, and vision insurance
- HSA, FSA, and Dependent Care options
- Employer Paid Group Term Life and AD&D Insurance
- Voluntary Life & AD&D insurance
- Paid vacation plus paid holidays
- Retirement plan with employer match
- Paid parental leave
- Wellness Programs

Vacancy posted 2 days ago