Software Engineer - GPU reliability

$200k - $300k
Full-time

Hudson River Trading

Hudson River Trading (HRT) is seeking a Software Engineer focused on GPU reliability to join our Systems Development team. The Systems Development team builds and maintains the platform that is shared by all Systems teams to provision, monitor, and manage HRT’s server and network infrastructure.

In this role, your main focus will be to develop tools in Python to analyze the performance of GPU hardware and build creative solutions to improve observability, reliability, and efficiency of the fleet. You’ll work closely with other engineering teams to deeply understand research and trading workflows and ensure that GPU infrastructure is utilized optimally. Strong Python skills and development experience are required, along with Unix experience and a background of managing GPU hardware at scale.

Responsibilities

This role offers a unique opportunity to make a significant impact on a critical part of our existing and growing infrastructure. Your responsibilities may vary day to day, but will include:
  • Building and maintaining tools and software features to automate systems engineering workflows related to GPU management, monitoring, metrics collection, maintenance, and network configuration
  • Troubleshooting software and hardware bugs on a fleet of GPU devices, including application, network, operating system, and/or kernel issues
  • Working across HRT’s engineering teams to tune workloads and processes to use GPUs more efficiently
  • Analyzing GPU job statistics to identify trends and areas for improvement

Qualifications

Required:
  • BS and/or MS in computer science or a related field
  • 2+ years of relevant experience, including programming in Python and managing GPUs
  • Experience using automation to solve problems and improve process efficiency
  • Experience working with, troubleshooting, tuning, and deploying various types of GPU hardware
  • Strong grasp of computer science fundamentals and software design patterns
  • Solid understanding of Linux/UNIX operating systems
  • Familiarity with open-source software
  • Ability to debug and analyze problems quickly
  • Skilled at balancing multiple tasks while maintaining meticulous attention to detail
  • Ability to operate effectively as a team player and also work independently
  • Ability to learn at a fast pace and apply new skills effectively

Preferred:
  • Understanding of Debian operating system
  • Familiarity with systems configuration management and monitoring technologies
  • Familiarity with continuous integration and continuous deployment tools and processes
  • Understanding of networking protocols

The estimated base salary range for this position is 200,000 to 300,000 USD per year (or local equivalent). The base pay offered may vary depending on multiple individualized factors, including location, job-related knowledge, skills, and experience. This role will also be eligible for discretionary performance-based bonuses and a competitive benefits package which includes medical, dental, vision, basic life insurance, and enrollment in our company’s retirement savings plans. Employees will receive sick and parental leave, as well as other paid time off (including 20 vacation days and 10 paid holidays in the US). Please note that benefits and time off policies will vary across non-US locations.

Culture

Hudson River Trading (HRT) brings a scientific approach to trading financial products. We have built one of the world's most sophisticated computing environments for research and development. Our researchers are at the forefront of innovation in the world of algorithmic trading.

At HRT we welcome a variety of expertise: mathematics and computer science, physics and engineering, media and tech. We’re a community of self-starters who are motivated by the excitement of being at the cutting edge of automation in every part of our organization, from trading, to business operations, to recruiting and beyond. We value openness and transparency, and celebrate great ideas from HRT veterans and new hires alike.

At HRT we’re friends and colleagues, whether we are sharing a meal, playing the latest board game, or writing elegant code. We embrace a culture of togetherness that extends far beyond the walls of our office.

Feel like you belong at HRT? Our goal is to find the best people and bring them together to do great work in a place where everyone is valued. HRT is proud of our diverse staff; we have offices all over the globe and benefit from our varied and unique perspectives. HRT is an equal opportunity employer; so whoever you are we’d love to get to know you.

Please be advised: Use of AI tools during interviews or assessments is strictly prohibited, unless otherwise instructed or agreed upon. We employ various methods to evaluate the authenticity of candidate responses. If we determine that AI assistance was used during any stage of the hiring process, we reserve the right to immediately disqualify your candidacy or rescind any job offers extended.

Vacancy posted 2 days ago