Software Engineer - GPU reliability
$200k - $300k
Hudson River Trading
Hudson River Trading (HRT) is seeking a Software Engineer focused on GPU reliability to join our Systems Development team. The Systems Development team builds and maintains the platform that is shared by all Systems teams to provision, monitor, and manage HRT’s server and network infrastructure.

In this role, your main focus will be to develop tools in Python to analyze the performance of GPU hardware and build creative solutions to improve observability, reliability, and efficiency of the fleet. You’ll work closely with other engineering teams to deeply understand research and trading workflows and ensure that GPU infrastructure is utilized optimally. Strong Python skills and development experience are required, along with Unix experience and a background of managing GPU hardware at scale.

Responsibilities

This role offers a unique opportunity to make a significant impact on a critical part of our existing and growing infrastructure. Your responsibilities may vary day to day, but will include:

- Building and maintaining tools and software features to automate systems engineering workflows related to GPU management, monitoring, metrics collection, maintenance, and network configuration
- Troubleshooting software and hardware bugs on a fleet of GPU devices, including application, network, operating system, and/or kernel issues
- Working across HRT’s engineering teams to tune workloads and processes to use GPUs more efficiently
- Analyzing GPU job statistics to identify trends and areas for improvement

Qualifications

Required:

- BS and/or MS in computer science or a related field
- 2+ years of relevant experience, including programming in Python and managing GPUs
- Experience using automation to solve problems and improve process efficiency
- Experience working with, troubleshooting, tuning, and deploying various types of GPU hardware
- Strong grasp of computer science fundamentals and software design patterns
- Solid understanding of Linux/UNIX operating systems
- Familiarity with open-source software
- Ability to debug and analyze problems quickly
- Skilled at balancing multiple tasks while maintaining meticulous attention to detail
- Ability to operate effectively as a team player and also work independently
- Ability to learn at a fast pace and apply new skills effectively

Preferred:

- Understanding of Debian operating system
- Familiarity with systems configuration management and monitoring technologies
- Familiarity with continuous integration and continuous deployment tools and processes
- Understanding of networking protocols

The estimated base salary range for this position is 200,000 to 300,000 USD per year (or local equivalent). The base pay offered may vary depending on multiple individualized factors, including location, job-related knowledge, skills, and experience. This role will also be eligible for discretionary performance-based bonuses and a competitive benefits package which includes medical, dental, vision, basic life insurance, and enrollment in our company’s retirement savings plans. Employees will receive sick and parental leave, as well as other paid time off (including 20 vacation days and 10 paid holidays in the US). Please note that benefits and time off policies will vary across non-US locations.

Culture

Hudson River Trading (HRT) brings a scientific approach to trading financial products. We have built one of the world's most sophisticated computing environments for research and development. Our researchers are at the forefront of innovation in the world of algorithmic trading.

At HRT we welcome a variety of expertise: mathematics and computer science, physics and engineering, media and tech. We’re a community of self-starters who are motivated by the excitement of being at the cutting edge of automation in every part of our organization, from trading, to business operations, to recruiting and beyond. We value openness and transparency, and celebrate great ideas from HRT veterans and new hires alike.
At HRT we’re friends and colleagues – whether we are sharing a meal, playing the latest board game, or writing elegant code. We embrace a culture of togetherness that extends far beyond the walls of our office.

Feel like you belong at HRT? Our goal is to find the best people and bring them together to do great work in a place where everyone is valued. HRT is proud of our diverse staff; we have offices all over the globe and benefit from our varied and unique perspectives. HRT is an equal opportunity employer; so whoever you are we’d love to get to know you.

Please be advised: Use of AI tools during interviews or assessments is strictly prohibited, unless otherwise instructed or agreed upon. We employ various methods to evaluate the authenticity of candidate responses. If we determine that AI assistance was used during any stage of the hiring process, we reserve the right to immediately disqualify your candidacy or rescind any job offers extended.


