GPU/CPU Systems Engineer
$135.2k - $306.4k
Oracle
Job Overview

Oracle hardware platform development engineering is seeking a highly driven GPU/CPU Platform System Engineer at the Principal Engineer level. The GPU System Engineer will work within development engineering with a small team of talented engineers who lead the development and day-to-day engineering efforts for Oracle’s rapidly growing and successful Cloud AI platforms. You will participate in platform definition, platform development oversight, in-house development, design reviews, system integration, performance testing, and characterization. You will interact closely with third-party GPU IC suppliers and partners as well as internal hardware and software development teams to help drive Oracle’s AI Cloud platform solution space.

The team has delivered the first and second generation of Oracle Cloud dedicated compute and AI platforms and is building out the next generation of Cloud and Enterprise systems with record-breaking performance, security, and world-class quality, using the latest merchant silicon and technologies.

Oracle Cloud Infrastructure (OCI) is looking for a visionary Systems Engineer to lead innovation in AI hardware and datacenter infrastructure. In this high-impact role, you’ll guide the development of emerging technologies in compute accelerators, virtualization, networking, energy systems, and AI infrastructure. Your work will directly influence OCI’s long-term architectural direction and help shape the future of cloud infrastructure.

Responsibilities

Our Design Engineering organization is looking for a highly driven, capable, and dedicated Principal Engineer to join the team developing the next generation AI platform for Cloud. Your work will include:
- Review and assess third-party merchant silicon used for AI Accelerator Modules.
- Evaluate system architecture and analyze proposed implementation paths.
- Participate in platform definition and analysis.
- Provide platform development oversight for partners.
- Work with in-house engineering functional experts on design and reviews.
- Support and guide system integration, performance testing, and characterization.
- Support development program managers on technical assessments and planning.
- Interact closely with third-party GPU IC suppliers and partners as well as internal hardware, software development, quality assurance, cloud orchestration, security experts, and Oracle manufacturing teams.
- Document and specify design intent and design details where appropriate, in collaboration with the appropriate engineering teams.
- Participate in hardware platform security evaluations.
- Guide partner internal Oracle teams on the support needed to scale, monitor, and successfully deploy our products to the Cloud.
- Assist Oracle Cloud and Support teams with root-cause analysis of potential hardware or software bugs through firsthand lab replication, remote debug, and calls with the appropriate teams supporting our deployed products.
- Work with Oracle manufacturing teams to ensure that Oracle hardware is secure, robustly evaluated, performing at peak capabilities, and well qualified for deployment to our Cloud customers.

What This Role Looks Like:
- Work directly with hardware design and development teams on architecture, implementation, deployment, and troubleshooting of AI hardware platforms.
- Develop, implement, own, and run the day-to-day execution of AI platform development, both internally and in partnership with third-party design teams. This includes reviews of design plans, schematics, board layout, test feature definition, and guidance for subsystem test, as well as system validation plans.
- Oversee system integration, system testing, and qualification; define software diagnostics features; and use third-party and approved open-source AI platform qualification and test tools.
- Expand system characterization and performance testing capabilities and support definition of in-service system monitoring and error reporting needs.
- Collaborate with hardware developers, system architects, platform firmware developers, partners, AI chip/GPU suppliers, storage, networking and compute experts, and manufacturing to support the new product introduction process out to production.
- Serve as the last level of engineering technical support when trained cloud and support teams require guidance and help in resolving complex deployed product issues.

Required Qualifications
- Technical hands-on experience with market-leading GPUs (or alternative AI platforms) from the hardware and platform development, test, and characterization perspectives.
- Ability to balance hardware performance priorities against power, cost, and cross-functional considerations.
- Responsibility for meeting hardware product performance and regulatory specifications, if applicable.
- Solid knowledge of AI/GPU and/or AI/CPU platform architecture and their capabilities.
- Strong understanding and experience running firmware and system diagnostics tools using BMC firmware, UEFI/BIOS, and Linux tools. Skilled in scripting to customize tests.
- Solid working experience with GPU supplier test code as well as open-source AI test/characterization tools.
- Experience with the architecture, design, and implementation of modern server platforms consisting of multiple architectures and vendors, including x86 and ARM server architectures.
- Experience with hardware development at the system, board, and FPGA level.
- Experience with board-level tools and the ability to review hierarchical schematics, advanced multilayer board layout, cross-board interconnect, and end-to-end connectivity analysis.
- Strong communication skills and the ability to clearly communicate complex technical issues across engineering disciplines, as well as to articulate issues clearly and succinctly for executives.
- Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes.
- Experience with early-stage bring-up and power-on, platform firmware debugging, and debugging of prototype GPU and CPU complexes and memory complexes.
- Ability to isolate a problem to the source and devise timely and robust solutions.
- Experience with and understanding of the latest high-speed buses and interconnects used in modern compute and AI platforms, including familiarity with their startup connectivity and operational robustness.

Preferred Qualifications
- 10 years of experience with hardware design and bring-up.
- Comfortable with the use of hardware debuggers.
- Experience in PCIe, DDR, Ethernet, USB, SPI, etc.
- Experience with platform-level security technologies is an advantage in the role.
- Experience with power circuit design and signal integrity.

Salary and Benefits

US: Hiring range in USD: $135,200 - $306,400 per year. May be eligible for bonus, equity, and compensation deferral.

Benefits include:
- Medical, dental, and vision insurance, including expert medical opinion
- Short-term disability and long-term disability
- Life insurance and AD&D
- Supplemental life insurance (Employee/Spouse/Child)
- Health care and dependent care Flexible Spending Accounts
- Pre-tax commuter and parking benefits
- 401(k) Savings and Investment Plan with company match
- Paid Time Off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
- 11 paid holidays
- Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
- Paid parental leave
- Adoption assistance
- Employee Stock Purchase Plan
- Financial planning and group legal
- Voluntary benefits including auto, homeowner, and pet insurance

Additional Information

Career Level - IC5

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.

