GPU/CPU Systems Engineer

$135.2k - $306.4k

Oracle

Job Overview

Oracle hardware platform development engineering is seeking a highly driven GPU/CPU Platform System Engineer at the Principal Engineer level. The GPU System Engineer will work within development engineering with a small team of talented engineers who lead the development and day-to-day engineering efforts for Oracle’s rapidly growing and successful Cloud AI platforms. You will participate in platform definition, platform development oversight, in-house development, design reviews, system integration, performance testing, and characterization. You will interact closely with third-party GPU IC suppliers and partners as well as internal hardware and software development teams to help drive Oracle’s AI Cloud platform solution space.

You will be a critical part of the team developing Oracle’s growing Cloud AI solutions. The team has delivered the first and second generations of Oracle Cloud dedicated compute and AI platforms and is building out the next generation of Cloud and Enterprise systems with record-breaking performance, security, and world-class quality, using the latest merchant silicon and technologies.

Oracle Cloud Infrastructure (OCI) is looking for a visionary Systems Engineer to lead innovation in AI hardware and datacenter infrastructure. In this high-impact role, you’ll guide the development of emerging technologies in compute accelerators, virtualization, networking, energy systems, and AI infrastructure. Your work will directly influence OCI’s long-term architectural direction and help shape the future of cloud infrastructure.

Responsibilities

Our Design Engineering organization is looking for a highly driven, capable, and dedicated Principal Engineer to join the team developing the next generation AI platform for Cloud. Your work will include:

Review and assessment of third-party merchant silicon used for AI Accelerator Modules.
Evaluation of system architecture and proposed implementation path analysis.
Participate in platform definition and analysis.
Provide platform development oversight for partners.
Work with in-house engineering functional experts on design and reviews.
Support and guide system integration, performance testing, and characterization.
Support development program managers on technical assessments and planning.
Interact closely with third-party GPU IC suppliers and partners as well as internal hardware, software development, quality assurance, cloud orchestration, security experts, and Oracle manufacturing teams.
Document and specify design intent and design details where appropriate in collaboration with the appropriate engineering teams.
Participate in hardware platform security evaluations.
Guide partner internal Oracle teams on support needed to scale, monitor, and successfully deploy our products to the Cloud.
Assist Oracle Cloud and Support teams with root-cause analysis of potential hardware or software bugs through firsthand lab replication, remote debug, and calls with the appropriate teams supporting our deployed products.
Work with Oracle manufacturing teams to ensure that Oracle hardware is secure, robustly evaluated, performing at peak capabilities, and well qualified for deployment to our Cloud customers.

What This Role Looks Like

Work directly with hardware design and development teams on architecture, implementation, deployment, and troubleshooting of AI hardware platforms.
Develop, implement, own, and run the day-to-day execution of AI platform development, both internally and in partnership with third-party design teams. This includes reviews of design plans, schematics, board layout, test feature definition, and guidance for subsystem test, as well as system validation plans.
Oversee system integration, system testing, and qualification; define software diagnostics features; and utilize third-party and approved open-source AI platform qualification and test tools.
Expand system characterization and performance testing capabilities and support definition of in-service system monitoring and error reporting needs.
Collaborate with hardware developers, system architects, platform firmware developers, partners, AI chip/GPU suppliers, storage, networking and compute experts, and manufacturing to support the new product introduction process out to production.
Serve as the last level of engineering technical support when trained cloud and support teams require guidance and help in resolving complex deployed product issues.

Required Qualifications

Technical hands-on experience with market-leading GPU (or alternate AI platforms) from the hardware and platform development, test, and characterization perspectives.
Balance hardware performance priorities against power, cost, and cross-functional considerations.
Be responsible for meeting hardware product performance and regulatory specifications, if applicable.
Solid knowledge of AI/GPU and/or AI/CPU platform architecture and their capabilities.
Strong understanding and experience running firmware and system diagnostics tools using BMC firmware, UEFI/BIOS, and Linux tools. Skilled in scripting to customize tests.
Solid working experience with GPU supplier test code as well as open-source AI test/characterization tools.
Experience with the architecture, design, and implementation of modern server platforms consisting of multiple architectures and vendors, including x86 and ARM server architectures.
Experience with hardware development at the system, board, and FPGA level.
Required experience with board-level tools and ability to review hierarchical schematics, multilayer advanced board layout, cross-board interconnect, and end-to-end connectivity analysis.
Strong communication skills and ability to clearly communicate complex technical issues across engineering disciplines as well as clearly and succinctly articulate issues for executives.
Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes.
Experience with early-stage bring-up and power-on, platform firmware debugging, prototype GPU and CPU complex and memory complex debugging.
Ability to isolate a problem to the source and devise timely and robust solutions.
Experience and understanding of the latest high-speed busses and interconnects used in modern compute and AI platforms. Familiarity with their startup connectivity and operational robustness.

Preferred Qualifications

10 years of experience with hardware design and bring-up.
Comfortable with the use of hardware debuggers.
Experience in PCIe, DDR, Ethernet, USB, SPI, etc.
Experience with platform-level security technologies is an advantage in the role.
Experience with power circuit design and signal integrity.

Salary and Benefits

US: Hiring Range in USD from: $135,200 - $306,400 per year. May be eligible for bonus, equity, and compensation deferral.

Benefits include:

Medical, dental, and vision insurance, including expert medical opinion
Short-term disability and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid Time Off: Flexible Vacation provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years. Vacation accrual prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner, and pet insurance

Additional Information

Career Level - IC5

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law. Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.


Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the GPU/CPU Systems Engineer in San Francisco, CA vacancy

GPU Systems Engineer - HPC / Parallel Computing
...is earned by shipping excellence. We seek engineers with strong intrinsic drive, a true... ...Angeles. About the Role We’re looking for a systems engineer with HPC or parallel programming... ...of high-performance systems to optimize GPU performance at the bleeding edge of AI. Full...
Suggested
Full time
Work at office
Vast
San Francisco, CA
5 hours ago
Distributed Systems Engineer - AI Infra & GPU Clusters
A leading AI technology firm in San Francisco is seeking systems-oriented candidates to enhance their infrastructure used for advanced AI... ...experience managing distributed systems, particularly within GPU environments. The position offers the chance to work on cutting-...
Suggested
krea.ai
San Francisco, CA
5 days ago
GPU Systems Engineer - Scale AI Inference (On-site SF/LA)
Vast.ai Inc. is seeking a systems engineer with HPC or parallel programming experience to help scale AI inference. You will design and optimize GPU kernels and tensor libraries, leveraging CUDA/C++ and related frameworks to push the bleeding edge of AI performance. This...
Suggested
Vast.ai Inc.
San Francisco, CA
5 days ago
GPU Systems Engineer — HPC & AI Inference (On-site)
Vast.ai is seeking a systems engineer to scale AI inference and optimize GPU performance at our San Francisco or Los Angeles offices. You will leverage your HPC background to push the bleeding edge of AI, working with CUDA/C++ and a modern tech stack. Ideal candidates have...
Suggested
Full time
Vast.ai
San Francisco, CA
5 days ago
C++ Systems Engineer: GPU Virtualization & Ultra-Low Latency
A leading consulting firm is seeking a Software Engineer (C++ Systems) in San Francisco to optimize microsecond-level performance in GPU virtualization software. Ideal candidates will have elite C++ expertise, with at least 2 years of experience in low-level systems engineering...
Suggested
SK HR Consultants.com
San Francisco, CA
5 days ago
C++ Systems Engineer: Low-Latency GPU Virtualization (SF)
USA Tech Recruit is looking for a Software Engineer (C++ Systems) to work in San Francisco. This full-time onsite role focuses on optimizing high-performance GPU virtualization technology, ideal for those passionate about low-level GPU infrastructure. The position requires...
Full time
Relocation
USA Tech Recruit
San Francisco, CA
5 days ago
C++ Systems Engineer: GPU Virtualization & Performance
10X Business Consulting is seeking a highly skilled Software Engineer (C++ Systems) to join our client’s team in San Francisco. You will own production systems from day one, optimize a GPU virtualization platform, and tackle challenging performance issues in a fast-growing...
10X Recruiting Partners
San Francisco, CA
5 days ago
Senior ML Training Systems Engineer - Distributed GPU Infra
...company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and... ...of foundation models. You will design distributed training systems and optimize GPU utilization while collaborating with cross-functional teams...
Baseten
San Francisco, CA
2 days ago
Systems Research Engineer Intern - GPU Programming (Fall 2026)
$58 - $63 per hour
Systems Research Engineer Intern - GPU Programming (Fall 2026) About The Role As a Systems Research Engineer Intern specialized in GPU Programming, you will play a crucial role in developing and optimizing GPU-accelerated kernels and algorithms for ML/AI applications....
Hourly pay
Internship
Together AI
San Francisco, CA
5 days ago
Remote GPU Systems Research Engineer (CUDA/Triton)
Together AI in San Francisco offers a role as Systems Research Engineer focused on GPU programming to develop and optimize GPU-accelerated kernels for ML/AI workloads. You will co-design kernels with modeling teams and collaborate across hardware and software groups to...
Remote job
Together AI
San Francisco, CA
4 days ago
Founding Systems + ML Engineer: GPU & Chip Design
A technology startup is seeking a Founding Engineer (Systems + ML) to develop GPU-accelerated engines and build end-to-end pipelines. The ideal candidate has proven experience in C++ and GPU code, alongside a deep understanding of systems performance. You'll join as the...
Full time
Partcl
San Francisco, CA
5 days ago
Founding Systems Engineer
Job Title: Founding Systems Engineer (Performance) Location | On-Site — San Francisco, CA Time Zone | PST Role Overview We are seeking a... ...fundamentals — you think naturally about latency, throughput, memory, CPU, GPU, I/O, and resource efficiency Experience improving...
Hire Hangar
San Francisco, CA
2 days ago
Senior Systems Engineer: ML Pipelines & GPU-Scale Infra
$250k
A Series A Funded start-up in California is seeking a Systems Engineer to design and optimize systems handling complex ML pipelines. The role involves building scalable infrastructure, developing CI/CD pipelines, and ensuring system performance. Key qualifications include...
Acceler8 Talent
San Francisco, CA
1 day ago
RL Systems Engineer: Large-Scale GPU Training
...infrastructure powering our reinforcement learning stack in a San Francisco office. You’ll work with researchers to ensure the RL system is fast, reliable, and capable of days-long runs with minimal intervention. You will design and optimize training pipelines across...
Work at office
Applied Compute
San Francisco, CA
3 days ago
Sr. Systems Performance Engineer
$500 per month
Who We Are: Aurelius Systems is a VC backed defense tech startup building autonomous, edge... ...down drones. We're a small team of engineers, former US military operators, and subject... ...; how latency accumulates across CPU, GPU, memory, and I/O; how bandwidth limits affect...
Permanent employment
Work at office
Monday to Friday
Flexible hours
Night shift
Weekend work
Aurelius Systems
San Francisco, CA
2 days ago
GPU Cluster Engineer - AI Training & Research Systems
$350k
Thinking Machines Lab is looking for a skilled engineer to operate and automate large GPU clusters in San Francisco, California. Candidates should have... ...with large-scale clusters and container orchestration systems. The role offers an expected salary range of $350,000...
Visa sponsorship
Thinking Machines Lab
San Francisco, CA
5 days ago
Senior Hardware Systems Engineer
$172k - $209k
...Role We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in... ...issue resolution, and reliability across Crusoe Cloud’s GPU‑ and CPU‑based infrastructure. You will work closely with...
Temporary work
Crusoe Energy Systems
San Francisco, CA
3 days ago
Senior GPU Fleet Reliability Engineer
$180k - $250k
A tech company located in San Francisco seeks a hands-on engineer to manage a fleet of GPU servers. The role requires building and maintaining a Python fleet tracking system, implementing OS-level security, and developing automated error detection processes. The ideal...
Fal
San Francisco, CA
5 days ago
GPU Reliability Engineer — Systems & Firmware
Thinking Machines Lab is hiring an engineer to ensure the reliability of our GPU supercomputing fleet. You will own the seam between hardware, firmware, and OS, diagnosing issues to root cause and coordinating fixes with vendors so researchers can scale confidently. This...
Thinking Machines
San Francisco, CA
4 days ago
Systems Engineer
$180k - $250k
...About Unto Labs Unto Labs is a team of low-level engineers pushing distributed systems to the physical limits of modern hardware. We’re reimagining blockchains... ...optimization Experience and understanding of modern CPU architectures, memory hierarchies, and low-level...
Flexible hours
Unto Labs
San Francisco, CA
1 day ago
Software Engineer — GPU Networking & Distributed Systems
...Conviction. Join us and help build the platform engineers turn to to ship AI products. At Baseten, we are building the global operating system for distributed, heterogeneous AI hardware... ...for foundational engineers to lead our GPU Networking efforts, making RDMA a first-...
Flexible hours
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
5 days ago
GPU / CUDA Engineers (Multiple Openings)
...Francisco, CA are looking for experts in GPU Optimization / Inference... ...incredibly complex (new inference engines) Qualifications Proven background in CPU acceleration and/or GPU optimization... ...products targeting high-performance ML systems Strong coding skills in high-...
Full time
Immediate start
Greylock Partners
San Francisco, CA
5 days ago
TaaS Systems Engineer for Token Throughput
$293k
OpenAI is looking for a Tokens-as-a-Service (TaaS) Engineer to enhance token throughput for our workloads. You will develop systems to measure and improve GPU capacity, conduct performance benchmarking, and integrate external compute environments. Qualifications include...
OpenAI
San Francisco, CA
5 days ago
Systems Engineer II, Compute
$137k - $161k
...Software Development team is seeking a passionate and experienced Systems Engineer II, Compute specializing in Systems Applications. This pivotal... ...virtualization specifically for AI/ML workloads, including GPU virtualization. Previous work debugging or contributing to...
Full time
Temporary work
Crusoe Energy Systems
San Francisco, CA
1 day ago
Windows Systems Engineer
...Windows Systems Engineer About Trellis Trellis builds and deploys computer-use AI agents that get patients access to life-saving medicine. Our... ...Diagnose and remediate runaway processes, memory leaks, high CPU, and zombie sessions on live hosts Manage scheduled tasks for...
Contract work
Local area
Remote work
Usetrellis
San Francisco, CA
4 days ago
GPU Platform Reliability Engineer
Beam is seeking a role-focused engineer to own the health and reliability of our GPU compute fleet in a fast-growing AI inference platform. You will build and own metrics pipelines, alerts, and a unified health view across thousands of GPUs in production. You will automate...
Beam
San Francisco, CA
5 days ago
Remote Systems Engineer - Imbue
Summary As a systems engineer, you’ll work on pioneering machine learning infrastructure that enables running large numbers of experiments in... ...is required. Example projects Abstracting cloud and physical GPU resources. Implementing a caching system for models and...
Remote job
Local area
Flexible hours
Stars Arena
San Francisco, CA
1 day ago
Principal Systems Engineer
$175k - $225k
...Principal Systems Engineer – GPU Supercluster Bringup We are building AI infrastructure for frontier-scale workloads. Our platform is designed for high-density, high-performance GPU clusters that push the limits of power, networking, and distributed compute. As...
Flexible hours
Nscale
San Francisco, CA
3 days ago
Software Engineer - Systems
...build-out in history. When people finance GPU clusters, the datacenters housing them, and... ...operates at almost every layer of our system (from the web server to coordinating with... ...Experience with basic assembly Understanding of CPU interrupts Networking knowledge and the...
Long term contract
Contract work
Fixed term contract
Work at office
Local area
Visa sponsorship
Shift work
San Francisco Compute Company
San Francisco, CA
5 days ago
Senior Systems Engineer - Rust (Robotics Systems)
$250k
...Up to $250k Salary We’re looking for a highly technical engineer to help build the core systems that power real-time robotic intelligence. This role... ...deliver production-ready systems Profile and optimize for CPU-level performance, memory layout, and cache efficiency...
UMATR
San Francisco, CA
5 days ago
