Inference Systems Engineer for Transformers & Low-Latency HPC

Etched

An innovative AI hardware company in San Jose is looking for talented engineers to support the porting of state-of-the-art AI models to their architecture. Candidates should be proficient in C++ or Rust and have a strong understanding of performance-sensitive distributed software systems. This full-time position offers competitive benefits including a housing subsidy and wellness programs designed to support team members both professionally and personally. Join the team committed to redefining AI infrastructure.

Vacancy posted 1 day ago

