Inference Systems Engineer for Transformers & Low-Latency HPC
Etched
An innovative AI hardware company in San Jose is looking for talented engineers to help port state-of-the-art AI models to its architecture. Candidates should be proficient in C++ or Rust and have a strong understanding of performance-sensitive distributed software systems. This full-time position offers competitive benefits, including a housing subsidy and wellness programs designed to support team members both professionally and personally. Join the team committed to redefining AI infrastructure.
- ...to PCs, gaming and embedded systems. Grounded in a culture of innovation... ...for a Senior Staff AI Infra Engineer who is passionate about... ...accelerate LLM training and inference on AMD GPUs, improving... ...Solid understanding of transformer-based architectures and distributed... (Transformer)
- ...roll out and enhance AI inference solutions at scale,... ...closely with our engineering, DevOps, and customers... ...disaggregated inference systems and resolving complex... ...HBM, DRAM, SSD), and low-latency networking (RDMA, UCX... ...understanding of transformer neural network, and inference... (Transformer; $152k - $241.5k)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...qualifications) Developed low-latency, high-performance... (Transformer; Work at office; Relocation package; $2,000 per month)
- Acceler8 Talent is seeking a Software Engineer (Low-Level Systems) in San Jose to join their Supercomputing team. This role involves building control-plane software and working on hardware-software integration in AI infrastructure. Successful candidates will have strong... (Suggested)
- ...are looking for an AI Inference Engineer with a solid background... ...speech recognition system and ship it to various... ..., including inference latency, throughput, memory footprint... ...ASR systems with low latency and high accuracy... ...deep understanding of transformer encoder-decoder... (Transformer; Work at office; Remote work; $151.8k)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...With Proficiency in Rust. Low-latency, high-performance... (Transformer; Internship; Work at office; Relocation)
- ...is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance... ...and dramatically lower cost and latency than a B200. With Etched ASICs,... ...investors and staffed by leading engineers, Etched is redefining the infrastructure... (Transformer; Contract work; Work at office; Overseas; Relocation package; $2,000 per month)
- ...the world's first AI inference system purpose-built for transformers - delivering over 1... ...lower cost and latency than a B200. With Etched... ...staffed by leading engineers, Etched is... ...Architect and implement low-level control-plane... ...operation Background in HPC, AI infrastructure,... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...Computer Vision Analytics Engineer - Medical Video/Image... ...solutions, ensuring low-latency inference for various medical... ...and edge computing systems. Work closely with... ..., CNNs, Vision Transformers (ViTs), GANs, attention... ...performance computing (HPC) techniques for... (Transformer; Remote work)
- ...the world's first AI inference system purpose-built for transformers - delivering over 1... ...lower cost and latency than a B200. With Etched... ...staffed by leading engineers, Etched is... ...Architect and own low-level control-plane... ...cluster-scale systems (HPC, AI infrastructure,... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...the world's first AI inference system purpose-built for transformers - delivering over 1... ...lower cost and latency than a B200. With Etched... ...staffed by leading engineers, Etched is... ...performance compute (HPC) clusters, massively... ...Strong understanding of low-level software... (Transformer; Summer work; Internship; Summer internship; Work at office; Relocation)
- ...Machine Learning Performance Engineer, Siri Runtime Systems And Interaction Apple... ...and optimizing our model inference stack. In this highly collaborative... ...of compute, memory and latency. - Collaborate with... ...Qualifications Understanding of Transformer and LLM architectures.... (Transformer; Relocation; $212k - $318.4k)
- ...product (Sohu) only supports transformers, but has an order of... ...more throughput and lower latency than a B200. With Etched... ...Power Supply Integration Engineer We are seeking a Power Systems Design Engineer to join our... ...power solutions for low-voltage processors, ensuring... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...into your hands without us. KLA invents systems and solutions for the manufacturing of wafers... ...R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work... ...Description/Preferred Qualifications HPC server systems are increasingly an essential... (Minimum wage; Work experience placement; Flexible hours; $154.9k - $263.3k)
- ...Systems Engineer, Kernel Livingston, NJ / New York, NY / Sunnyvale, CA... ...ideal for someone who thrives in low-level systems engineering,... ...containerd, nydus, kubelet) HPC/AI workloads (CUDA, GPUDirect... ...– Tune kernel subsystems for latency, throughput, and scalability... (Permanent employment; Temporary work; Casual work; Work at office; Remote work; Flexible hours; $165k - $242k)
- ...into your hands without us. KLA invents systems and solutions for the manufacturing of wafers... ...R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work... ...Description/Preferred Qualifications HPC server systems are increasingly an essential... (Minimum wage; Work experience placement; Flexible hours; $114.8k - $195.2k)
- ...Staff Machine Learning Engineer - Ai Foundation Santa Clara, CA Xpeng is... ...model and accelerating model training/inference. Our mission is to solve the... ...Job Responsibilities: Optimize transformer-based LLMs for low-latency and high-throughput inference. Optimize... (Transformer; Full time; $215.28k - $364.32k)
- ...solving the complex system-level challenges... ...hardware and software engineers to identify and... ...large scale LLM inference and training pipelines... ...metrics such as latency, throughput,... ...attention mechanisms, transformer architectures, and... ...skills in Python and low-level performance-... (Transformer; Work at office; Immediate start; Flexible hours; $138k - $206k)
- Software Engineer - Low-Level Systems / Supercomputing AI Hardware Startup | Transformer Inference at Scale | On-site (San Jose) We're hiring a Software Engineer (Low-Level Systems... ...like eBPF, perf, ftrace Background in HPC, AI infrastructure, or large-scale compute... (Transformer)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...fast-paced environment ~ Low ego, high ownership-you're... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...performance models bridging low-level hardware signals... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...Strong understanding of low-level software engineering... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...builds high-performance, low-power generative AI inference systems. We're leveraging novel techniques... ...closely with external engineering partners, ASIC design... ...performance compute (HPC) processors like GPUs, CPUs... ...attention mechanisms, foundation transformer models, and mapping these... (Transformer; Remote work)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...when you're stuck Have low ego and high drive - you... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...supporting high bandwidth, low latency communication across... (Transformer; Work at office; Relocation package; $2,000 per month)
- ...large language model inference? Join NVIDIA's TensorRT... ...tailored for transformer-based models running... ...Electrical/Computer Engineering, or a closely related... ...autoregressive LLM serving systems, including speculative... ...including optimizing for low-latency, resource-constrained... (Transformer; Remote work; $152k - $241.5k)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...Strong understanding of low-level software engineering... (Transformer; Internship; Summer internship; Work at office; Relocation)
- ...team is the execution engine behind NVIDIA's... ...production deployment. We transform foundation models into... ...video intelligence systems using DeepStream and... ...kernels, memory, and latency/efficiency trade-offs... ...of delivering robust, low-latency inference at scale. You have led... (Transformer; $320k)
- ...building the world's first AI inference system purpose-built for transformers - delivering over 10x... ...lower cost and latency than a B200. With Etched... ...investors and staffed by leading engineers, Etched is redefining the... ...Ethernet, CPU (arc/arm), low power peripherals, sensors... (Transformer; Work at office; Relocation package; $2,000 per month)
- A technology company focused on industrial solutions is seeking a Lead Systems Software Engineer in San Jose or Washington D.C. This role involves designing and maintaining system-level platform code across Android and Linux, integrating custom hardware, and leading a team...