Inference Systems Engineer for Transformers & Low-Latency HPC

Etched

An innovative AI hardware company in San Jose is looking for talented engineers to support the porting of state-of-the-art AI models to its architecture. Candidates should be proficient in C++ or Rust and have a strong understanding of performance-sensitive distributed software systems. This full-time position offers competitive benefits, including a housing subsidy and wellness programs designed to support team members both professionally and personally. Join a team committed to redefining AI infrastructure.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Inference Systems Engineer for Transformers & Low-Latency HPC in San Jose, CA vacancy
  •  ...to PCs, gaming and embedded systems. Grounded in a culture of innovation...  ...for a Senior Staff AI Infra Engineer who is passionate about...  ...accelerate LLM training and inference on AMD GPUs, improving...  ...• Solid understanding of transformer-based architectures and distributed... 
    Transformer

    Advanced Micro Devices , Inc.

    Santa Clara, CA
    5 days ago
  • $152k - $241.5k

     ...roll out and enhance AI inference solutions at scale,...  ...closely with our engineering, DevOps, and customers...  ...disaggregated inference systems and resolving complex...  ...HBM, DRAM, SSD), and low-latency networking (RDMA, UCX...  ...understanding of transformer neural network, and inference... 
    Transformer

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...qualifications) Developed low-latency, high-performance... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    5 days ago
  • Acceler8 Talent is seeking a Software Engineer (Low-Level Systems) in San Jose to join their Supercomputing team. This role involves building control-plane software and working on hardware-software integration in AI infrastructure. Successful candidates will have strong... 
    Suggested

    Acceler8 Talent

    San Jose, CA
    19 hours ago
  • $151.8k

     ...are looking for an AI Inference Engineer with a solid background...  ...speech recognition system and ship it to various...  ..., including inference latency, throughput, memory footprint...  ...ASR systems with low latency and high accuracy...  ...deep understanding of transformer encoder-decoder... 
    Transformer
    Work at office
    Remote work

    Zoom Video Communications

    San Jose, CA
    4 days ago
  •  ...building the world’s first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...With Proficiency in Rust. Low‑latency, high‑performance... 
    Transformer
    Internship
    Work at office
    Relocation

    Etched

    San Jose, CA
    2 days ago
  • $2,000 per month

     ...is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance...  ...and dramatically lower cost and latency than a B200. With Etched ASICs,...  ...investors and staffed by leading engineers, Etched is redefining the infrastructure... 
    Transformer
    Contract work
    Work at office
    Overseas
    Relocation package

    Etched

    San Jose, CA
    8 days ago
  • $2,000 per month

     ...the world's first AI inference system purpose-built for transformers - delivering over 1...  ...lower cost and latency than a B200. With Etched...  ...staffed by leading engineers, Etched is...  ...Architect and implement low-level control-plane...  ...operation Background in HPC, AI infrastructure,... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    3 days ago
  •  ...Computer Vision Analytics Engineer - Medical Video/Image...  ...solutions, ensuring low-latency inference for various medical...  ...and edge computing systems. • Work closely with...  ..., CNNs, Vision Transformers (ViTs), GANs, attention...  ...performance computing (HPC) techniques for... 
    Transformer
    Remote work

    YD Talent Solutions

    Santa Clara, CA
    1 day ago
  • $2,000 per month

     ...the world’s first AI inference system purpose-built for transformers — delivering over 1...  ...lower cost and latency than a B200. With Etched...  ...staffed by leading engineers, Etched is...  ...Architect and own low‑level control‑plane...  ...cluster-scale systems (HPC, AI infrastructure,... 
    Transformer
    Work at office
    Relocation package

    Etched

    San Jose, CA
    2 days ago
  •  ...the world's first AI inference system purpose-built for transformers - delivering over 1...  ...lower cost and latency than a B200. With Etched...  ...staffed by leading engineers, Etched is...  ...performance compute (HPC) clusters, massively...  ...Strong understanding of low-level software... 
    Transformer
    Summer work
    Internship
    Summer internship
    Work at office
    Relocation

    ETCHED LLC

    San Jose, CA
    1 day ago
  • $212k - $318.4k

     ...Machine Learning Performance Engineer, Siri Runtime Systems And Interaction Apple...  ...and optimizing our model inference stack. In this highly collaborative...  ...of compute, memory and latency. - Collaborate with...  ...Qualifications Understanding of Transformer and LLM architectures.... 
    Transformer
    Relocation

    Apple

    Cupertino, CA
    2 days ago
  • $2,000 per month

     ...product (Sohu) only supports transformers, but has an order of...  ...more throughput and lower latency than a B200. With Etched...  ...Power Supply Integration Engineer We are seeking a Power Systems Design Engineer to join our...  ...power solutions for low-voltage processors, ensuring... 
    Transformer
    Work at office
    Relocation package

    Etched

    San Jose, CA
    5 days ago
  • $154.9k - $263.3k

     ...into your hands without us. KLA invents systems and solutions for the manufacturing of wafers...  ...R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work...  ...Description/Preferred Qualifications HPC server systems are increasingly an essential... 
    Minimum wage
    Work experience placement
    Flexible hours

    KLA

    Milpitas, CA
    2 days ago
  • $165k - $242k

     ...Systems Engineer, Kernel Livingston, NJ / New York, NY / Sunnyvale, CA...  ...ideal for someone who thrives in low-level systems engineering,...  ...containerd, nydus, kubelet) HPC/AI workloads (CUDA, GPUDirect...  ...– Tune kernel subsystems for latency, throughput, and scalability... 
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    3 days ago
  • $114.8k - $195.2k

     ...into your hands without us. KLA invents systems and solutions for the manufacturing of wafers...  ...R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work...  ...Description/Preferred Qualifications HPC server systems are increasingly an essential... 
    Minimum wage
    Work experience placement
    Flexible hours

    KLA

    Milpitas, CA
    2 days ago
  • $215.28k - $364.32k

     ...Staff Machine Learning Engineer - Ai Foundation Santa Clara, CA Xpeng is...  ...model and accelerating model training/inference. Our mission is to solve the...  ...Job Responsibilities: Optimize transformer-based LLMs for low-latency and high-throughput inference. Optimize... 
    Transformer
    Full time

    XPENG

    Santa Clara, CA
    1 day ago
  • $138k - $206k

     ...solving the complex system-level challenges...  ...hardware and software engineers to identify and...  ...large scale LLM inference and training pipelines...  ...metrics such as latency, throughput,...  ...attention mechanisms, transformer architectures, and...  ...skills in Python and low-level performance-... 
    Transformer
    Work at office
    Immediate start
    Flexible hours

    Samsung Semiconductor

    San Jose, CA
    12 days ago
  • Software Engineer - Low-Level Systems / Supercomputing AI Hardware Startup | Transformer Inference at Scale | On-site (San Jose) We’re hiring a Software Engineer (Low-Level Systems...  ...like eBPF, perf, ftrace Background in HPC, AI infrastructure, or large-scale compute... 
    Transformer

    Acceler8 Talent

    San Jose, CA
    3 days ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...fast-paced environment ~ Low ego, high ownership-you're... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    1 day ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...performance models bridging low-level hardware signals... 
    Transformer
    Work at office
    Relocation package

    Etched AI

    San Jose, CA
    4 days ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...Strong understanding of low-level software engineering... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    4 days ago
  •  ...builds high-performance, low-power generative AI inference systems. We're leveraging novel techniques...  ...closely with external engineering partners, ASIC design...  ...performance compute (HPC) processors like GPUs, CPUs...  ...attention mechanisms, foundation transformer models, and mapping these... 
    Transformer
    Remote work

    Tensordyne

    Sunnyvale, CA
    1 day ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...when you're stuck Have low ego and high drive - you... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    1 day ago
  • $2,000 per month

     ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...supporting high bandwidth, low latency communication across... 
    Transformer
    Work at office
    Relocation package

    ETCHED LLC

    San Jose, CA
    3 days ago
  • $152k - $241.5k

     ...large language model inference? Join NVIDIA's TensorRT...  ...tailored for transformer-based models running...  ...Electrical/Computer Engineering, or a closely related...  ...autoregressive LLM serving systems, including speculative...  ...including optimizing for low-latency, resource-constrained... 
    Transformer
    Remote work

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...building the world's first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...Strong understanding of low-level software engineering... 
    Transformer
    Internship
    Summer internship
    Work at office
    Relocation

    ETCHED LLC

    San Jose, CA
    3 days ago
  • $320k

     ...team is the execution engine behind NVIDIA’s...  ...production deployment. We transform foundation models into...  ...video intelligence systems using DeepStream and...  ...kernels, memory, and latency/efficiency trade-offs...  ...of delivering robust, low-latency inference at scale. You have led... 
    Transformer

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $2,000 per month

     ...building the world’s first AI inference system purpose-built for transformers - delivering over 10x...  ...lower cost and latency than a B200. With Etched...  ...investors and staffed by leading engineers, Etched is redefining the...  ...Ethernet, CPU (arc/arm), low power peripherals, sensors... 
    Transformer
    Work at office
    Relocation package

    Etched.ai, Inc.

    San Jose, CA
    19 hours ago
  • A technology company focused on industrial solutions is seeking a Lead Systems Software Engineer in San Jose or Washington D.C. This role involves designing and maintaining system-level platform code across Android and Linux, integrating custom hardware, and leading a team... 

    Rivet Industries, Inc.

    San Jose, CA
    4 days ago
