Inference Software Engineer

$2,000 per month

ETCHED LLC

About Etched

Etched is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.

Key responsibilities
  • Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
  • Build, enhance, and scale Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
  • Optimize routing and communication layers using Sohu's collectives.
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
You may be a good fit if you have
  • Proficiency in C++ or Rust.
  • Understanding of performance-sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
  • Familiarity with PyTorch or JAX.
  • Experience porting applications to non-standard accelerators or hardware platforms.
Strong candidates may also have (nice-to-have qualifications)
  • Experience developing low-latency, high-performance applications using both kernel-level and user-space networking stacks.
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
  • Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE).
  • Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
Benefits
  • Medical, dental, and vision packages with generous premium coverage
    • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office

How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Vacancy posted 4 days ago