Inference Software Engineer

$2,000 per month

ETCHED LLC

About Etched

Etched is building the world's first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep and parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

Key responsibilities

Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
Build, enhance, and scale Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
Optimize routing and communication layers using Sohu's collectives.
Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.

You may be a good fit if you have

Proficiency in C++ or Rust.
Understanding of performance-sensitive or complex distributed software systems like Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
Familiarity with PyTorch or JAX.
Ported applications to non-standard accelerator hardware or hardware platforms.

Strong candidates may also have (nice-to-have qualifications)

Developed low-latency, high-performance applications using both kernel-level and user-space networking stacks.
Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE).
Built applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.

Benefits

Medical, dental, and vision packages with generous premium coverage
- $500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office

How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.

Vacancy posted 4 days ago
