Inference Software Engineer

$2,000 per month

ETCHED LLC

About Etched

Etched is building the world's first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

Key responsibilities

Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
Build, enhance, and scale Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
Optimize routing and communication layers using Sohu's collectives.
Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.

You may be a good fit if you have

Proficiency in C++ or Rust.
Understanding of performance-sensitive or complex distributed software systems like Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
Familiarity with PyTorch or JAX.
Ported applications to non-standard accelerator hardware or hardware platforms.

Strong candidates may also have (nice-to-have qualifications)

Developed low-latency, high-performance applications using both kernel-level and user-space networking stacks.
Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE).
Built applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.

Benefits

Medical, dental, and vision packages with generous premium coverage
- $500 per month credit for waiving medical benefits
Housing subsidy of $2k per month for those living within walking distance of the office
Relocation support for those moving to San Jose (Santana Row)
Various wellness benefits covering fitness, mental health, and more
Daily lunch + dinner in our office
Unlimited compute budget subject to ROI justification

How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.

Vacancy posted 5 days ago

