Inference Runtime Engineer for LLMs & Diffusion

Inferact

Inferact is seeking an inference runtime engineer to enhance the performance and capabilities of LLM and diffusion model serving. This role requires expertise in optimizing model execution on various hardware architectures and has significant implications for AI inference. The ideal candidate must possess a bachelor's degree in computer science or related fields, strong programming skills in Python, and experience with LLM inference systems. Remote work options are available for exceptional candidates.

Vacancy posted 6 days ago

Similar jobs that could be interesting for you
Based on the Inference Runtime Engineer for LLMs & Diffusion in San Francisco, CA vacancy

Senior Staff Engineer Inference Runtime — Flexible Hours
jobr.pro is seeking a Staff Engineer to lead technical direction for Inference Runtime. This senior IC role encompasses broad ownership of the runtime’s architecture and validation systems while collaborating across teams to drive performance and scalability. The ideal...
Suggested
Flexible hours
jobr.pro
San Francisco, CA
6 days ago
Founding Engineer, ML Inference
...unicorn founders and senior engineers with deep expertise in 3D, generative... ...for a Founding Engineer, ML Inference with deep expertise in high-... ...-time model performance for diffusion models Design and implement... ...in-house inference runtime Implement optimizations using...
Suggested
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
3 days ago
Software Engineer, Productivity - Inference Runtime
About the Team We’re hiring a Developer Productivity engineer to support the company’s Inference Runtime teams. These teams own the systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. We’...
Suggested
United States Digital Space LLC
San Francisco, CA
2 days ago
Distributed Training and Inference Engineer
...with hands-on support from AMD engineers the team is scaling rapidly... ...skilled Distributed Training and Inference Engineer to build, optimize,... ...from low-level CUDA/ROCm runtimes to high-level frameworks like... ...deployment of next-generation LLMs and generative AI models. What...
Suggested
Flexible hours
Sciforium
San Francisco, CA
6 days ago
LLM Inference Frameworks and Optimization Engineer
...to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference... ...Frameworks and Optimization Engineer to design, develop, and optimize distributed... ...architectures and LLM/VLM/Diffusion model optimization. Knowledge of...
Suggested
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
4 days ago
Engineering Manager, AI Inference Runtime Fabric
The Consensus in San Francisco is looking for an Engineering Manager for the Runtime Fabrics team to lead efforts in refining container runtimes specifically for AI inference workloads. Your role will encompass team leadership, technical direction, and collaboration across...
The Consensus
San Francisco, CA
3 days ago
Staff+ Software Engineer, Inference Runtime
$405k
About the role Anthropic's Inference organization serves Claude to millions of users and enterprise customers with... ...every platform we add. We're looking for a Staff Engineer to be a technical lead for Inference Runtime: the team that owns the shared, accelerator‑...
Work at office
Visa sponsorship
Flexible hours
jobr.pro
San Francisco, CA
6 days ago
Unlock Faster Inference: Developer Productivity Engineer
United States Digital Space LLC is seeking a Developer Productivity Engineer to enhance the systems for serving models in their Inference Runtime teams. This role is crucial in ensuring reliability and efficiency across various applications. Your responsibilities will include...
United States Digital Space LLC
San Francisco, CA
6 days ago
Remote Realtime Speech Inference Engineer
Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter? This role is with a fast... ...APIs or building dashboards. The work here sits deep in the runtime stack, optimising realtime speech systems under production latency...
Remote job
Flexible hours
Trades Workforce Solutions
San Francisco, CA
5 days ago
Staff Engineer, AI Inference & Distributed Systems
Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while focusing...
Sail Research
San Francisco, CA
2 days ago
Staff Engineer, ML Inference Systems
...is seeking a Member of Technical Staff focused on ML Systems & Inference in San Francisco, California. This role includes building and... ...production AI workloads. The ideal candidate has strong software engineering roots and experience in inference systems. You will influence...
Acceler8 Talent
San Francisco, CA
4 days ago
Staff Engineer, Inference & RL Systems — Scale Production ML
$225k
Dormont Manufacturing Co is looking for a Software Engineer on the Inference & RL Systems team in San Francisco. The role involves designing distributed systems, optimizing performance, and ensuring high reliability for RL and post-training workflows. The ideal candidate...
Dormont Manufacturing Co
San Francisco, CA
6 days ago
INFERENCE ENGINEER
...ABOUT THE ROLE You build and operate the inference systems that serve our models in... ...The work spans serving infrastructure, runtime optimization, and the long tail of production... ...with running real workloads. This is an engineering role, not a research role. You'll measure...
MakerMaker
San Francisco, CA
4 days ago
System Engineering In
Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
Senior Inference & RL Systems Engineer
$225k
...Our approach combines frontier‑scale pre‑training, domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you will design and operate the distributed systems that...
Relocation
Visa sponsorship
Magic
San Francisco, CA
3 days ago
Senior or Staff ML Systems Engineer, LLMs - San Francisco Only
$200k - $240k
..., and disrupt crypto‑related fraud and financial crime. Our AI engineering team focuses on next‑generation AI applications, especially large... .... Deploy infrastructure for offline and online evaluation of LLMs and agents, including regression testing, cost monitoring, and...
Dormont Manufacturing Co
San Francisco, CA
6 days ago
Engineer, Inference & Model serving
$220k - $320k
ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering... ..., high-throughput systems across LLMs, speech, and vision models running in...
3 days per week
Trades Workforce Solutions
San Francisco, CA
4 days ago
Production AI Inference Engineer — Scale & Impact
A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development...
Flexible hours
Baseten
San Francisco, CA
12 days ago
Open Source AI Tools Engineer: Build & Ship LLMs
...California, who is eager to contribute to developer ecosystems and work with cutting-edge technologies like LLMs and AI tooling. This role is suited for early‑career engineers with a builder mindset. You will collaborate with the engineering team, maintain open-source projects...
Stealth Startup
San Francisco, CA
6 days ago
AI Agents Runtime Engineer Low-Latency Distributed Systems
$140k - $265k
An innovative tech company in San Francisco is seeking a Software Engineer to enhance their AI agents and runtime services. You will work on distributed systems and collaborate with cross-functional teams. Candidates should have over 3 years of engineering experience and...
Glean.info
San Francisco, CA
3 days ago
Remote Low-Level Engineer: Kernel & Inference Optimization
$90 - $125 per hour
A cutting-edge AI company is looking for Low-Level Engineers to design RL environments that optimize kernel development and systems programming... ...should have strong Python skills and a solid understanding of LLMs. This remote contractor role offers an hourly rate ranging from...
Remote job
Hourly pay
For contractors
Open Data Science
San Francisco, CA
4 days ago
ML Inference Systems Engineer
...looking for a Member of Technical Staff focused on ML systems and inference in San Francisco. You will design and build inference systems... .... Candidates should have strong foundations in software engineering, experience with ML inference systems, and performance tuning...
Gimlet Labs, Inc.
San Francisco, CA
2 days ago
Senior ML Inference Systems Engineer
...is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving... ...components. Ideal candidates should have strong software engineering skills and experience with ML inference systems, particularly...
Gimlet Labs
San Francisco, CA
5 days ago
Senior ML Inference Engineer Production Systems
MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production...
MakerMaker.AI
San Francisco, CA
4 days ago
Management system engineering
...The Challenge We are building the runtime and registry for AI/LLM tools. You will... ...Create integrations with major platforms and LLMs (Google Workspace, Microsoft 365, OpenAi... ...desire to ship. ~7+ years of software engineering experience comprising of: ~5+ years...
Work at office
Shift work
Arcade AI, Inc
San Francisco, CA
3 days ago
Engineering Manager, Runtime Fabric
ABOUT BASETEN Baseten powers mission-critical inference for the world’s most dynamic AI companies, like Cursor... ...Capital. Join us and help build the platform engineers turn to to ship AI products. THE ROLE Container runtimes were designed for general-purpose software...
Flexible hours
The Consensus
San Francisco, CA
3 days ago
LLM Inference & Model-Performance Engineer
...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT...
Baseten
San Francisco, CA
2 days ago
Multimodal Inference Engineer — Scale GPU AI Models
An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate...
Jobleads-US
San Francisco, CA
2 days ago
Rust Systems Engineer: Build AI Data Runtime & Scheduler
...startup in California is looking for an experienced Distributed Systems Engineer to join their dynamic team. In this role, you'll build and evolve the core infrastructure for a durable application runtime aimed at advanced data processing and machine learning workflows....
Tensorlake Inc.
San Francisco, CA
5 days ago
Runtime Engineer
Role: Runtime Engineer Location: SF Bay Area / Toronto | Full-time | Hybrid Compensation: Competitive salary (based on experience & location) + Equity + Bonus About the Role This is your opportunity to join a mission-driven startup building the foundation of sustainable...
Full time
Amadeus Search
San Francisco, CA
3 days ago
