Inference Runtime Engineer for LLMs & Diffusion
Inferact
Inferact is seeking an inference runtime engineer to enhance the performance and capabilities of LLM and diffusion model serving. This role requires expertise in optimizing model execution on various hardware architectures and has significant implications for AI inference. The ideal candidate must possess a bachelor's degree in computer science or related fields, strong programming skills in Python, and experience with LLM inference systems. Remote work options are available for exceptional candidates.
- jobr.pro is seeking a Staff Engineer to lead technical direction for Inference Runtime. This senior IC role encompasses broad ownership of the runtime’s architecture and validation systems while collaborating across teams to drive performance and scalability. The ideal... (Flexible hours)
- ...unicorn founders and senior engineers with deep expertise in 3D, generative... ...for a Founding Engineer, ML Inference with deep expertise in high-... ...-time model performance for diffusion models Design and implement... ...in-house inference runtime Implement optimizations using... (Relocation, Visa sponsorship, Relocation package)
- About the Team We’re hiring a Developer Productivity engineer to support the company’s Inference Runtime teams. These teams own the systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. We’...
- ...with hands-on support from AMD engineers the team is scaling rapidly... ...skilled Distributed Training and Inference Engineer to build, optimize,... ...from low-level CUDA/ROCm runtimes to high-level frameworks like... ...deployment of next-generation LLMs and generative AI models. What... (Flexible hours)
- ...to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference... ...Frameworks and Optimization Engineer to design, develop, and optimize distributed... ...architectures and LLM/VLM/Diffusion model optimization. Knowledge of...
- The Consensus in San Francisco is looking for an Engineering Manager for the Runtime Fabrics team to lead efforts in refining container runtimes specifically for AI inference workloads. Your role will encompass team leadership, technical direction, and collaboration across...
$405k
About the role Anthropic's Inference organization serves Claude to millions of users and enterprise customers with... ...every platform we add. We're looking for a Staff Engineer to be a technical lead for Inference Runtime: the team that owns the shared, accelerator‑... (Work at office, Visa sponsorship, Flexible hours)
- United States Digital Space LLC is seeking a Developer Productivity Engineer to enhance the systems for serving models in their Inference Runtime teams. This role is crucial in ensuring reliability and efficiency across various applications. Your responsibilities will include...
- Machine Learning Engineer, Inference: Want to solve realtime inference problems where milliseconds genuinely matter? This role is with a fast... ...APIs or building dashboards. The work here sits deep in the runtime stack, optimising realtime speech systems under production latency... (Remote job, Flexible hours)
- Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while focusing...
- ...is seeking a Member of Technical Staff focused on ML Systems & Inference in San Francisco, California. This role includes building and... ...production AI workloads. The ideal candidate has strong software engineering roots and experience in inference systems. You will influence...
$225k
Dormont Manufacturing Co is looking for a Software Engineer on the Inference & RL Systems team in San Francisco. The role involves designing distributed systems, optimizing performance, and ensuring high reliability for RL and post-training workflows. The ideal candidate...
- ...ABOUT THE ROLE You build and operate the inference systems that serve our models in... ...The work spans serving infrastructure, runtime optimization, and the long tail of production... ...with running real workloads. This is an engineering role, not a research role. You'll measure...
- Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
$225k
...Our approach combines frontier‑scale pre‑training, domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you will design and operate the distributed systems that... (Relocation, Visa sponsorship)
$200k - $240k
..., and disrupt crypto‑related fraud and financial crime. Our AI engineering team focuses on next‑generation AI applications, especially large... .... Deploy infrastructure for offline and online evaluation of LLMs and agents, including regression testing, cost monitoring, and...
$220k - $320k
ML Model Serving Engineer: Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering... ..., high-throughput systems across LLMs, speech, and vision models running in... (3 days per week)
- A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development... (Flexible hours)
- ...California, who is eager to contribute to developer ecosystems and work with cutting-edge technologies like LLMs and AI tooling. This role is suited for early‑career engineers with a builder mindset. You will collaborate with the engineering team, maintain open-source projects...
$140k - $265k
An innovative tech company in San Francisco is seeking a Software Engineer to enhance their AI agents and runtime services. You will work on distributed systems and collaborate with cross-functional teams. Candidates should have over 3 years of engineering experience and...
$90 - $125 per hour
A cutting-edge AI company is looking for Low-Level Engineers to design RL environments that optimize kernel development and systems programming... ...should have strong Python skills and a solid understanding of LLMs. This remote contractor role offers an hourly rate ranging from... (Remote job, Hourly pay, For contractors)
- ...looking for a Member of Technical Staff focused on ML systems and inference in San Francisco. You will design and build inference systems... .... Candidates should have strong foundations in software engineering, experience with ML inference systems, and performance tuning...
- ...is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving... ...components. Ideal candidates should have strong software engineering skills and experience with ML inference systems, particularly...
- MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production...
- ...The Challenge We are building the runtime and registry for AI/LLM tools. You will... ...Create integrations with major platforms and LLMs (Google Workspace, Microsoft 365, OpenAI... ...desire to ship. ~7+ years of software engineering experience, comprising: ~5+ years... (Work at office, Shift work)
- ABOUT BASETEN Baseten powers mission-critical inference for the world’s most dynamic AI companies, like Cursor... ...Capital. Join us and help build the platform engineers turn to to ship AI products. THE ROLE Container runtimes were designed for general-purpose software... (Flexible hours)
- ...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT...
- An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate...
- ...startup in California is looking for an experienced Distributed Systems Engineer to join their dynamic team. In this role, you'll build and evolve the core infrastructure for a durable application runtime aimed at advanced data processing and machine learning workflows....
- Role: Runtime Engineer Location: SF Bay Area / Toronto | Full-time | Hybrid Compensation: Competitive salary (based on experience & location) + Equity + Bonus About the Role This is your opportunity to join a mission-driven startup building the foundation of sustainable... (Full time)

