Inference Runtime Engineer for LLMs & Diffusion

Inferact

Inferact is seeking an inference runtime engineer to enhance the performance and capabilities of LLM and diffusion model serving. This role requires expertise in optimizing model execution on various hardware architectures and has significant implications for AI inference. The ideal candidate must possess a bachelor's degree in computer science or a related field, strong programming skills in Python, and experience with LLM inference systems. Remote work options are available for exceptional candidates.

Vacancy posted 6 days ago
Similar jobs that could be interesting for you, based on the Inference Runtime Engineer for LLMs & Diffusion vacancy in San Francisco, CA
  • jobr.pro is seeking a Staff Engineer to lead technical direction for Inference Runtime. This senior IC role encompasses broad ownership of the runtime’s architecture and validation systems while collaborating across teams to drive performance and scalability. The ideal... 
    Suggested
    Flexible hours

    jobr.pro

    San Francisco, CA
    6 days ago
  •  ...unicorn founders and senior engineers with deep expertise in 3D, generative...  ...for a Founding Engineer, ML Inference with deep expertise in high-...  ...-time model performance for diffusion models Design and implement...  ...in-house inference runtime Implement optimizations using... 
    Suggested
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    3 days ago
  • About the Team We’re hiring a Developer Productivity engineer to support the company’s Inference Runtime teams. These teams own the systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. We’... 
    Suggested

    United States Digital Space LLC

    San Francisco, CA
    2 days ago
  •  ...with hands-on support from AMD engineers the team is scaling rapidly...  ...skilled Distributed Training and Inference Engineer to build, optimize,...  ...from low-level CUDA/ROCm runtimes to high-level frameworks like...  ...deployment of next-generation LLMs and generative AI models. What... 
    Suggested
    Flexible hours

    Sciforium

    San Francisco, CA
    6 days ago
  •  ...to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference...  ...Frameworks and Optimization Engineer to design, develop, and optimize distributed...  ...architectures and LLM/VLM/Diffusion model optimization. Knowledge of... 
    Suggested

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    4 days ago
  • The Consensus in San Francisco is looking for an Engineering Manager for the Runtime Fabrics team to lead efforts in refining container runtimes specifically for AI inference workloads. Your role will encompass team leadership, technical direction, and collaboration across... 

    The Consensus

    San Francisco, CA
    3 days ago
  • $405k

    About the role Anthropic's Inference organization serves Claude to millions of users and enterprise customers with...  ...every platform we add. We're looking for a Staff Engineer to be a technical lead for Inference Runtime: the team that owns the shared, accelerator‑... 
    Work at office
    Visa sponsorship
    Flexible hours

    jobr.pro

    San Francisco, CA
    6 days ago
  • United States Digital Space LLC is seeking a Developer Productivity Engineer to enhance the systems for serving models in their Inference Runtime teams. This role is crucial in ensuring reliability and efficiency across various applications. Your responsibilities will include... 

    United States Digital Space LLC

    San Francisco, CA
    6 days ago
  • Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter? This role is with a fast...  ...APIs or building dashboards. The work here sits deep in the runtime stack, optimising realtime speech systems under production latency... 
    Remote job
    Flexible hours

    Trades Workforce Solutions

    San Francisco, CA
    5 days ago
  • Sail Research in San Francisco is seeking a talented engineer to design and implement robust systems that ensure fast and cost-efficient AI inference at global scale. You will be responsible for building high-performance schedulers and optimizing global routing while focusing... 

    Sail Research

    San Francisco, CA
    2 days ago
  •  ...is seeking a Member of Technical Staff focused on ML Systems & Inference in San Francisco, California. This role includes building and...  ...production AI workloads. The ideal candidate has strong software engineering roots and experience in inference systems. You will influence... 

    Acceler8 Talent

    San Francisco, CA
    4 days ago
  • $225k

    Dormont Manufacturing Co is looking for a Software Engineer on the Inference & RL Systems team in San Francisco. The role involves designing distributed systems, optimizing performance, and ensuring high reliability for RL and post-training workflows. The ideal candidate... 

    Dormont Manufacturing Co

    San Francisco, CA
    6 days ago
  •  ...ABOUT THE ROLE You build and operate the inference systems that serve our models in...  ...The work spans serving infrastructure, runtime optimization, and the long tail of production...  ...with running real workloads. This is an engineering role, not a research role. You'll measure... 

    MakerMaker

    San Francisco, CA
    4 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... 

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    3 days ago
  • $225k

     ...Our approach combines frontier‑scale pre‑training, domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you will design and operate the distributed systems that... 
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    3 days ago
  • $200k - $240k

     ..., and disrupt crypto‑related fraud and financial crime. Our AI engineering team focuses on next‑generation AI applications, especially large...  .... Deploy infrastructure for offline and online evaluation of LLMs and agents, including regression testing, cost monitoring, and... 

    Dormont Manufacturing Co

    San Francisco, CA
    6 days ago
  • $220k - $320k

    ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering...  ..., high-throughput systems across LLMs, speech, and vision models running in... 
    3 days per week

    Trades Workforce Solutions

    San Francisco, CA
    4 days ago
  • A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development... 
    Flexible hours

    Baseten

    San Francisco, CA
    12 days ago
  •  ...California, who is eager to contribute to developer ecosystems and work with cutting-edge technologies like LLMs and AI tooling. This role is suited for early‑career engineers with a builder mindset. You will collaborate with the engineering team, maintain open-source projects... 

    Stealth Startup

    San Francisco, CA
    6 days ago
  • $140k - $265k

    An innovative tech company in San Francisco is seeking a Software Engineer to enhance their AI agents and runtime services. You will work on distributed systems and collaborate with cross-functional teams. Candidates should have over 3 years of engineering experience and... 

    Glean.info

    San Francisco, CA
    3 days ago
  • $90 - $125 per hour

    A cutting-edge AI company is looking for Low-Level Engineers to design RL environments that optimize kernel development and systems programming...  ...should have strong Python skills and a solid understanding of LLMs. This remote contractor role offers an hourly rate ranging from... 
    Remote job
    Hourly pay
    For contractors

    Open Data Science

    San Francisco, CA
    4 days ago
  •  ...looking for a Member of Technical Staff focused on ML systems and inference in San Francisco. You will design and build inference systems...  .... Candidates should have strong foundations in software engineering, experience with ML inference systems, and performance tuning... 

    Gimlet Labs, Inc.

    San Francisco, CA
    2 days ago
  •  ...is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving...  ...components. Ideal candidates should have strong software engineering skills and experience with ML inference systems, particularly... 

    Gimlet Labs

    San Francisco, CA
    5 days ago
  • MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production... 

    MakerMaker.AI

    San Francisco, CA
    4 days ago
  •  ...The Challenge We are building the runtime and registry for AI/LLM tools. You will...  ...Create integrations with major platforms and LLMs (Google Workspace, Microsoft 365, OpenAI...  ...desire to ship. ~7+ years of software engineering experience comprising: ~5+ years... 
    Work at office
    Shift work

    Arcade AI, Inc

    San Francisco, CA
    3 days ago
  • ABOUT BASETEN Baseten powers mission-critical inference for the world’s most dynamic AI companies, like Cursor...  ...Capital. Join us and help build the platform engineers turn to to ship AI products. THE ROLE Container runtimes were designed for general-purpose software... 
    Flexible hours

    The Consensus

    San Francisco, CA
    3 days ago
  •  ...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT... 

    Baseten

    San Francisco, CA
    2 days ago
  • An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate... 

    Jobleads-US

    San Francisco, CA
    2 days ago
  •  ...startup in California is looking for an experienced Distributed Systems Engineer to join their dynamic team. In this role, you'll build and evolve the core infrastructure for a durable application runtime aimed at advanced data processing and machine learning workflows.... 

    Tensorlake Inc.

    San Francisco, CA
    5 days ago
  • Role: Runtime Engineer Location: SF Bay Area / Toronto | Full-time | Hybrid Compensation: Competitive salary (based on experience & location) + Equity + Bonus About the Role This is your opportunity to join a mission-driven startup building the foundation of sustainable... 
    Full time

    Amadeus Search

    San Francisco, CA
    3 days ago