LLM Inference Deployment Engineer

$180k - $240k

EnCharge AI

U.S.-Remote, Canada

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software integration and will make the immense potential of AI accessible in power-, energy-, and space-constrained applications. EnCharge AI launched in 2022 and is led by veteran technologists with backgrounds in semiconductor design and AI systems.

About the Role

EnCharge AI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models (LLMs) for high-performance inference on its energy-efficient AI accelerators. You will work at the intersection of AI frameworks, model optimization, and runtime execution to ensure efficient, low-latency AI inference.

Responsibilities
  • Deploy and optimize LLMs (GPT, LLaMA, Mistral, Falcon, etc.) after training, sourced from libraries such as Hugging Face.
  • Use inference runtimes such as ONNX Runtime and vLLM for efficient execution.
  • Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
  • Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.

Qualifications
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field.
  • Experience in LLM inference deployment, model optimization, and runtime engineering.
  • Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
  • In-depth knowledge of the Python programming language for model integration and performance tuning.
  • Strong understanding of high-level model representations and experience implementing framework-level optimizations for generative AI use cases.
  • Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
  • Strong knowledge of LLM memory optimization strategies for long-context applications.
  • Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).

EnCharge AI is an equal employment opportunity employer in the United States.

The salary range for this position is $180,000 to $240,000 USD ($175,000 to $245,000 CAD) per year. Actual compensation offered will be determined based on job-related knowledge, skills, and experience.

Vacancy posted 2 days ago