LLM Inference Deployment Engineer

$180k - $240k

EnCharge AI

U.S.-Remote, Canada

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software integration and will make the immense potential of AI accessible in power-, energy-, and space-constrained applications. EnCharge AI launched in 2022 and is led by veteran technologists with backgrounds in semiconductor design and AI systems.

About the Role

EnCharge AI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models (LLMs) for high-performance inference on its energy-efficient AI accelerators. You will work at the intersection of AI frameworks, model optimization, and runtime execution to ensure efficient, low-latency AI inference.

Responsibilities

Deploy and optimize LLMs (GPT, LLaMA, Mistral, Falcon, etc.) post-training, sourced from libraries such as Hugging Face.
Utilize inference runtimes such as ONNX Runtime and vLLM for efficient execution.
Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.

Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
Experience in LLM inference deployment, model optimization, and runtime engineering.
Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
In-depth knowledge of the Python programming language for model integration and performance tuning.
Strong understanding of high-level model representations and experience implementing framework-level optimizations for Generative AI use cases.
Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
Strong knowledge of LLM memory optimization strategies for long-context applications.
Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).

EnCharge AI is an equal employment opportunity employer in the United States.

The salary range for this position is $180,000 to $240,000 USD ($175,000 to $245,000 CAD) per year. Actual compensation offered will be determined based on job-related knowledge, skills, and experience.

Vacancy posted 2 days ago

