LLM Inference & GPU Systems Consultant
United Software Group Inc
Job Title: LLM Inference & GPU Systems Consultant
Location: Charlotte, NC
Interview: Video Interview
Description:
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. The environment is a private enterprise GenAI platform running on NVIDIA H200 GPU clusters, with OpenShift AI as the deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The role is focused exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.
Key Responsibilities
NVIDIA GPU Runtime Optimization: Maximize the runtime efficiency of the token generation pipeline, specifically through prefill/decode optimization and KV cache management.
Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage GPU workload orchestration using RunAI and Kubernetes.
Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.
Required Qualifications
8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.
Must be onsite at the client in Charlotte, NC at least 3 days per week.


