LLM Inference & GPU Systems Consultant
United Software Group Inc
Job Title: LLM Inference & GPU Systems Consultant
Location: Charlotte, NC
Interview: Video Interview

Description

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters with an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs such as Llama. The role focuses exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

- NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, specifically prefill/decode optimization and KV cache management.
- Inference Serving: Deploy and manage inference engines, including vLLM and TensorRT-LLM.
- Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage workload orchestration using RunAI and Kubernetes GPU orchestration.
- Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
- Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications

- 8+ years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
- 8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
- Proficiency in OpenShift AI and GPU orchestration tools such as RunAI.
- Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
- Proven track record managing the Hugging Face deployment lifecycle.
- Must be onsite at the client in Charlotte, NC at least 3 days per week.

