
LLM Inference & GPU Systems Consultant

United Software Group Inc

Job Title: LLM Inference & GPU Systems Consultant

Location: Charlotte, NC

Interview: Video Interview

Description:

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. The environment is an enterprise private GenAI platform running on NVIDIA H200 GPU clusters with an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The role is focused exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, with specific ownership of prefill/decode optimization and KV cache management.

Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.

Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate GPU workloads using RunAI and Kubernetes.

Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.

Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.
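
As an illustration of the KV cache sizing that the runtime optimization work above involves, here is a minimal sketch in Python. It is not taken from the client's environment: the `ModelShape` class, the helper names, and the layer/head values (chosen to resemble an 8B Llama-style model with grouped-query attention and 16-bit cache entries) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of the attention layout that drives KV cache size."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_element: int = 2  # assumes fp16/bf16 cache entries


def kv_cache_bytes_per_token(shape: ModelShape) -> int:
    # Each layer stores one key vector and one value vector per KV head,
    # hence the leading factor of 2.
    return (
        2
        * shape.num_layers
        * shape.num_kv_heads
        * shape.head_dim
        * shape.bytes_per_element
    )


def max_cached_tokens(shape: ModelShape, free_gpu_bytes: int) -> int:
    # Upper bound on tokens (summed over all concurrent sequences) that fit
    # in the memory left over after weights and activations.
    return free_gpu_bytes // kv_cache_bytes_per_token(shape)


# Illustrative values only (assumption): 32 layers, 8 KV heads, head_dim 128.
LLAMA_8B_LIKE = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    per_token = kv_cache_bytes_per_token(LLAMA_8B_LIKE)
    budget = 64 * 1024**3  # assume 64 GiB of GPU memory reserved for the cache
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"Tokens that fit in budget: {max_cached_tokens(LLAMA_8B_LIKE, budget)}")
```

Estimates like this set the ceiling for batch size and context length on a given GPU, which is why KV cache management and batching strategy are usually tuned together.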

Required Qualifications

8+ years of experience as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.

8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).

Proficiency in OpenShift AI and GPU orchestration tools like RunAI.

Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.

Proven track record managing the Hugging Face deployment lifecycle.

Must be onsite at the client in Charlotte, NC at least 3 days per week.

Vacancy posted 5 hours ago