LLM Inference & GPU Systems Consultant

United Software Group

Job Title: LLM Inference & GPU Systems Consultant

Location: Charlotte, NC

Interview: Video Interview

Description:

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. The environment is an enterprise private GenAI platform running on NVIDIA H200 GPU clusters with an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The role is focused exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

NVIDIA GPU Runtime Optimization: Maximize the runtime efficiency of the token generation pipeline, specifically prefill/decode optimization and KV cache management.

Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.

Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate GPU workloads using RunAI and Kubernetes.

Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.

Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications

8+ years’ experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.

8+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).

Proficiency in OpenShift AI and GPU orchestration tools like RunAI.

Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.

Proven track record managing the Hugging Face deployment lifecycle.

Must be onsite at the client in Charlotte, NC, at least 3 days/week.

Vacancy posted 1 day ago