LLM Inference & GPU Systems Consultant

United Software Group Inc

Job Title: LLM Inference & GPU Systems Consultant
Location: Charlotte, NC
Interview: Video Interview

Description

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs like Llama. We are focused exclusively on inference; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

  • NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, specifically prefill/decode optimization and KV cache management.
  • Inference Serving: Deploy and manage inference engines, including vLLM and TensorRT-LLM.
  • Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage workload orchestration using RunAI and Kubernetes GPU orchestration.
  • Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
  • Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications

  • 8+ years’ experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
  • 8+ years’ hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
  • Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
  • Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
  • Proven track record managing the Hugging Face deployment lifecycle.
  • Must be onsite at the client in Charlotte, NC at least 3 days/week.

Vacancy posted 4 days ago
