LLM Inference & GPU Systems Architect
NTT Data Americas, Inc.
Company Overview

NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com.

Role Overview

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs like Llama. We are focused exclusively on inferencing; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

- NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, specifically prefill/decode optimization and KV cache management.
- Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
- Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU orchestration.
- Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
- Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications

- 8 years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
- 8 years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
- Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
- Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
- Proven track record managing the Hugging Face deployment lifecycle.
- Must be onsite at client in Charlotte, NC at least 3 days/week.

NTT DATA endeavors to make accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at . This contact information is for accommodation requests only and cannot be used to inquire about the status of applications.

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you’d like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.


