Senior Principal AI Architect/Engineer

$123.5k - $206.75k

Pepsi Bottling Group

Overview

The AI Observability Architect is a senior technical leader responsible for designing, deploying, and operating an enterprise-grade, production-ready AI observability platform. The platform spans the full spectrum of modern agentic AI, from large language model (LLM) workflows and multi-agent orchestration to physical AI systems, reinforcement learning harnesses, multi-modal pipelines, and agentic marketplaces. This role serves as the strategic and engineering authority for end-to-end telemetry, tracing, safety, and quality signals across heterogeneous agent frameworks and platforms.

The architect leads the convergence of AI observability with:
  • safety & security (including red teaming)
  • Responsible AI (RAI)
  • data science and physical AI
  • memory/skills engineering and agent fleet management
  • self-evolving harnesses and reinforcement learning
  • agent-to-agent protocols (A2A, UCP, AP2)
  • continuous quality engineering

This breadth makes it a uniquely broad and high-impact role within the AI Solutions & Platforms organization. The role also owns OpenTelemetry (OTEL) integration across third-party agentic platforms (Salesforce Agentforce, ServiceNow, Microsoft Agent 365, and others), enabling unified observability and governance at enterprise scale.

Responsibilities

Agentic AI Observability Architecture at Scale (30%)
  • Define and own the enterprise observability architecture for AI agents, LLMs, multi-agent workflows, and physical AI systems, covering planner/executor loops, tool/function calls, RAG retrieval chains, and memory/state transitions.
  • Build and operate unified telemetry pipelines at enterprise scale, incorporating metrics, logs, distributed traces, semantic/vector signals, and real-time event streaming (Kafka).
  • Instrument OTEL across heterogeneous platforms, including Salesforce Agentforce, ServiceNow, Microsoft Agent 365, and internal frameworks, delivering protocol-level observability for agent ecosystems built on MCP, A2A, UCP, and AP2.
  • Design and implement observability for agent fleets, multi-modal pipelines, physical AI systems, and self-evolving reinforcement learning harnesses, including signal capture for reward shaping and policy evaluation.
  • Deliver dashboards, alerting, SLO/SLA management, incident runbook automation, and RCA tooling that drive measurable reliability improvements and reduce MTTR across agentic services.
  • Establish cost telemetry and FinOps observability for AI workloads, covering token consumption, inference cost allocation, and GPU/compute efficiency across cloud environments (Azure, AWS, GCP).

Safety, Security & Red Teaming (15%)
  • Lead observability-driven red team exercises against agentic AI systems, instrumenting attack surfaces, adversarial prompt injection vectors, model evasion attempts, and multi-agent trust boundary failures.
  • Design telemetry pipelines that capture safety-critical signals: guardrail trigger rates, policy violation events, PII exposure risks, prompt leakage, and agent hallucination rates.
  • Partner with Security and RAI teams to embed threat modeling, zero-trust agent authentication, and behavioral anomaly detection into the observability platform.
  • Instrument secure policy enforcement layers across agent-to-agent communication protocols (A2A, UCP, AP2) and maintain audit-ready traceability for all AI decision events.
  • Develop and maintain a Security Observability Playbook covering incident classification, escalation paths, and forensic trace retention policies for agentic AI systems.

Responsible AI (RAI) & Governance (10%)
  • Integrate RAI signal capture (fairness, bias detection, explainability, and safety metrics) directly into observability pipelines, making compliance measurable and audit-ready.
  • Deliver governance dashboards that surface RAI compliance posture across all active AI agents and LLM deployments, aligned with global regulatory standards.
  • Support risk assessments, gap analyses, and governance frameworks with real-time observability insights, enabling proactive risk mitigation rather than reactive audit responses.
  • Collaborate with the RAI CoE and Legal/Compliance teams to define data retention, consent logging, and model decision traceability standards embedded in the telemetry architecture.

Quality Engineering for Agentic Solutions: Post Go-Live & Continuous QE (10%)
  • Own the Continuous Quality Engineering (CQE) framework for agentic solutions after go-live, defining and tracking quality metrics across accuracy, latency, agent success rate, tool-call fidelity, and user outcome measures.
  • Build automated quality gates within CI/CD pipelines that use observability data to detect regressions, drift, and degradation in agent performance, preventing silent failures in production.
  • Instrument and monitor skill evaluations (evals) across the memory, skills, and MCP harness stack, providing traceability from eval results to production behavior.
  • Partner with product and business stakeholders to define SLA-backed quality benchmarks, and deliver automated alerting when quality thresholds are breached.
  • Drive root-cause analysis for quality failures using distributed trace data, enabling rapid iteration and continuous improvement cycles for agentic solutions.

Memory, Skills, MCP & Harness Engineering Observability (10%)
  • Design and implement observability for the agent memory layer (episodic, semantic, and working memory read/write operations), providing latency, accuracy, and drift monitoring across memory backends.
  • Instrument MCP (Model Context Protocol) server interactions, tool registrations, skill invocations, and context injection pipelines with full trace propagation and semantic tagging.
  • Own observability for self-evolving harnesses and reinforcement learning (RL) systems, capturing reward signals, policy update events, environment state transitions, and learning convergence metrics.
  • Monitor harness execution fidelity, skill eval pass/fail rates, and regression signals across training, fine-tuning, and inference workflows, feeding the data back into the quality engineering loop.

Data Science Observability & Hardcore Python Engineering (5%)
  • Lead a team of senior Python engineers building high-performance, production-grade observability tooling, including custom OTEL exporters, semantic trace enrichers, signal aggregators, and anomaly detection pipelines.
  • Apply data science methods (statistical process control, time-series anomaly detection, clustering, and causal inference) to turn raw telemetry into actionable operational intelligence about AI systems.
  • Build and maintain Python-native SDKs and libraries that simplify observability onboarding for agent developers across the organization.
  • Establish code quality standards, testing frameworks, and peer review practices for the observability engineering team, embedding software craftsmanship into the team culture.

Agentic Marketplace, Registry & Ecosystem Observability (5%)
  • Instrument the Agentic Marketplace and Agent Registry platforms, providing usage telemetry, adoption metrics, capability health scores, and dependency mapping for registered agents and skills.
  • Design observability APIs and SDK hooks that let marketplace-registered agents self-report health, performance, and behavioral signals to the central observability platform.
  • Monitor inter-agent communication patterns across the marketplace ecosystem, identifying latency hotspots, circular dependencies, and protocol mismatches in agent-to-agent (A2A) workflows.
  • Deliver a Marketplace Observability Dashboard that surfaces agent catalog health, adoption trends, quality scores, and incident history to support marketplace governance and curation decisions.

Integration, Deployment & CI/CD Automation (5%)
  • Build and maintain CI/CD pipelines for observability services and agent operations center components, incorporating automated testing, deployment gates, and rollback mechanisms.
  • Automate onboarding for new agent use cases with templates, scaffolding, and configuration validation, reducing time-to-observability from weeks to hours.
  • Drive infrastructure-as-code (IaC) practices for observability platform components across Azure, AWS, and GCP, ensuring reproducible, version-controlled, and auditable deployments.

Product Delivery & Stakeholder Collaboration (10%)
  • Operate with a product mindset, defining observability platform roadmaps, OKRs, adoption playbooks, and release milestones in partnership with AI platform and business teams.
  • Collaborate with transformation teams, enterprise architects, security, and business stakeholders to tailor observability solutions to domain-specific requirements.
  • Serve as the technical authority in executive and governance forums, translating complex observability data into business-relevant insights on risk, cost, and AI performance.
  • Partner with SRE, AI platform, and product teams to drive adoption of observability standards and reduce integration friction across the agentic AI ecosystem.

People Leadership & Team Development (5%)
  • Build, mentor, and lead a high-performing observability engineering team of Python developers, data scientists, and platform engineers, with talent initially based in India.
  • Define career paths, skills development plans, and leveling criteria aligned with PepsiCo job architecture, fostering an inclusive, high-accountability team culture.
  • Drive hiring, coaching, performance management, and succession planning across the observability function.

Decision-Making Autonomy: High. Owns architecture decisions, the platform roadmap, and engineering standards. Seeks strategic alignment from the AI Solutions Director on enterprise-level commitments.

Supervision Required: Low to Moderate. Operates independently with periodic alignment reviews and proactively escalates cross-organizational dependencies and risk trade-offs.

Role Complexity: Very High. Simultaneously spans observability, safety/security, RL harnesses, physical AI, multi-modal systems, agent protocols, quality engineering, and marketplace governance.

Compensation and Benefits

The expected compensation range for this position is $123,500 to $206,750. Location, confirmed job-related skills, experience, and education will be considered in setting the actual starting salary. Your recruiter can share more about the specific salary range during the hiring process.
  • Bonus based on performance and eligibility; the target payout is 15% of annual salary, paid out annually.
  • Paid time off, subject to eligibility, including paid parental leave, vacation, sick, and bereavement leave.
  • In addition to salary, PepsiCo offers a comprehensive benefits package to support our employees and their families, subject to elections and eligibility: Medical, Dental, Vision, Disability, Health, and Dependent Care Reimbursement Accounts; Employee Assistance Program (EAP); Insurance (Accident, Group Legal, Life); and a Defined Contribution Retirement Plan.

Qualifications

Minimum Education & Experience
  • Bachelor's or Master's degree in Computer Science, AI/ML, Data Science, Software Engineering, or a related field (PhD a plus for research-heavy domains).
  • 12+ years in technology with deep experience in enterprise observability, distributed systems, platform engineering, or AI/ML infrastructure.
  • 5+ years in a senior/principal or architect-level role with demonstrated ownership of complex, cross-functional technical programs.

Core Technical Qualifications
  • AI Observability & Distributed Systems: Expert-level knowledge of observability primitives (metrics, logs, traces, events) applied to LLM/ML/agentic systems; hands-on OpenTelemetry (OTEL) instrumentation, including custom exporters, semantic conventions, and trace propagation across agent/tool boundaries.
  • Agentic AI Frameworks: Direct experience with agentic AI platforms, multi-agent orchestration, LLM-based workflow design, and agent lifecycle management at production scale.
  • Safety, Security & Red Teaming: Demonstrated experience conducting red team exercises against AI systems; knowledge of adversarial attack patterns, prompt injection, model evasion, and multi-agent trust boundary failures; ability to design safety telemetry pipelines.
  • Memory, Skills & MCP: Working knowledge of agent memory architectures (episodic, semantic, working memory), Model Context Protocol (MCP), skill registries, and context injection patterns, with the ability to design observability for these layers.
  • Agent-to-Agent Protocols: Familiarity with A2A (Agent-to-Agent), UCP (Universal Communication Protocol), and AP2 patterns; ability to implement protocol-level observability and policy enforcement.
  • Reinforcement Learning & Self-Evolving Harnesses: Understanding of RL training loops, reward signal capture, policy evaluation, and harness instrumentation for continuously improving agent systems.
  • Physical AI & Multi-Modal Systems: Experience with, or strong familiarity with, observability for physical AI pipelines (robotics, edge inference, sensor fusion) and multi-modal models (vision, audio, text).
  • Data Science & Python Engineering: Proficiency in Python at a senior engineering level; experience with statistical anomaly detection, time-series analysis, and data pipeline design applied to observability data at scale.
  • Platform Integrations (OTEL / Enterprise): Hands-on experience integrating OTEL with enterprise agentic platforms such as Salesforce Agentforce, ServiceNow, Microsoft Agent 365, or similar; strong understanding of enterprise integration patterns and API design.
  • Cloud & Infrastructure: Cloud fluency across Azure, AWS, and GCP; proficiency in Kubernetes, service mesh, IaC (Terraform/Bicep), and CI/CD tooling; experience with event streaming platforms (Kafka, Event Hubs).
  • Quality Engineering for AI: Experience designing continuous quality engineering (CQE) frameworks for agentic solutions, including eval harnesses, regression detection, quality gates, and SLA-backed quality benchmarking.
  • Responsible AI (RAI): Familiarity with RAI principles (fairness, bias detection, explainability, and safety) and the ability to operationalize RAI signal capture within production observability pipelines.
  • Agentic Marketplace & Registry: Experience with, or strong familiarity with, agent marketplace architectures, capability registries, and platform governance, ideally including observability or monitoring responsibilities for marketplace-registered components.

Preferred / Differentiating Technical Skills
  • Published contributions to, or hands-on experience with, emerging agent frameworks (LangGraph, AutoGen, CrewAI, Semantic Kernel, Bedrock Agents, or equivalent).
  • Experience with Grafana, Datadog, New Relic, Dynatrace, or equivalent enterprise observability platforms, ideally extended to support AI/LLM workloads.
  • Familiarity with vector databases (Pinecone, Weaviate, pgvector) and semantic search observability patterns relevant to RAG pipelines.
  • Background in MLOps, LLMOps, or model lifecycle management, including model versioning, drift detection, and deployment governance.
  • Experience designing observability APIs and SDK hooks for developer self-service onboarding.

Differentiating Competencies Required
  • Translates enterprise AI strategy into observability architecture that simultaneously enables governance, safety, quality, and scale, holding the full picture across deeply technical and business dimensions.
  • Safety-First Engineering Mindset: Instinctively designs systems with security, adversarial resilience, and RAI compliance as first-class requirements, not retrofitted features. Leads red team exercises with intellectual rigor and operational discipline.
  • Outcome & Quality Orientation: Drives measurable impact (reduced MTTR, audit readiness, SLA adherence, agent quality scores, and RL harness convergence), translating telemetry data into business-relevant results.
  • Cross-Functional Influencing: Navigates complex organizational dynamics, aligning engineering, governance, security, data science, and business units around shared observability standards and practices.
  • Governance by Design: Integrates RAI, compliance, and security controls into design decisions from inception, producing systems that are audit-ready by default, not by remediation.
  • Technical Leadership Presence: Commands credibility in both executive and deep-technical forums; able to shift fluidly between C-suite communication and whiteboard architecture sessions with engineers.
  • Adaptability & Continuous Learning: Thrives in a rapidly evolving AI landscape; quickly absorbs and operationalizes new frameworks, protocols, and research, from emerging agent communication standards to novel RL paradigms.
  • Python Engineering Excellence: Holds a high bar for Python code quality, software craftsmanship, testing discipline, and developer experience, modeling best practices for the engineering team.

Our Company will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Credit Reporting Act and all other applicable laws, including but not limited to San Francisco Police Code Sections 4901-4919, commonly referred to as the San Francisco Fair Chance Ordinance; and Chapter XVII, Article 9 of the Los Angeles Municipal Code, commonly referred to as the Fair Chance Initiative for Hiring Ordinance.

All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status.

PepsiCo is an Equal Opportunity Employer: Female / Minority / Disability / Protected Veteran / Sexual Orientation / Gender Identity / Age

If you'd like more information about your EEO rights as an applicant under the law, please download the available EEO is the Law & EEO is the Law Supplement documents. View PepsiCo EEO Policy. Please view our Pay Transparency Statement.

Vacancy posted 13 hours ago
Similar jobs that could be interesting for you, based on the Senior Principal AI Architect/Engineer in Plano, TX vacancy
  • $123.5k - $206.75k

     ...Overview The AI Observability Architect is a senior technical leader responsible for designing, deploying...  ...role serves as the strategic and engineering authority for end-to-end telemetry...  .... ~5+ years in a senior/principal or architect-level role with demonstrated... 
    Principal
    Senior
    Shift work

    PepsiCo

    Plano, TX
    3 days ago
  • Highbrow LLC is seeking a Hands-On Architect/Principal Software Engineer to lead AI/ML software solutions in Frisco, TX. The ideal candidate should have over 10 years of experience in software development and be proficient in Python, AWS, and GCP. The role emphasizes coding... 
    Principal

    Highbrow LLC

    Frisco, TX
    1 day ago
  •  ...Overview We are seeking a Senior AI Engineer to define and drive the end-to-end engineering of an enterprise-grade agentic orchestration capability that enables smart AI agents to autonomously execute workflows, collaborate with humans, and operate securely with... 
    Senior

    PepsiCo

    Plano, TX
    2 days ago
  •  ...Senior Principal Cybersecurity Architect Come on board with an iconic financial institution and take your career to the next level. You have found...  ...target-state architecture decisions Provide deep data engineering expertise and work across agile teams to enhance,... 
    Principal
    Senior

    Chase

    Plano, TX
    4 days ago
  • $110.7k - $185.25k

     ...Overview The AI Platform/Observability Architect is an execution-focused engineer who designs, builds, and operates observability capabilities within a defined...  ...platform. Working under the strategic direction of the Senior AI Observability Architect (L11), this role... 
    Principal

    PepsiCo

    Plano, TX
    2 days ago
  •  ...building the platforms, data products, and AI capabilities that give CBRE and its...  ...revenue impact. Role Summary The Senior AI/ML Engineer is a hands-on technical practitioner responsible...  .... Conversational AI Development Architect and fine-tune intelligent virtual... 
    Principal
    Senior
    Day shift

    CBRE

    Richardson, TX
    3 days ago
  •  ...Senior Principal Cybersecurity Architect Come on board with an iconic financial institution and take your career to the next level. You have found the...  ...to bring together talent that will consistently create AI-enabled solutions, processes, and reusable proof-of-concept... 
    Principal
    Senior

    Chase

    Plano, TX
    3 days ago
  •  ...Senior Principal Architect You're a pro who wants to influence the future of technical architecture...  ...influence, collaborating across product, engineering, operations, and business teams. You...  ...technologies (e.g., real-time payments, AI/ML in fraud detection, blockchain),... 
    Principal
    Senior
    Worldwide

    Chase

    Plano, TX
    3 days ago
  • NTT DATA is seeking an experienced SAP AI Lead Developer to define the technical vision and deliver AI/ML-enabled solutions across SAP platforms. You will collaborate with stakeholders to ensure business requirements are met and lead the implementation of scalable AI solutions... 
    Senior

    NTT DATA

    Plano, TX
    4 days ago
  •  ...TCC Toyota Motor Credit Corporation Company is seeking a highly motivated Sr. Principal Engineer in Plano, TX. In this role, you will lead technical contributions, drive innovation, and mentor engineering talent while collaborating with diverse teams. The ideal candidate... 
    Principal
    Senior

    TCC Toyota Motor Credit Corporation Company

    Plano, TX
    13 hours ago
  • $170k - $200k

     ...Dormont Manufacturing Co is looking for a hybrid Principal Architect for Automation & Orchestration. In this role, you will lead the design...  ...should have over 10 years of experience in Network Engineering, expertise in automation frameworks like Python and Terraform... 
    Principal

    Dormont Manufacturing Company

    Plano, TX
    13 hours ago
  •  ...Infosys Limited is seeking a Senior Principal Technology Architect in Richardson, Texas. The candidate will partner with business stakeholders, oversee requirement elicitation, and drive architectural design that ensures compliance with standards. The role involves leading... 
    Principal

    Infosys

    Richardson, TX
    1 day ago
  • $119.4k - $136.2k

     ...A leading financial services firm is seeking a Principal Associate, Product Designer to join their Experience Design team. The role combines expertise in product design with customer understanding and business strategy to deliver innovative solutions for complex problems... 
    Principal
    Senior

    Capital One National Association

    Plano, TX
    14 hours ago
  •  ...leading semiconductor company is seeking a distinguished Sr. Principal GaN Scientist to shape its device technology roadmap. The ideal...  ...delivering technical discussions and collaborating closely with engineering and business stakeholders. The company values inclusivity and... 
    Principal
    Senior

    Qorvo

    Richardson, TX
    1 day ago
  • $123.5k - $206.75k

     ...PepsiCo is seeking a Principal Product Manager in Plano, Texas, to lead product strategy across supply chain platforms. This is a high-visibility role responsible for defining roadmaps, shaping architectural decisions, and driving outcomes across complex ecosystems. The... 
    Principal
    Senior

    PepsiCo

    Plano, TX
    14 hours ago
  •  ...Koitecc Solutions is looking for a Principal Cybersecurity Architect to enhance and develop architecture platforms for cloud-based technologies....  ...have over 10 years of experience in software and security engineering, with a strong focus on cybersecurity architecture and programming... 
    Principal

    Koitecc Solutions

    Plano, TX
    14 hours ago
  •  ...Principal Enterprise Solution Architect The Principal Enterprise Solution Architect is the senior-most architectural authority responsible for defining...  ...integrations, and AI/ML workloads. You will...  ...executives, product leaders, engineering teams, and customers to translate... 
    Principal

    WIS International

    Plano, TX
    5 days ago
  •  ...rare opportunity to operate at the intersection of deep technical cybersecurity expertise and enterprise-level risk strategy. As a senior technical authority, you will shape how the firm evaluates and manages cybersecurity risk across its most strategically significant... 
    Principal

    Chase

    Plano, TX
    2 days ago
  •  ...PepsiCo Deutschland GmbH in Plano, Texas is seeking a Principal Product Manager to lead the product strategy across supply chain platforms. The role involves owning the product vision, collaborating closely with data science teams, and managing stakeholders to drive measurable... 
    Principal
    Senior

    PepsiCo Deutschland GmbH

    Plano, TX
    1 day ago
  • JPMorgan Chase & Co. is seeking a Senior Principal Cybersecurity Architect to lead cybersecurity strategy across multiple products and technologies. This role requires deep experience in cybersecurity architecture and the ability to drive impactful innovation. The ideal... 
    Principal

    JPMorgan Chase & Co.

    Plano, TX
    2 days ago
  • $170k - $200k

     ...services through global simplicity with trusted transparency. ROLE SUMMARY We are seeking a Principal Architect for Network Services who will understand existing engineering standards and reference architectures. The ideal candidate will guide application teams to... 
    Principal

    EOS

    Plano, TX
    10 days ago
  •  ...Principal Cybersecurity Architect Take your engineering expertise to new heights by joining a team of exceptionally talented professionals and solidify your...  ...Ability to present and effectively communicate with senior leaders and executives Understanding of the business... 
    Principal

    Chase

    Plano, TX
    3 days ago
  • $170k - $200k

     ...and are proud to deliver our services through global simplicity with trusted transparency. WHAT YOU WILL DO We are seeking a Principal Architect to define and lead next‑generation data center architecture spanning physical and logical design, software‑defined infrastructure... 
    Principal

    Dormont Manufacturing Company

    Plano, TX
    13 hours ago
  •  ...TMN Toyota Motor North America Company is looking for a Principal Engineer – Security AI Solutions in Plano, TX. This role focuses on developing AI-assisted applications to automate security workflows and integrate machine learning capabilities into red team pipelines... 
    Principal

    TMN Toyota Motor North America Company

    Plano, TX
    14 hours ago
  •  ...Welcome! Service Experts is seeking a Principal Architect to lead the architectural strategy,...  ...platform enables analytics, reporting, and AI/ML workloads at enterprise scale....  ...Platform, including L1-L3 support, data engineering, analytics, and platform operations teams... 
    Principal

    Service Experts

    Richardson, TX
    1 day ago
  •  ...Principal HBM Design Architect Our vision is to transform how the world uses information to enrich life...  ...semiconductor solutions. As an HBM Memory Design Engineer on the HBM Architecture Team, you...  ...developing modern HBM solutions for AI and ML applications. Responsibilities... 
    Principal
    Local area
    Relocation

    Micron Technology

    Richardson, TX
    14 hours ago
  •  ...Fisher Investments in Plano, Texas is seeking an AI Engineer to design, build, and improve AI solutions that are secure and scalable. This role involves collaborating with AI leads and developers to bring AI capabilities into production. The ideal candidate will have a... 
    Senior

    Fisher Investments

    Plano, TX
    14 hours ago
  •  ...Bright Vision Technologies is seeking an Edge AI Engineer to design, optimize, and deploy machine learning models efficiently on edge devices. This remote position offers a full-time, direct W2 engagement with a company recognized for its innovative software solutions.... 
    Senior
    Full time
    Remote work

    Bright Vision Technologies

    Frisco, TX
    4 days ago
  •  ...This role focuses on advanced Generative AI applications, requiring hands-on experience in prompt engineering, connecting Large Language Models (LLMs), and developing...  ...effective AI model interaction. Design and architect solutions specifically for Generative AI.... 
    Senior

    Prophecy Technologies

    Plano, TX
    1 day ago
  •  ...Micron Technology, Inc is seeking a Principal HBM Design Architect in Richardson, Texas. In this role, you will design, simulate, and optimize advanced...  ...Bandwidth Memory products, collaborating with teams on AI and ML solutions. Successful candidates will have significant... 
    Senior

    Micron Technology

    Richardson, TX
    14 hours ago