Senior Principal AI Architect/Engineer

$123.5k - $206.75k

Pepsi Bottling Group

Overview

The AI Observability Architect is a senior technical leader responsible for designing, deploying, and operating an enterprise-grade, production-ready AI observability platform that spans the full spectrum of modern agentic AI - from large language model (LLM) workflows and multi-agent orchestration to physical AI systems, reinforcement learning harnesses, multi-modal pipelines, and agentic marketplaces. This role serves as the strategic and engineering authority for end-to-end telemetry, tracing, safety, and quality signals across heterogeneous agent frameworks and platforms.

The architect leads the convergence of AI observability with safety & security (including red teaming), Responsible AI (RAI), data science, physical AI, memory/skills engineering, agent fleet management, self-evolving harnesses, reinforcement learning, agent-to-agent protocols (A2A, UCP, AP2), and continuous quality engineering - making this a uniquely broad and high-impact role within the AI Solutions & Platforms organization.

The role also owns OpenTelemetry (OTEL) integration across third-party agentic platforms (Salesforce AgentForce, ServiceNow, Microsoft Agent 365, and others), enabling unified observability and governance at enterprise scale.

Responsibilities

Agentic AI Observability Architecture at Scale (30%)
  • Define and own the enterprise observability architecture for AI agents, LLMs, multi-agent workflows, and physical AI systems - covering planner/executor loops, tool/function calls, RAG retrieval chains, and memory/state transitions.
  • Build and operate unified telemetry pipelines incorporating metrics, logs, distributed traces, semantic/vector signals, and real-time event streaming (Kafka) at enterprise scale.
  • Instrument OpenTelemetry (OTEL) across heterogeneous platforms including Salesforce AgentForce, ServiceNow, Microsoft Agent 365, and internal frameworks - delivering protocol-level observability for agent ecosystems including MCP, A2A, UCP, and AP2.
  • Design and implement observability for Agent Fleets, multi-modal pipelines, physical AI systems, and self-evolving reinforcement learning harnesses - including signal capture for reward shaping and policy evaluation.
  • Deliver dashboards, alerting, SLO/SLA management, incident runbook automation, and RCA tooling that drive measurable reliability improvements and reduce MTTR across agentic services.
  • Establish cost telemetry and FinOps observability for AI workloads - token consumption, inference cost allocation, and GPU/compute efficiency across cloud environments (Azure, AWS, GCP).
Safety, Security & Red Teaming (15%)
  • Lead observability-driven red team exercises targeting agentic AI systems - instrumenting attack surfaces, adversarial prompt injection vectors, model evasion attempts, and multi-agent trust boundary failures.
  • Design telemetry pipelines that capture safety-critical signals: guardrail trigger rates, policy violation events, PII exposure risks, prompt leakage, and agent hallucination rates.
  • Partner with Security and RAI teams to embed threat modeling, zero-trust agent authentication, and behavioral anomaly detection into the observability platform.
  • Instrument secure policy enforcement layers across agent-to-agent communication protocols (A2A, UCP, AP2) and maintain audit-ready traceability for all AI decision events.
  • Develop and maintain a Security Observability Playbook covering incident classification, escalation paths, and forensic trace retention policies for agentic AI systems.
Responsible AI (RAI) & Governance (10%)
  • Integrate RAI signal capture - fairness, bias detection, explainability, and safety metrics - directly into observability pipelines, making compliance measurable and audit-ready.
  • Deliver governance dashboards that surface RAI compliance posture across all active AI agents and LLM deployments, aligned with global regulatory standards.
  • Support risk assessments, gap analyses, and governance frameworks with real-time observability insights - enabling proactive risk mitigation rather than reactive audit responses.
  • Collaborate with RAI CoE and Legal/Compliance teams to define data retention, consent logging, and model decision traceability standards embedded in the telemetry architecture.
Quality Engineering for Agentic Solutions - Post Go-Live & Continuous QE (10%)
  • Own the Continuous Quality Engineering (CQE) framework for post-production agentic solutions - defining and tracking quality metrics across accuracy, latency, agent success rate, tool-call fidelity, and user outcome measures.
  • Build automated quality gates within CI/CD pipelines that leverage observability data to detect regressions, drift, and degradation in agent performance - preventing silent failures in production.
  • Instrument and monitor Skill Evaluations (evals) across the Memory, Skills, and MCP harness stack - providing traceability from eval results to production behavior.
  • Partner with product and business stakeholders to define SLA-backed quality benchmarks and deliver automated alerting when quality thresholds are breached.
  • Drive root-cause analysis for quality failures using distributed trace data, enabling rapid iteration and continuous improvement cycles for agentic solutions.
Memory, Skills, MCP & Harness Engineering Observability (10%)
  • Design and implement observability for the agent memory layer - episodic, semantic, and working memory read/write operations - providing latency, accuracy, and drift monitoring across memory backends.
  • Instrument MCP (Model Context Protocol) server interactions, tool registrations, skill invocations, and context injection pipelines with full trace propagation and semantic tagging.
  • Own observability for self-evolving harness and reinforcement learning (RL) systems - capturing reward signals, policy update events, environment state transitions, and learning convergence metrics.
  • Monitor harness execution fidelity, skill eval pass/fail rates, and regression signals across training, fine-tuning, and inference workflows - feeding data back into the quality engineering loop.
Data Science Observability & Hardcore Python Engineering (5%)
  • Lead a team of senior Python engineers building high-performance, production-grade observability tooling - including custom OTEL exporters, semantic trace enrichers, signal aggregators, and anomaly detection pipelines.
  • Apply data science methods - statistical process control, time-series anomaly detection, clustering, and causal inference - to transform raw telemetry into actionable AI operational intelligence.
  • Build and maintain Python-native SDKs and libraries that simplify observability onboarding for agent developers across the organization.
  • Establish code quality standards, testing frameworks, and peer review practices for the observability engineering team - embedding software craftsmanship into the team culture.
Agentic Marketplace, Registry & Ecosystem Observability (5%)
  • Instrument the Agentic Marketplace and Agent Registry platforms - providing usage telemetry, adoption metrics, capability health scores, and dependency mapping for registered agents and skills.
  • Design observability APIs and SDK hooks that allow marketplace-registered agents to self-report health, performance, and behavioral signals into the central observability platform.
  • Monitor inter-agent communication patterns across the marketplace ecosystem - identifying latency hotspots, circular dependencies, and protocol mismatches in agent-to-agent (A2A) workflows.
  • Deliver a Marketplace Observability Dashboard surfacing agent catalog health, adoption trends, quality scores, and incident history - supporting marketplace governance and curation decisions.
Integration, Deployment & CI/CD Automation (5%)
  • Build and maintain CI/CD pipelines for observability services and agent operations center components, incorporating automated testing, deployment gates, and rollback mechanisms.
  • Automate onboarding for new agent use cases using templates, scaffolding, and configuration validation - reducing time-to-observability from weeks to hours.
  • Drive infrastructure-as-code (IaC) practices for observability platform components across Azure, AWS, and GCP - ensuring reproducible, version-controlled, and auditable deployments.
Product Delivery & Stakeholder Collaboration (10%)
  • Operate with a product mindset - defining observability platform roadmaps, OKRs, adoption playbooks, and release milestones in partnership with AI platform and business teams.
  • Collaborate with transformation teams, enterprise architects, security, and business stakeholders to tailor observability solutions to domain-specific requirements.
  • Serve as the technical authority in executive and governance forums - translating complex observability data into business-relevant insights on risk, cost, and AI performance.
  • Partner with SRE, AI platform, and product teams to drive adoption of observability standards and reduce integration friction across the agentic AI ecosystem.
People Leadership & Team Development (5%)
  • Build, mentor, and lead a high-performing observability engineering team - spanning Python developers, data scientists, and platform engineers - with talent initially based in India.
  • Define career paths, skills development plans, and leveling criteria aligned with PepsiCo job architecture - fostering an inclusive, high-accountability team culture.
  • Drive hiring, coaching, performance management, and succession planning across the observability function.
Decision-Making Autonomy
  • High - Owns architecture decisions, platform roadmap, and engineering standards. Strategic alignment sought from AI Solutions Director on enterprise-level commitments.
Supervision Required
  • Low to Moderate - Operates independently with periodic alignment reviews. Proactively escalates cross-organizational dependencies and risk trade-offs.
Role Complexity
  • Very High - Spans observability, safety/security, RL harnesses, physical AI, multi-modal systems, agent protocols, quality engineering, and marketplace governance simultaneously.
Compensation and Benefits:
  • The expected compensation range for this position is between $123,500 and $206,750.
  • Location, confirmed job-related skills, experience, and education will be considered in setting actual starting salary. Your recruiter can share more about the specific salary range during the hiring process.
  • Bonus based on performance and eligibility; target payout is 15% of annual salary, paid out annually.
  • Paid time off subject to eligibility, including paid parental leave, vacation, sick, and bereavement.
  • In addition to salary, PepsiCo offers a comprehensive benefits package to support our employees and their families, subject to elections and eligibility: Medical, Dental, Vision, Disability, Health, and Dependent Care Reimbursement Accounts, Employee Assistance Program (EAP), Insurance (Accident, Group Legal, Life), Defined Contribution Retirement Plan.
Qualifications

Minimum Education & Experience:
  • Bachelor's or Master's degree in Computer Science, AI/ML, Data Science, Software Engineering, or a related field (PhD a plus for research-heavy domains).
  • 12+ years in technology with deep experience in enterprise observability, distributed systems, platform engineering, or AI/ML infrastructure.
  • 5+ years in a senior/principal or architect-level role with demonstrated ownership of complex, cross-functional technical programs.
Core Technical Qualifications
  • AI Observability & Distributed Systems: Expert-level knowledge of observability primitives (metrics, logs, traces, events) applied to LLM/ML/agentic systems; hands-on OpenTelemetry (OTEL) instrumentation including custom exporters, semantic conventions, and trace propagation across agent/tool boundaries.
  • Agentic AI Frameworks: Direct experience with agentic AI platforms, multi-agent orchestration, LLM-based workflow design, and agent lifecycle management at production scale.
  • Safety, Security & Red Teaming: Demonstrated experience conducting red team exercises against AI systems; knowledge of adversarial attack patterns, prompt injection, model evasion, and multi-agent trust boundary failures; ability to design safety telemetry pipelines.
  • Memory, Skills & MCP: Working knowledge of agent memory architectures (episodic, semantic, working memory), Model Context Protocol (MCP), skill registries, and context injection patterns - with ability to design observability for these layers.
  • Agent-to-Agent Protocols: Familiarity with A2A (Agent-to-Agent), UCP (Universal Communication Protocol), and AP2 patterns; ability to implement protocol-level observability and policy enforcement.
  • Reinforcement Learning & Self-Evolving Harnesses: Understanding of RL training loops, reward signal capture, policy evaluation, and harness instrumentation for continuously improving agent systems.
  • Physical AI & Multi-Modal Systems: Experience or strong familiarity with observability for physical AI pipelines (robotics, edge inference, sensor fusion) and multi-modal models (vision, audio, text).
  • Data Science & Python Engineering: Proficiency in Python at a senior engineering level; experience with statistical anomaly detection, time-series analysis, and data pipeline design applied to observability data at scale.
  • Platform Integrations (OTEL / Enterprise): Hands-on experience integrating OTEL with enterprise agentic platforms including Salesforce AgentForce, ServiceNow, Microsoft Agent 365, or similar; strong understanding of enterprise integration patterns and API design.
  • Cloud & Infrastructure: Cloud fluency across Azure, AWS, and GCP; proficiency in Kubernetes, service mesh, IaC (Terraform/Bicep), and CI/CD tooling; experience with event streaming platforms (Kafka, Event Hubs).
  • Quality Engineering for AI: Experience designing continuous quality frameworks (CQE) for agentic solutions including eval harnesses, regression detection, quality gates, and SLA-backed quality benchmarking.
  • Responsible AI (RAI): Familiarity with RAI principles - fairness, bias detection, explainability, and safety - and ability to operationalize RAI signal capture within production observability pipelines.
  • Agentic Marketplace & Registry: Experience or strong familiarity with agent marketplace architectures, capability registries, and platform governance - ideally with observability or monitoring responsibilities for marketplace-registered components.
Preferred / Differentiating Technical Skills
  • Published contributions or hands-on experience with emerging agent frameworks (LangGraph, AutoGen, CrewAI, Semantic Kernel, Bedrock Agents, or equivalent).
  • Experience with Grafana, Datadog, New Relic, Dynatrace, or equivalent enterprise observability platforms - ideally extended to support AI/LLM workloads.
  • Familiarity with vector databases (Pinecone, Weaviate, pgvector) and semantic search observability patterns relevant to RAG pipelines.
  • Background in MLOps, LLMOps, or model lifecycle management - including model versioning, drift detection, and deployment governance.
  • Experience designing observability APIs and SDK hooks for developer self-service onboarding.
Differentiating Competencies Required
  • Translates enterprise AI strategy into observability architecture that simultaneously enables governance, safety, quality, and scale - holding the full picture across deeply technical and business dimensions.
  • Safety-First Engineering Mindset - Instinctively designs systems with security, adversarial resilience, and RAI compliance as first-class requirements - not retrofitted features. Leads red team exercises with intellectual rigor and operational discipline.
  • Outcome & Quality Orientation - Drives measurable impact: reduced MTTR, audit readiness, SLA adherence, agent quality scores, and RL harness convergence - translating telemetry data into business-relevant results.
  • Cross-Functional Influencing - Navigates complex organizational dynamics - aligning engineering, governance, security, data science, and business units around shared observability standards and practices.
  • Governance by Design - Integrates RAI, compliance, and security controls into design decisions from inception - producing systems that are audit-ready by default, not by remediation.
  • Technical Leadership Presence - Commands credibility in both executive and deep-technical forums; able to shift fluidly between C-suite communication and whiteboard architecture sessions with engineers.
  • Adaptability & Continuous Learning - Thrives in a rapidly evolving AI landscape; quickly absorbs and operationalizes new frameworks, protocols, and research - from emerging agent communication standards to novel RL paradigms.
  • Python Engineering Excellence - Holds a high bar for Python code quality, software craftsmanship, testing discipline, and developer experience - modeling best practices for the engineering team.


Our Company will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Credit Reporting Act, and all other applicable laws, including but not limited to, San Francisco Police Code Sections 4901-4919, commonly referred to as the San Francisco Fair Chance Ordinance; and Chapter XVII, Article 9 of the Los Angeles Municipal Code, commonly referred to as the Fair Chance Initiative for Hiring Ordinance.


All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status.


PepsiCo is an Equal Opportunity Employer: Female / Minority / Disability / Protected Veteran / Sexual Orientation / Gender Identity / Age


If you'd like more information about your EEO rights as an applicant under the law, please download the available "EEO is the Law" and "EEO is the Law Supplement" documents. View the PepsiCo EEO Policy.


Please view our Pay Transparency Statement.
Vacancy posted 5 days ago