Senior Principal AI Architect/Engineer
$123.5k - $206.75k
Pepsi Bottling Group
Overview
The AI Observability Architect is a senior technical leader responsible for designing, deploying, and operating an enterprise-grade, production-ready AI observability platform that spans the full spectrum of modern agentic AI - from large language model (LLM) workflows and multi-agent orchestration to physical AI systems, reinforcement learning harnesses, multi-modal pipelines, and agentic marketplaces. This role serves as the strategic and engineering authority for end-to-end telemetry, tracing, safety, and quality signals across heterogeneous agent frameworks and platforms. The architect leads the convergence of AI observability with safety & security (including red teaming), Responsible AI (RAI), data science, physical AI, memory/skills engineering, agent fleet management, self-evolving harnesses, reinforcement learning, agent-to-agent protocols (A2A, UCP, AP2), and continuous quality engineering - making this a uniquely broad and high-impact role within the AI Solutions & Platforms organization. The role also owns OpenTelemetry (OTEL) integration across third-party agentic platforms (Salesforce AgentForce, ServiceNow, Microsoft Agent 365, and others), enabling unified observability and governance at enterprise scale.
Responsibilities
Agentic AI Observability Architecture at Scale (30%)
All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status.
PepsiCo is an Equal Opportunity Employer: Female / Minority / Disability / Protected Veteran / Sexual Orientation / Gender Identity / Age
If you'd like more information about your EEO rights as an applicant under the law, please download the available EEO is the Law & EEO is the Law Supplement documents. View PepsiCo EEO Policy .
Please view our Pay Transparency Statement .
- Define and own the enterprise observability architecture for AI agents, LLMs, multi-agent workflows, and physical AI systems - covering planner/executor loops, tool/function calls, RAG retrieval chains, and memory/state transitions.
- Build and operate unified telemetry pipelines incorporating metrics, logs, distributed traces, semantic/vector signals, and real-time event streaming (Kafka) at enterprise scale.
- Instrument OpenTelemetry (OTEL) across heterogeneous platforms including Salesforce AgentForce, ServiceNow, Microsoft Agent 365, and internal frameworks - delivering protocol-level observability for agent ecosystems including MCP, A2A, UCP, and AP2.
- Design and implement observability for Agent Fleets, multi-modal pipelines, physical AI systems, and self-evolving reinforcement learning harnesses - including signal capture for reward shaping and policy evaluation.
- Deliver dashboards, alerting, SLO/SLA management, incident runbook automation, and RCA tooling that drive measurable reliability improvements and reduce MTTR across agentic services.
- Establish cost telemetry and FinOps observability for AI workloads - token consumption, inference cost allocation, and GPU/compute efficiency across cloud environments (Azure, AWS, GCP).
- Lead observability-driven red team exercises targeting agentic AI systems - instrumenting attack surfaces, adversarial prompt injection vectors, model evasion attempts, and multi-agent trust boundary failures.
- Design telemetry pipelines that capture safety-critical signals: guardrail trigger rates, policy violation events, PII exposure risks, prompt leakage, and agent hallucination rates.
- Partner with Security and RAI teams to embed threat modeling, zero-trust agent authentication, and behavioral anomaly detection into the observability platform.
- Instrument secure policy enforcement layers across agent-to-agent communication protocols (A2A, UCP, AP2) and maintain audit-ready traceability for all AI decision events.
- Develop and maintain a Security Observability Playbook covering incident classification, escalation paths, and forensic trace retention policies for agentic AI systems.
- Integrate RAI signal capture - fairness, bias detection, explainability, and safety metrics - directly into observability pipelines, making compliance measurable and audit-ready.
- Deliver governance dashboards that surface RAI compliance posture across all active AI agents and LLM deployments, aligned with global regulatory standards.
- Support risk assessments, gap analyses, and governance frameworks with real-time observability insights - enabling proactive risk mitigation rather than reactive audit responses.
- Collaborate with RAI CoE and Legal/Compliance teams to define data retention, consent logging, and model decision traceability standards embedded in the telemetry architecture.
- Own the Continuous Quality Engineering (CQE) framework for post-production agentic solutions - defining and tracking quality metrics across accuracy, latency, agent success rate, tool-call fidelity, and user outcome measures.
- Build automated quality gates within CI/CD pipelines that leverage observability data to detect regressions, drift, and degradation in agent performance - preventing silent failures in production.
- Instrument and monitor Skill Evaluations (evals) across the Memory, Skills, and MCP harness stack - providing traceability from eval results to production behavior.
- Partner with product and business stakeholders to define SLA-backed quality benchmarks and deliver automated alerting when quality thresholds are breached.
- Drive root-cause analysis for quality failures using distributed trace data, enabling rapid iteration and continuous improvement cycles for agentic solutions.
- Design and implement observability for the agent memory layer - episodic, semantic, and working memory read/write operations - providing latency, accuracy, and drift monitoring across memory backends.
- Instrument MCP (Model Context Protocol) server interactions, tool registrations, skill invocations, and context injection pipelines with full trace propagation and semantic tagging.
- Own observability for self-evolving harness and reinforcement learning (RL) systems - capturing reward signals, policy update events, environment state transitions, and learning convergence metrics.
- Monitor harness execution fidelity, skill eval pass/fail rates, and regression signals across training, fine-tuning, and inference workflows - feeding data back into the quality engineering loop.
- Lead a team of senior Python engineers building high-performance, production-grade observability tooling - including custom OTEL exporters, semantic trace enrichers, signal aggregators, and anomaly detection pipelines.
- Apply data science methods - statistical process control, time-series anomaly detection, clustering, and causal inference - to transform raw telemetry into actionable AI operational intelligence.
- Build and maintain Python-native SDKs and libraries that simplify observability onboarding for agent developers across the organization.
- Establish code quality standards, testing frameworks, and peer review practices for the observability engineering team - embedding software craftsmanship into the team culture.
- Instrument the Agentic Marketplace and Agent Registry platforms - providing usage telemetry, adoption metrics, capability health scores, and dependency mapping for registered agents and skills.
- Design observability APIs and SDK hooks that allow marketplace-registered agents to self-report health, performance, and behavioral signals into the central observability platform.
- Monitor inter-agent communication patterns across the marketplace ecosystem - identifying latency hotspots, circular dependencies, and protocol mismatches in agent-to-agent (A2A) workflows.
- Deliver a Marketplace Observability Dashboard surfacing agent catalog health, adoption trends, quality scores, and incident history - supporting marketplace governance and curation decisions.
- Build and maintain CI/CD pipelines for observability services and agent operations center components, incorporating automated testing, deployment gates, and rollback mechanisms.
- Automate onboarding for new agent use cases using templates, scaffolding, and configuration validation - reducing time-to-observability from weeks to hours.
- Drive infrastructure-as-code (IaC) practices for observability platform components across Azure, AWS, and GCP - ensuring reproducible, version-controlled, and auditable deployments.
- Operate with a product mindset - defining observability platform roadmaps, OKRs, adoption playbooks, and release milestones in partnership with AI platform and business teams.
- Collaborate with transformation teams, enterprise architects, security, and business stakeholders to tailor observability solutions to domain-specific requirements.
- Serve as the technical authority in executive and governance forums - translating complex observability data into business-relevant insights on risk, cost, and AI performance.
- Partner with SRE, AI platform, and product teams to drive standard adoption and reduce integration friction across the agentic AI ecosystem.
- Build, mentor, and lead a high-performing observability engineering team - spanning Python developers, data scientists, and platform engineers - with talent initially based in India.
- Define career paths, skills development plans, and leveling criteria aligned with PepsiCo job architecture - fostering an inclusive, high-accountability team culture.
- Drive hiring, coaching, performance management, and succession planning across the observability function.
- High - Owns architecture decisions, platform roadmap, and engineering standards. Strategic alignment sought from AI Solutions Director on enterprise-level commitments.
- Low to Moderate - Operates independently with periodic alignment reviews. Proactively escalates cross-organizational dependencies and risk trade-offs.
- Very High - Spans observability, safety/security, RL harnesses, physical AI, multi-modal systems, agent protocols, quality engineering, and marketplace governance simultaneously.
- The expected compensation range for this position is $123,500 to $206,750.
- Location, confirmed job-related skills, experience, and education will be considered in setting actual starting salary. Your recruiter can share more about the specific salary range during the hiring process.
- Bonus based on performance and eligibility; target payout is 15% of annual salary, paid out annually.
- Paid time off subject to eligibility, including paid parental leave, vacation, sick, and bereavement.
- In addition to salary, PepsiCo offers a comprehensive benefits package to support our employees and their families, subject to elections and eligibility: Medical, Dental, Vision, Disability, Health, and Dependent Care Reimbursement Accounts, Employee Assistance Program (EAP), Insurance (Accident, Group Legal, Life), Defined Contribution Retirement Plan.
- Bachelor's or Master's degree in Computer Science, AI/ML, Data Science, Software Engineering, or a related field (PhD a plus for research-heavy domains).
- 12+ years in technology with deep experience in enterprise observability, distributed systems, platform engineering, or AI/ML infrastructure.
- 5+ years in a senior/principal or architect-level role with demonstrated ownership of complex, cross-functional technical programs.
- AI Observability & Distributed Systems: Expert-level knowledge of observability primitives (metrics, logs, traces, events) applied to LLM/ML/agentic systems; hands-on OpenTelemetry (OTEL) instrumentation including custom exporters, semantic conventions, and trace propagation across agent/tool boundaries.
- Agentic AI Frameworks: Direct experience with agentic AI platforms, multi-agent orchestration, LLM-based workflow design, and agent lifecycle management at production scale.
- Safety, Security & Red Teaming: Demonstrated experience conducting red team exercises against AI systems; knowledge of adversarial attack patterns, prompt injection, model evasion, and multi-agent trust boundary failures; ability to design safety telemetry pipelines.
- Memory, Skills & MCP: Working knowledge of agent memory architectures (episodic, semantic, working memory), Model Context Protocol (MCP), skill registries, and context injection patterns - with ability to design observability for these layers.
- Agent-to-Agent Protocols: Familiarity with A2A (Agent-to-Agent), UCP (Universal Communication Protocol), and AP2 patterns; ability to implement protocol-level observability and policy enforcement.
- Reinforcement Learning & Self-Evolving Harnesses: Understanding of RL training loops, reward signal capture, policy evaluation, and harness instrumentation for continuously improving agent systems.
- Physical AI & Multi-Modal Systems: Experience or strong familiarity with observability for physical AI pipelines (robotics, edge inference, sensor fusion) and multi-modal models (vision, audio, text).
- Data Science & Python Engineering: Proficiency in Python at a senior engineering level; experience with statistical anomaly detection, time-series analysis, and data pipeline design applied to observability data at scale.
- Platform Integrations (OTEL / Enterprise): Hands-on experience integrating OTEL with enterprise agentic platforms including Salesforce AgentForce, ServiceNow, Microsoft Agent 365, or similar; strong understanding of enterprise integration patterns and API design.
- Cloud & Infrastructure: Cloud fluency across Azure, AWS, and GCP; proficiency in Kubernetes, service mesh, IaC (Terraform/Bicep), and CI/CD tooling; experience with event streaming platforms (Kafka, Event Hubs).
- Quality Engineering for AI: Experience designing continuous quality frameworks (CQE) for agentic solutions including eval harnesses, regression detection, quality gates, and SLA-backed quality benchmarking.
- Responsible AI (RAI): Familiarity with RAI principles - fairness, bias detection, explainability, and safety - and ability to operationalize RAI signal capture within production observability pipelines.
- Agentic Marketplace & Registry: Experience or strong familiarity with agent marketplace architectures, capability registries, and platform governance - ideally with observability or monitoring responsibilities for marketplace-registered components.
- Published contributions or hands-on experience with emerging agent frameworks (LangGraph, AutoGen, CrewAI, Semantic Kernel, Bedrock Agents, or equivalent).
- Experience with Grafana, Datadog, New Relic, Dynatrace, or equivalent enterprise observability platforms - ideally extended to support AI/LLM workloads.
- Familiarity with vector databases (Pinecone, Weaviate, pgvector) and semantic search observability patterns relevant to RAG pipelines.
- Background in MLOps, LLMOps, or model lifecycle management - including model versioning, drift detection, and deployment governance.
- Experience designing observability APIs and SDK hooks for developer self-service onboarding.
- Differentiating Competencies Required - Translates enterprise AI strategy into observability architecture that simultaneously enables governance, safety, quality, and scale - holding the full picture across deeply technical and business dimensions.
- Safety-First Engineering Mindset - Instinctively designs systems with security, adversarial resilience, and RAI compliance as first-class requirements - not retrofitted features. Leads red team exercises with intellectual rigor and operational discipline.
- Outcome & Quality Orientation - Drives measurable impact: reduced MTTR, audit readiness, SLA adherence, agent quality scores, and RL harness convergence - translating telemetry data into business-relevant results.
- Cross-Functional Influencing - Navigates complex organizational dynamics - aligning engineering, governance, security, data science, and business units around shared observability standards and practices.
- Governance by Design - Integrates RAI, compliance, and security controls into design decisions from inception - producing systems that are audit-ready by default, not by remediation.
- Technical Leadership Presence - Commands credibility in both executive and deep-technical forums; able to shift fluidly between C-suite communication and whiteboard architecture sessions with engineers.
- Adaptability & Continuous Learning - Thrives in a rapidly evolving AI landscape; quickly absorbs and operationalizes new frameworks, protocols, and research - from emerging agent communication standards to novel RL paradigms.
- Python Engineering Excellence - Holds a high bar for Python code quality, software craftsmanship, testing discipline, and developer experience - modeling best practices for the engineering team.
Vacancy posted 5 days ago

