
Senior Software Engineer - Application Reliability, Hybrid

$199.7k - $254.6k

Webex Events (formerly Socio)

Senior Software Engineer in Application Reliability

This position is based in San Jose, CA or North Carolina and operates under a hybrid work model.

Join Cisco's Enterprise AI team, the core group enabling Generative AI-powered experiences across Cisco. Our mission is to build secure, scalable AI platforms that empower teams to safely develop, deploy, and operationalize AI-powered solutions. We operate at the intersection of applied AI, cloud infrastructure, and security, partnering across engineering, security, compliance, and product teams to bring trusted AI to life at enterprise scale.

We are a fast-growing, highly collaborative team of platform engineers, AI engineers, and data scientists who value technical depth, ownership, and pragmatic execution. What makes this team exciting is the opportunity to define how secure Generative AI is built and governed inside a global technology leader.

As a Senior Software Engineer in Application Reliability, you will own the reliability of our AI-powered applications and features from the user's perspective. While our infrastructure SRE team ensures the platform is healthy, your focus will be on feature uptime, usage trends, automated issue identification, and self-healing remediation at the application layer. You will build LangGraph-based agents for automated diagnostics, Looker dashboards for observability, and evaluation harnesses for agent quality - all powered by BigQuery, BigTable, and Python. You will partner closely with application developers, data engineers, and infrastructure SREs to ensure our APIs, RAG systems, agents, and user-facing features are reliable, observable, and continuously improving.

Your Impact
  • Define, implement, and enforce feature-level SLIs, SLOs, and error budgets for APIs, RAG systems, AI agents, and user-facing applications.
  • Build and maintain application observability systems using Looker dashboards on BigQuery and BigTable — providing real-time visibility into feature health, error patterns, and usage trends for developers, PMs, and leadership.
  • Design and build LangGraph-based agents for automated issue identification and remediation: anomaly detection on BQ logs, root cause diagnosis, auto-rollback, feature flag kill switches, and self-healing workflows.
  • Develop agent evaluation harnesses to benchmark agent performance, test multi-step workflows, handle non-deterministic outputs, and run regression testing as agents evolve.
  • Write complex SQL (BigQuery) for usage trend analysis, anomaly detection, and operational analytics; design BQ table schemas optimized for observability and debugging.
  • Analyze application usage trends and adoption metrics to proactively identify reliability risks, capacity needs, and degraded user experiences before they become incidents.
  • Partner with application development teams to embed reliability practices into the development lifecycle: deployment safety (canary, progressive rollout), structured logging standards, and distributed tracing.
  • Lead application-level incident response, root cause analysis, and blameless postmortems focused on feature impact rather than infrastructure symptoms.
  • Build Python-based tooling and automation to reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for application-layer issues.
  • Stay current with the rapidly evolving AI landscape (new frameworks, tools, and paradigms) and apply emerging techniques to improve platform reliability and developer productivity.

Minimum Qualifications
  • 10+ years of experience in software engineering with significant focus on reliability, observability, or production operations; Bachelor's or Master's Degree in Computer Science, Engineering, or a related technical discipline.
  • Strong Python development skills, with experience building production tooling, automation, and agent-based systems.
  • Production GCP experience — deploying and managing applications on GKE (Kubernetes), deep SQL expertise with BigQuery (complex queries, window functions, schema design, cost optimization), and hands-on experience with BigTable (or equivalent) for high-throughput operational data.
  • Proven experience designing and operating application-level SLI/SLO frameworks, burn-rate alerting, and error budget policies.
  • Strong debugging skills at the application layer — distributed tracing, profiling, structured log analysis, and dependency mapping.

Preferred Qualifications
  • Experience building agent evaluation harnesses (benchmarking, regression testing, guardrail validation for AI agents).
  • Familiarity with A2A protocols, streaming architectures, and event-driven systems.
  • Experience with deployment safety patterns: feature flags, canary deployments, progressive rollouts, and automated rollback.
  • Experience with GCP observability services (Cloud Logging, Cloud Trace, Cloud Monitoring).
  • Exposure to AIOps concepts: ML-driven anomaly detection, automated root cause analysis, intelligent alerting.
  • Experience driving reliability culture across engineering teams — SLO adoption, postmortem processes, and reliability reviews.
  • Active engagement with the evolving AI ecosystem; awareness of emerging tools and frameworks.
  • Hands-on experience with GenAI application development: LangGraph, agent engineering, prompt design, and agentic workflows.
  • Experience building Looker dashboards and LookML models for operational observability.

Why Cisco?

At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.

Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.

We are Cisco, and our power starts with you.

Message to applicants applying to work in the U.S. and/or Canada:

The starting salary range posted for this position is $199,700.00 to $254,600.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation, equity, or benefits.

Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.

U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.

U.S. employees are eligible for paid time away as described below, subject to Cisco's policies:

  • 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
  • 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco
  • Non-exempt employees receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
  • Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
  • 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
  • Additional paid time away may be requested to deal with critical or emergency issues for family members
  • Optional 10 paid days per full calendar year to volunteer

For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies.

Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan.

  • .75% of incentive target for each 1% of
Vacancy posted 2 days ago
