AI-DNA SVP of Site Reliability Engineering

IgniteTech

By the time a human engineer reaches the incident, the AI agents on the team have already validated hypotheses against years of prior incidents and RCAs, parsed the logs and code paths, identified the failure pattern, and proposed (or applied) a remediation inside policy guardrails. That is the operating model you will lead: agentic SRE at Khoros, one of IgniteTech's flagship enterprise SaaS platforms, powering customer communities and social engagement for the world's largest brands, where reliability translates directly into retention, revenue, and customer trust.

What You Will Be Doing
  • Owning platform reliability and customer‑experience outcomes for an AI‑native community and social engagement platform: uptime trending up, MTTR trending down week over week, customer satisfaction trending up. As the owner of the outcome, you will be the senior leader involved in critical situations (CritSits) with customers and the face of those incidents and escalations.
  • Designing, governing, and continuously extending the AI agent system that does the operations work: pre‑triage, alarm authoring, blocking change‑validation gates, permanent‑fix lifecycle chase, customer‑facing RCA drafting, auto‑healing on whitelisted operations. The harness is the product; the team's output is the agent surface.
  • Leading a small, senior, fully remote team of SRE / SaaS / DevOps engineers across multiple time zones. There is no L1/L2 tier; every engineer ships agents, writes runbooks, and owns the incident loop end to end.
  • Staying hands‑on yourself: driving outage bridges when the blast radius warrants it, writing RCAs, shipping agent code. A substantial share of your week goes to personal agent build and maintenance.
  • Representing operations directly to enterprise customer leadership and to the CEO, translating reliability investment into retention, revenue, and customer‑experience outcomes.

What You Will NOT Be Doing
  • Growing the team to solve problems that AI should be solving. If you find yourself adding headcount to compensate for agent gaps, you are off‑strategy. Fix the AI, not the org chart.
  • Running a deck‑and‑roadmap executive function. You will be at the keyboard, on the bridge, and in the agent code regularly. If you have not personally written or shipped production code in the last 12 months, the role will be uncomfortable.
  • Owning product engineering or customer support. Your scope is the operating layer, where reliability and the AI agent surface live.
  • Drowning in tickets. The point of this role is to remove ticket‑throughput as the primary operating metric, not optimize it.
  • Sitting on every bridge call. You will drive the bridge personally when a top‑tier customer is down or the blast radius is material. The rest of the time, the operator tier handles incident execution and the AI surface absorbs the routine.
  • Treating this role as a credential or a résumé line. The bar is ownership and obsession, not titles.

Responsibilities
  • Deliver the reliability outcomes, and own them like a founder accountable for them. Platform uptime trending up, MTTR trending down week over week, customer satisfaction trending up. When something is down, that is your problem; you do not sleep peacefully through a customer outage.
  • Own enterprise‑customer escalations. Be the executive face of operations when a customer at the largest tier needs one, and engineer the operating system so escalations keep falling. Customer trust is the metric, measured in retention and contract expansion.
  • Set and enforce AI agent quality and governance. Every agent in production has a defined scope, a measured acceptance rate, an escape hatch, and a guardrail anchored to documented failure modes. You hold the bar, including against agents that propose to skip a step.
  • Recruit, develop, and retain a top‑1% senior‑only team. No L1, no L2; every engineer is a pioneer in their craft. Recruiting looks more like courtship than triage.
  • Shape the operating model and the playbook. Where AI does more, where humans must stay, what the agent surface looks like 6 and 12 months from now. The industry playbook for agentic SRE is being written right now, and you write a meaningful part of it.
  • Partner with product engineering and customer success peers. The operating layer is the hinge between them; reliability outcomes depend on those interfaces working.
  • Own platform availability end‑to‑end: you will be expected to read, interpret, and act on operational signals, including incident histories, status dashboards, and SLA performance data. Knowing the current health posture of every system under your ownership, at any moment, is baseline.

Requirements
  • Extreme ownership. You run operations as if it were your own business: your money, your reputation, your customers. Expect to bring that same intensity here.
  • 10+ years operating SaaS at meaningful scale, with at least 3 years in an SVP, VP, or Head‑of role managing a senior‑only engineering organization.
  • AI‑First DNA: already operating this way today, not merely open to AI. You have built or led AIOps / agentic incident response / auto‑remediation systems in production. You use Claude Code, Cursor, or equivalent agentic coding tools daily.
  • Deep AWS production experience at scale: real multi‑AZ, multi‑account production estates. Certifications are nice; production scars are mandatory.
  • Hands‑on senior engineer who leads, not a deck‑and‑roadmap executive. You drive outage bridges, write RCAs, ship code, and evaluate agent designs.
  • Track record of delivering reliability and customer‑experience improvements week over week while holding or shrinking headcount.
  • Fluent / advanced English: you interface directly with the CEO and with enterprise customer leadership.
  • Time‑zone overlap with US morning hours (roughly 13:00–17:00 UTC).
  • OFAC‑clear country of residence.

Nice to Have
  • Pioneer voice in this field: substantive original posts, talks, articles, or open‑source work on agentic SRE / harness‑over‑headcount / AI‑replacing‑ops‑work.
  • A documented history of being obsessed with one hard thing outside of mainstream work: a side project, an open‑source contribution, an unusual hobby pursued with depth.
  • Background in multi‑tenant B2B SaaS at scale: community, social, customer‑experience, observability, AIOps, or developer‑tooling platforms.
  • Hands‑on familiarity with modern observability and incident‑response stacks (Grafana, Loki, Prometheus, OpsGenie, PagerDuty, Datadog, or equivalents).

What You Will Learn
You will design and run one of the first operating functions built AI‑native from the inside out at enterprise scale. You will define what works, publish what doesn't, and shape a playbook the rest of the industry is still circling. The team that gets this right in the next 12 months becomes the reference everyone else cites, and you will lead that team.

Working Conditions
  • Enterprise scale, startup cadence. The customer base, the platform footprint, and the contractual stakes are those of a large enterprise SaaS. The operating model is the opposite: outcomes are measured week over week, not quarter over quarter; decisions are made in days, not committees; the playbook is rewritten as the field evolves.
  • Fully remote, async‑first, global team. We hire from anywhere with US‑morning UTC overlap; no office; work happens where the work is best done. The team is small and senior, with no L1/L2 layer beneath you.
  • No token limits. No tooling limits. The harness is the product, and we resource it accordingly. If the right answer is more compute, more inference, a better model, or a tool we have not bought yet, we buy it.

Vacancy posted 3 days ago