AI-DNA SVP of Site Reliability Engineering

IgniteTech

By the time a human engineer reaches the incident, the AI agents on the team have already validated hypotheses against years of prior incidents and RCAs, parsed the logs and code paths, identified the failure pattern, and proposed (or applied) a remediation inside policy guardrails. That is the operating model you will lead: agentic SRE at Khoros, one of IgniteTech's flagship enterprise SaaS platforms, which powers customer communities and social engagement for the world's largest brands. Here, reliability translates directly into retention, revenue, and customer trust.

What You Will Be Doing
Owning platform reliability and customer-experience outcomes for an AI-native community and social engagement platform: uptime trending up, MTTR trending down week over week, and customer satisfaction trending up. As the owner of the outcome, you will be the senior leader involved in critical situations (CritSits) with customers and the face of those incidents and escalations.
Designing, governing, and continuously extending the AI agent system that does the operations work: pre-triage, alarm authoring, blocking change-validation gates, permanent-fix lifecycle chase, customer-facing RCA drafting, and auto-healing on whitelisted operations. The harness is the product; the team's output is the agent surface.
Leading a small, senior, fully remote team of SRE / SaaS / DevOps engineers across multiple time zones. There is no L1/L2 tier; every engineer ships agents, writes runbooks, and owns the incident loop end to end.
Staying hands-on yourself: driving outage bridges when the blast radius warrants it, writing RCAs, and shipping agent code. A substantial share of your week goes to building and maintaining your own agents.
Representing operations directly to enterprise customer leadership and to the CEO, translating reliability investment into retention, revenue, and customer-experience outcomes.

What You Will NOT Be Doing
Growing the team to solve problems that AI should be solving. If you find yourself adding headcount to compensate for agent gaps, you are off-strategy. Fix the AI, not the org chart.
Running a deck-and-roadmap executive function. You will be at the keyboard, on the bridge, and in the agent code regularly. If you have not personally written or shipped production code in the last 12 months, the role will be uncomfortable.
Owning product engineering or customer support. Your scope is the operating layer, where reliability and the AI agent surface live.
Drowning in tickets. The point of this role is to remove ticket throughput as the primary operating metric, not to optimize it.
Sitting on every bridge call. You will drive the bridge personally when a top-tier customer is down or the blast radius is material. The rest of the time, the operator tier handles incident execution and the AI surface absorbs the routine.
Treating this role as a credential or a résumé line. The bar is ownership and obsession, not titles.

Responsibilities
Deliver the reliability outcomes, and own them as a founder would: platform uptime trending up, MTTR trending down week over week, customer satisfaction trending up. When something is down, that is your problem; you do not sleep peacefully through a customer outage.
Own enterprise-customer escalations. Be the executive face of operations when a customer at the largest tier needs one, and engineer the operating system so that escalations keep declining. Customer trust is the metric, measured in retention and contract expansion.
Set and enforce AI agent quality and governance. Every agent in production has a defined scope, a measured acceptance rate, an escape hatch, and a guardrail anchored to documented failure modes. You hold the bar, including against agents that propose to skip a step.
Recruit, develop, and retain a top-1% senior-only team. No L1, no L2: every engineer is a pioneer in their craft. Recruiting looks more like courtship than triage.
Shape the operating model and the playbook: where AI does more, where humans must stay, and what the agent surface looks like 6 and 12 months from now. The industry playbook for agentic SRE is being written right now, and you will write a meaningful part of it.
Partner with product engineering and customer success peers. The operating layer is the hinge between them; reliability outcomes depend on those interfaces working.
Own platform availability end to end. You will be expected to read, interpret, and act on operational signals, including incident histories, status dashboards, and SLA performance data. Knowing the current health posture of every system under your ownership, at any moment, is baseline.

Requirements
Extreme ownership. You run operations as if it were your own business: your money, your reputation, your customers. Expect to bring that same intensity here.
10+ years operating SaaS at meaningful scale, with at least 3 years in an SVP, VP, or Head-of role managing a senior-only engineering organization.
AI-first DNA: already operating this way today, not merely open to AI. You have built or led AIOps, agentic incident response, or auto-remediation systems in production. You use Claude Code, Cursor, or equivalent agentic coding tools daily.
Deep AWS production experience at scale: real multi-AZ, multi-account production estates. Certifications are nice; production scars are mandatory.
Hands-on senior engineer who leads, not a deck-and-roadmap executive. You drive outage bridges, write RCAs, ship code, and evaluate agent designs.
Track record of delivering reliability and customer-experience improvements week over week while holding or shrinking headcount.
Fluent / advanced English: you interface directly with the CEO and with enterprise customer leadership.
Time-zone overlap with US morning hours (roughly 13:00–17:00 UTC).
OFAC-clear country of residence.

Nice to Have
Pioneer voice in this field: substantive original posts, talks, articles, or open-source work on agentic SRE, harness-over-headcount, or AI replacing ops work.
A documented history of being obsessed with one hard thing outside of mainstream work: a side project, an open-source contribution, or an unusual hobby pursued in depth.
Background in multi-tenant B2B SaaS at scale: community, social, customer-experience, observability, AIOps, or developer-tooling platforms.
Hands-on familiarity with modern observability and incident-response stacks (Grafana, Loki, Prometheus, OpsGenie, PagerDuty, Datadog, or equivalents).

What You Will Learn
You will design and run one of the first operating functions built AI-native from the inside out at enterprise scale. You will define what works, publish what doesn't, and shape a playbook the rest of the industry is still circling. The team that gets this right in the next 12 months becomes the reference everyone else cites, and you will lead that team.

Working Conditions
Enterprise scale, startup cadence. The customer base, the platform footprint, and the contractual stakes are those of a large enterprise SaaS. The operating model is the opposite: outcomes are measured week over week, not quarter over quarter; decisions are made in days, not by committees; and the playbook is rewritten as the field evolves.
Fully remote, async-first, global team. We hire from anywhere with overlap with US morning hours; there is no office, and work happens where it is best done. The team is small and senior, with no L1/L2 layer beneath you.
No token limits. No tooling limits. The harness is the product, and we resource it accordingly. If the right answer is more compute, more inference, a better model, or a tool we have not bought yet, we buy it.


Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the AI-DNA SVP of Site Reliability Engineering in New York, NY vacancy

Staff Site Reliability Engineer
$131k - $164k
...Staff Site Reliability Engineer New York, New York, United States Position Overview We are seeking... ...000 USD About Us Diligent is the AI leader in governance, risk and... ...working and thinking. Curiosity is in our DNA, we look for individuals willing to ask...
Suggested
Work at office
Local area
Flexible hours
Diligent
New York, NY
2 days ago
SVP, Vulnerability Management & Cloud Security Posture Platform Engineering
of SVP, Vulnerability Management & Cloud Security Posture Platform Engineering New York, NY, United States and 2 more Job Description... ...platform improvements that increase reliability, scalability, coverage,... ...teams harness cutting‑edge AI and breakthrough technologies...
Suggested
Work experience placement
Worldwide
Flexible hours
BNY Mellon
New York, NY
21 hours ago
SVP of Software and Web Engineering
...distribution services. About the role The SVP of Software and Web Engineering is accountable for the end-to-end... .... Ensure systems are designed for reliability, scalability, auditability, and... ...protecting delivery velocity. Build AI and automation capability (where it...
Suggested
Tidal Financial Group
New York, NY
4 days ago
Site Reliability Engineer II
$165k - $225k
...Dataiku is the Platform for AI Success, the enterprise orchestration layer for building... ...run it as a true business performance engine delivering measurable value. For more,... ...How you'll make an impact: As a Site Reliability Engineer (SRE) with advanced expertise in...
Suggested
Work at office
Flexible hours
Dataiku
New York, NY
1 day ago
Site Reliability Engineer
$100k - $250k
..., economics, financials, weather, tech, AI, culture and more. We believe prediction... ...Roadmap As a member of Kalshi's engineering team, you'll help build the next-generation... ...You'll Do Improve observability, reliability, and service availability by defining...
Suggested
Local area
Kalshi
New York, NY
10 hours ago
Sr. Site Reliability Engineer I
$89k - $178k
...Sr. Site Reliability Engineer I NYC Global HQ Hybrid (3 days per week in office) DV is the leader in digital performance solutions, helping... ...validation scripts, and self-service capabilities Leverage AI-assisted development tools to accelerate automation...
Work at office
3 days per week
DoubleVerify
New York, NY
2 days ago
Senior Site Reliability Engineer
$150k - $175k
...Site Reliability Engineer At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. To achieve that, we're guided by principles that shape how we think, build, and execute. We value customer obsession, purposeful speed...
Remote work
ASAPP
New York, NY
1 day ago
Senior Site Reliability Engineer
...About the job Senior Site Reliability Engineer About the Company Stellar is a decentralized, public blockchain that gives developers the tools... ...and TypeScript source code Experience experimenting with AI-driven approaches to operations Comfortable with...
TechChain Talent
New York, NY
1 day ago
Site Reliability Engineer II
$123k - $165k
...Site Reliability Engineer II Our engineering fleet is a horizontal set of teams providing engineering services across the organization. Our specific... ...with distributed global teams. Experience using modern AI-assisted development tools (e.g., Copilot, Cursor, or...
Disney
New York, NY
2 days ago
Site Reliability Engineer
$125k - $350k
...Site Reliability Engineer New York, Miami, Gurugram, London, Singapore, Sydney Job Description Opportunities may be available from time... ...and the accelerating power of compute, machine learning and AI to power our analytics and tackle the market's and our clients...
Citadel Securities
New York, NY
1 day ago
Senior Site Reliability Engineer, Fleet Management
$127k - $249k
...The Team Platform Engineering is the department within SRE that is responsible for a range... ...critical components that ensure cluster reliability and security (e.g., CoreDNS, cert-... ...have redefined the data platform for the AI era, enabling builders to create, transform...
Work at office
Local area
Remote work
Worldwide
Flexible hours
MongoDB
New York, NY
1 day ago
Wealth Tech Platform Lead - Supervisory Apps, SVP
$176.72k - $265.08k
...Application Development Lead - SVP Apply (opens in new window)... ...breakers, retries, bulkheads, chaos engineering to maintain high availability... ..., ensuring scalability, reliability, and performance. Domain... ...accounting, and regulatory reporting. AI Integration : Drive the...
Full time
Work experience placement
Citi
New York, NY
1 day ago
Manager Site Reliability Engineering
$51.9 per hour
...This job is responsible for the reliability, availability, and... ...efficiency. This role blends software engineering, clinical engineering, and... ...cross-functionally with AHN site leaders and teams to navigate... ...healthcare IT trends, including AI, security patching, and best...
For contractors
Local area
Highmark Health
New York, NY
2 days ago
Director, Site Reliability Engineering
$205k - $305k
...Director Of Site Reliability Engineering Interested in working on cutting-edge blockchain technology and creating equitable access to the global... ...controls intersect with engineering. Pragmatically evaluate AI-assisted and agentic workflows where they can improve...
Temporary work
Work at office
Local area
Worldwide
Flexible hours
Stellar
New York, NY
9 hours ago
Lead Site Reliability Engineer
...Site Reliability Engineering As a Site Reliability Engineer at JPMorgan Chase within the Enterprise technology, liquidity risk team, you are... ...internal forums and communities Uses enterprise-authorized AI capabilities within the work environment to accelerate...
Chase
Jersey City, NJ
1 day ago
Senior Site Reliability Engineer
$150k - $170k
...Senior Site Reliability Engineer – Zip Co At Zip, we build cloud‑native software... ...in Service Mesh architectures. Experience incorporating AI tools (ChatGPT, Cursor, Codex, GitHub Copilot) into day‑to‑day...
Casual work
Work at office
Remote work
Flexible hours
ZIP
New York, NY
3 days ago
Senior Site Reliability Engineer
$200k - $240k
...researchers from MIT and Harvard Medical School. We are building an AI layer that can accurately and scalably synthesize... ..., and medicine. Job Description We’re hiring an experienced Site Reliability Engineer for our Boston or NYC office! You can expect to: Design, build...
Work at office
Verana Health
New York, NY
3 days ago
Senior Site Reliability Engineer (SRE)
...safer, more resilient organizations. The Role: As a Senior Site Reliability Engineer (SRE) at Dune Security, you will play a critical role in...
Full time
Work at office
Dune Security
New York, NY
4 days ago
Site Reliability Engineer
$7.5k
...Voleon is a technology company that applies state‑of‑the‑art AI and machine learning techniques to real‑world problems in... ...beautiful modern office, daily catered lunches, and more. As a Site Reliability Engineer (SRE), you will work at the intersection of production...
Work at office
Local area
The Voleon Group
New York, NY
1 day ago
Sr. Site Reliability Engineer
$160k - $230k
...Standard Template Labs is an AI-native startup reimagining the... ...currently looking to add Platform Engineers to our team, with at least 5... ...You’ll ensure our platform is reliable, secure, and performant from... ...collaborative setting. Our team works on-site five days a week, growing and...
Work at office
Local area
Standard Template Labs
New York, NY
2 days ago
Staff Site Reliability Engineer, Release Engineering
...builds the platforms and tooling that help engineering teams develop, deploy, and operate... ...default for every product team. As a Staff Site Reliability Engineer on Release Engineering, you'll... ...they remain fast and safe even as AI‑assisted development increases code velocity...
Permanent employment
Work experience placement
Local area
Plaid
New York, NY
4 days ago
SVP - Front Office Equities Engineering & AI-Driven Platform
BNY Mellon is seeking a Senior Vice President - Front Office Equities Engineering to lead the design and architecture of trading workflows in New York City. Applicants should have over 10 years of experience in building robust software applications, particularly within...
BNY Mellon
New York, NY
1 day ago
Senior Site Reliability Engineer (Remote Poland)
...0,000 PLN THE OPPORTUNITY TechInsights is building the reliability and AI operations foundation for its next chapter — an AI‑first... ...intelligence workflows in the world. We’re looking for a Senior Site Reliability Engineer who wants to own that foundation. This role is a senior...
Remote job
Flexible hours
Tech Insights
New York, NY
1 day ago
Senior Site Reliability Engineer
...customer acquisition, and Connor was a machine learning research engineer at Scale AI. The rest of our team comes from companies like Airbnb,... ...the-art AI. As a Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes of data monthly...
Unify
New York, NY
2 days ago
Senior Site Reliability Engineer
...precision, and speed, and we hold ourselves to the same bar. Our AI-native workspace lets legal professionals move faster, think... ...raising the bar. This is the place. The role As a Senior Site Reliability Engineer you'll join the founding SRE team at our new NYC engineering...
Work at office
Legora-Ab
New York, NY
1 day ago
Site Reliability Engineer II
$93.9k - $156.5k
Hybrid role, 2 days on site. Role is located in NYC with alternative... ...hours: 9am‑5pm EST. Site Reliability Engineer II (Tuesday‑Saturday).... ...candidate will work alongside senior engineers to learn how we observe,... ...integration of Artificial Intelligence (AI) and Machine Learning (ML) to...
Local area
CME Chicago Mercantile Exchange Inc.
New York, NY
21 hours ago
Senior Site Reliability Engineer (Agentic Search)
$156k - $262k
Senior Site Reliability Engineer (Agentic Search) New York City, New York, United States About Tavily We're building the infrastructure layer for... ...Retrieval-Augmented Generation (RAG) and real-time reasoning in AI systems. By connecting LLMs to high-quality, trustworthy web...
Temporary work
Immediate start
Remote work
Tavily Inc.
New York, NY
4 days ago
Senior Site Reliability Engineer
...ready to make their mark in the blockchain space. As a Senior Site Reliability Engineer, you'll work at the intersection of cloud infrastructure and... ...Kubernetes experience is a must. You'll also own bringing AI into our engineering workflows. We want someone who can build...
SSV Labs
New York, NY
1 day ago
Senior Site Reliability Engineer
$182.3k - $220k
...patients first - and that mission depends on reliable, secure, and scalable systems. As a... ...infrastructure and building tools that empower our engineers to ship safely and confidently. You... ..., including artificial intelligence (AI), to assist with parts of our recruiting...
Local area
Flexible hours
Ro
New York, NY
16 days ago
Senior Site Reliability Engineer
Senior Site Reliability Engineer - Azure Cloud Concord Technologies... ...Complementing its document transfer capabilities, Concord’s AI‑powered workflow applications allow organizations to receive...
Full time
Local area
Immediate start
Remote work
Flexible hours
Concord Technologies
New York, NY
1 day ago
