AI-DNA SVP of Site Reliability Engineering
IgniteTech
By the time a human engineer reaches the incident, the AI agents on the team have already validated hypotheses against years of prior incidents and RCAs, parsed the logs and code paths, identified the failure pattern, and proposed (or applied) a remediation inside policy guardrails. That is the operating model you will lead: agentic SRE at Khoros, one of IgniteTech's flagship enterprise SaaS platforms, powering customer communities and social engagement for the world's largest brands, where reliability translates directly into retention, revenue, and customer trust.

What You Will Be Doing
- Owning platform reliability and customer‑experience outcomes for an AI‑native community and social engagement platform: uptime trending up, MTTR trending down week over week, customer satisfaction trending up. As the owner of the outcome, you will be the senior leader involved in critical situations (CritSits) with customers and the face of those incidents and escalations.
- Designing, governing, and continuously extending the AI agent system that does the operations work: pre‑triage, alarm authoring, blocking change‑validation gates, permanent‑fix lifecycle chase, customer‑facing RCA drafting, auto‑healing on whitelisted operations. The harness is the product; the team's output is the agent surface.
- Leading a small, senior, fully remote team of SRE / SaaS / DevOps engineers across multiple time zones. There is no L1/L2 tier; every engineer ships agents, writes runbooks, and owns the incident loop end to end.
- Staying hands‑on yourself: driving outage bridges when the blast radius warrants it, writing RCAs, shipping agent code. A substantial share of your week goes to personal agent build and maintenance.
- Representing operations directly to enterprise customer leadership and to the CEO, translating reliability investment into retention, revenue, and customer‑experience outcomes.

What You Will NOT Be Doing
- Growing the team to solve problems that AI should be solving. If you find yourself adding headcount to compensate for agent gaps, you are off‑strategy. Fix the AI, not the org chart.
- Running a deck‑and‑roadmap executive function. You will be at the keyboard, on the bridge, and in the agent code regularly. If you have not personally written or shipped production code in the last 12 months, the role will be uncomfortable.
- Owning product engineering or customer support. Your scope is the operating layer, where reliability and the AI agent surface live.
- Drowning in tickets. The point of this role is to remove ticket throughput as the primary operating metric, not optimize it.
- Sitting on every bridge call. You will drive the bridge personally when a top‑tier customer is down or the blast radius is material. The rest of the time, the operator tier handles incident execution and the AI surface absorbs the routine.
- Treating this role as a credential or a résumé line. The bar is ownership and obsession, not titles.

Responsibilities
- Deliver the reliability outcomes, and own them like a founder accountable for them. Platform uptime trending up, MTTR trending down week over week, customer satisfaction trending up. When something is down, that is your problem; you do not sleep peacefully through a customer outage.
- Own enterprise‑customer escalations. Be the executive face of operations when a customer at the largest tier needs one, and engineer the operating system so escalations keep falling. Customer trust is the metric, measured in retention and contract expansion.
- Set and enforce AI agent quality and governance. Every agent in production has a defined scope, a measured acceptance rate, an escape hatch, and a guardrail anchored to documented failure modes. You hold the bar, including against agents that propose to skip a step.
- Recruit, develop, and retain a top‑1% senior‑only team. No L1, no L2; every engineer is a pioneer in their craft. Recruiting looks more like courtship than triage.
- Shape the operating model and the playbook: where AI does more, where humans must stay, what the agent surface looks like 6 and 12 months from now. The industry playbook for agentic SRE is being written right now, and you write a meaningful part of it.
- Partner with product engineering and customer success peers. The operating layer is the hinge between them; reliability outcomes depend on those interfaces working.
- Own platform availability end‑to‑end. You will be expected to read, interpret, and act on operational signals, including incident histories, status dashboards, and SLA performance data. Knowing the current health posture of every system under your ownership, at any moment, is baseline.

Requirements
- Extreme ownership. You run operations as if it were your own business: your money, your reputation, your customers. Expect to bring that same intensity here.
- 10+ years operating SaaS at meaningful scale, with at least 3 years in an SVP, VP, or Head‑of role managing a senior‑only engineering organization.
- AI‑First DNA: you already operate this way today, rather than being merely "open to AI". You have built or led AIOps / agentic incident response / auto‑remediation systems in production. You use Claude Code, Cursor, or equivalent agentic coding tools daily.
- Deep AWS production experience at scale: real multi‑AZ, multi‑account production estates. Certifications are nice; production scars are mandatory.
- Hands‑on senior engineer who leads, not a deck‑and‑roadmap executive. You drive outage bridges, write RCAs, ship code, and evaluate agent designs.
- Track record of delivering reliability and customer‑experience improvements week over week while holding or shrinking headcount.
- Fluent / advanced English. You interface directly with the CEO and with enterprise customer leadership.
- Time‑zone overlap with US morning hours (roughly 13:00–17:00 UTC).
- OFAC‑clear country of residence.

Nice to Have
- Pioneer voice in this field: substantive original posts, talks, articles, or open‑source work on agentic SRE / harness‑over‑headcount / AI‑replacing‑ops‑work.
- A documented history of being obsessed with one hard thing outside of mainstream work: a side project, an open‑source contribution, an unusual hobby pursued with depth.
- Background in multi‑tenant B2B SaaS at scale: community, social, customer‑experience, observability, AIOps, or developer‑tooling platforms.
- Hands‑on familiarity with modern observability and incident‑response stacks (Grafana, Loki, Prometheus, OpsGenie, PagerDuty, Datadog, or equivalents).

What You Will Learn
You will design and run one of the first operating functions built AI‑native from the inside out at enterprise scale. You will define what works, publish what doesn't, and shape a playbook the rest of the industry is still circling. The team that gets this right in the next 12 months becomes the reference everyone else cites, and you will lead that team.

Working Conditions
- Enterprise scale, startup cadence. The customer base, the platform footprint, and the contractual stakes are those of a large enterprise SaaS. The operating model is the opposite: outcomes are measured week over week, not quarter over quarter; decisions are made in days, not committees; the playbook is rewritten as the field evolves.
- Fully remote, async‑first, global team. Hire from anywhere with US‑morning UTC overlap; no office; work happens where the work is best done. The team is small and senior, with no L1/L2 layer beneath you.
- No token limits. No tooling limits. The harness is the product, and we resource it accordingly. If the right answer is more compute, more inference, a better model, or a tool we have not bought yet, we buy it.


