Principal Site Reliability Engineer
$163.62k - $212.71k
iSpot
Job Description
Immigration / Work Authorization Notice: Applicants must be currently authorized to work in the United States. iSpot is not able to sponsor or take over sponsorship of an employment visa for this position at this time.
iSpot competes for the best talent. Our compensation packages consist of salary and equity in one of Seattle's hottest start-ups, as well as other standard benefits. Most importantly, we provide a really interesting working experience, and the chance to contribute to the success of something great.
What You'll Be Part Of:
iSpot.tv is changing how brands, agencies, and networks measure and assess the impact of TV advertising. We deal with BIG data, operating mainly in AWS with multiple Kubernetes clusters and thousands of servers. We are looking for an experienced SRE leader with the skills and passion to make a significant impact on our ecosystem. You will have a wide array of projects to tackle, with ample opportunities for growth.
You will be a key member of our SRE leadership team, focused on empowering developers to build, test, and deploy applications faster and more efficiently. You will both lead the team and remain hands-on in designing, building, and maintaining the tools, platforms, and processes that improve our engineering teams' productivity and streamline the software development lifecycle. Your work will directly impact developer happiness and the speed at which we can deliver innovative features to our customers.
Responsibilities:
We are seeking a seasoned and strategic Lead/Principal Site Reliability Engineer to drive the reliability, scalability, and performance of our core production systems while significantly enhancing the internal developer experience. This role sits at the intersection of operations and development, requiring deep technical expertise, strong leadership, and a passion for optimizing the entire software development lifecycle (SDLC).
Our team consists of senior engineers who work together with minimal supervision to attain those goals. Candidates must possess deep operational experience with AWS and Kubernetes to support teams utilizing these systems. You will lead the technical direction of the team while remaining a key individual contributor. You will be responsible for creating a culture of engineering excellence, designing self-service platforms, and fostering alignment across all engineering teams to accelerate product delivery and maintain world-class service stability.
The key responsibilities are:
I. System Reliability and Operations (SRE Focus)
- Platform Design and Management: Architect, build, and maintain scalable, highly available, and reliable cloud infrastructure in AWS leveraging modern container orchestration technologies.
- Data Pipeline Reliability: Serve as the reliability and cost optimization expert for high-volume, data-intensive workloads. Focus on optimizing and ensuring the stability of distributed data processing engines, specifically Apache Spark and related ecosystems (e.g., EMR, Databricks, Glue).
- Observability and Monitoring: Establish comprehensive observability practices by defining SLIs/SLOs, implementing advanced monitoring, alerting, and logging solutions to quickly identify and resolve system anomalies.
- Automation: Drive automation across all operational aspects, including infrastructure provisioning (Terraform), scaling, deployment, and incident response, minimizing toil and manual effort.
- Incident Management: Lead and participate in the incident response lifecycle, performing thorough post-mortems to derive actionable insights and implement preventative measures to improve system resilience.
- AIOps: Define and champion the strategic roadmap for AI/ML integration within SRE, establishing organizational best practices for AIOps, automated incident remediation, toil reduction via LLMs, automated root cause analysis (RCA), and the governance of LLM-driven tooling to enhance system observability and resilience.
II. Developer Experience and Productivity (DevEx Focus)
- Platform Strategy: Design, implement, and champion self-service tools, internal developer portals, and services that empower engineering teams to manage their infrastructure and deployments independently and efficiently.
- AI Developer Tools: Lead the standardization of AI developer assistants by architecting and maintaining global 'steering files' and context-configuration standards, ensuring AI-generated code aligns with our specific patterns, security protocols, and architectural guardrails.
- CI/CD Optimization: Own and continuously improve the CI/CD pipelines, reducing build times, streamlining deployment workflows, and integrating best practices for testing, security (Shift Left), and code quality. Maintain and improve our container orchestration and deployment tools, leveraging Kubernetes, Helm, and ArgoCD to create seamless developer workflows.
- KPIs: Develop, implement, and maintain a set of key performance indicators (KPIs) to measure and improve the developer experience across all of Engineering.
- Mentorship and Documentation: Guide and mentor senior engineers, promoting SRE/DevEx principles. Develop clear, comprehensive documentation and tutorials to ensure seamless adoption of new tools and platforms.
- Cost and Efficiency: Strategically identify and implement opportunities for cloud cost optimization and resource efficiency without compromising reliability or performance.
III. Strategic Leadership and Cross-Team Alignment
- Architecting the Roadmap: Define, champion, and communicate the long-term technical roadmap for the SRE and DevEx platforms, balancing immediate operational needs with strategic, future-state goals.
- Driving Cross-Team Alignment: Act as a critical liaison between infrastructure, security, and product development teams. Proactively drive cross-team alignment on architectural standards, tooling choices, and development workflows to ensure consistency and shared accountability for system health.
- Bottleneck Identification and Mitigation: Systematically identify engineering bottlenecks, friction points, and points of organizational toil within the SDLC. Implement targeted solutions—whether technical, process-based, or organizational—to mitigate these constraints and enhance overall engineering velocity.
- Planning and Execution: Collaborate with engineering leadership to transform the strategic roadmap into actionable, prioritized plans, securing cross-functional buy-in and resources for successful execution.
Qualifications and Education Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- 10+ years of relevant experience in software engineering, cloud architecture, and/or Site Reliability Engineering, with at least 3 years in a leadership or lead contributor role.
- Deep expertise in AWS, including EKS, ECR, RDS, SQS/SNS, VPC, MWAA, and S3.
- Strong proficiency in Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation).
- Specialized experience in optimizing large-scale data platforms, specifically with Apache Spark. Proven ability to profile, troubleshoot, and tune Spark jobs for performance, cost, and reliability.
- 5+ years of experience with Kubernetes and containerization in general, including associated tools (kubectl, Helm, ArgoCD).
- Strong knowledge of AWS cost optimization.
- Knowledge of TCP/IP networking, including routing and AWS security groups.
- Excellent knowledge of CI/CD concepts and experience developing associated pipelines in CircleCI.
- Proficient in high-level scripting languages, including shell scripting, Python, and/or JavaScript.
- Experience with OTel and monitoring tools such as Splunk or DataDog. Experience with native AI observability tools is a plus.
- Experience with evaluating and rolling out GenAI tools for improving developer efficiency.
- Excellent communication, collaboration, and stakeholder management skills, with proven experience driving technical initiatives across multiple teams.
- Experience researching and selecting new or modern developer toolsets and helping teams adopt them, including vendor assessments, security assessments, and the procurement process.
- Experience in an Ad-Tech or "BIG Data" processing organization is highly preferred.
Target cash compensation range: $163,620 - $212,710 USD Annually
We are committed to providing competitive, market-informed compensation. The cash compensation above includes base salary, variable commission for employees in eligible roles, and annual bonus targets for eligible roles. In addition to cash compensation, all full-time iSpotters are eligible to participate in iSpot's equity plan to receive stock options. Non-exempt roles will also be eligible for (pre-approved) overtime pay. Individual compensation packages are influenced by factors unique to each candidate, including their skills, experience, qualifications, and other job-related reasons.
For more information on total rewards package, go HERE
Hybrid & Flexible Workplace Policy
iSpot supports a hybrid and flexible workplace. Depending on location and work responsibilities, employees may be designated as full-time office-based, part-time office-based, or fully remote. A hybrid work schedule means that you work in the office some days and from home on other days. The best hybrid workplaces allow for flexibility while also encouraging consistency.
Those who live near one of our offices (Bellevue, WA or New York, NY) will work a hybrid schedule, coming into their local office 1-3 days a week. Those in roles that are not office-based and who are located farther from our offices will work a fully remote schedule. If you have questions about the details of our hybrid & flexible workplace policy, please let your recruiter know and they will discuss them with you.
If you don't feel you meet every single requirement for the role, don't rule yourself out. Please apply anyway!
iSpot is an equal opportunity employer. All applicants will receive consideration for employment without regard to race, ethnicity, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please contact our HR team.
California residents applying for positions at iSpot can access our California Consumer Privacy Act notice here.