Software Engineer, Site Reliability NYC

$160k - $300k

Hebbia, Inc.

About Hebbia

The AI platform for investors and bankers that generates alpha and drives upside. Founded in 2020 by George Sivulka and backed by Peter Thiel and Andreessen Horowitz, Hebbia powers investment decisions for BlackRock, KKR, Carlyle, Centerview, and 40% of the world’s largest asset managers. Our flagship product, Matrix, delivers industry-leading accuracy, speed, and transparency in AI-driven analysis. It is trusted to help manage over $30 trillion in assets globally.

We deliver the intelligence that gives finance professionals a definitive edge. Our AI uncovers signals no human could see, surfaces hidden opportunities, and accelerates decisions with unmatched speed and conviction. We do not just streamline workflows. We transform how capital is deployed, how risk is managed, and how value is created across markets. Hebbia is not a tool. Hebbia is the competitive advantage that drives performance, alpha, and market leadership.

The Role

We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them rather than simply operating them. You will write production-quality code that keeps the platform reliable at scale, embed with product engineering teams to influence architecture from the start, and build the internal tooling that every engineer at Hebbia depends on.

This is not a ticket-driven ops role. You will spend most of your time writing code: instrumenting services, eliminating performance bottlenecks, building deployment platforms, and translating incident post-mortems into lasting architectural improvements.

Responsibilities

Own critical production services end-to-end, from design and code review through deployment, operation, and incident response
Profile, benchmark, and rewrite hot paths to eliminate bottlenecks as Hebbia scales
Lead incident response and drive post-mortem culture, translating findings into code changes and architectural improvements rather than runbooks
Design and build observability frameworks from scratch, writing custom instrumentation, alerting logic, and debugging tooling that surfaces production issues before customers feel them
Define and enforce SLOs across platform services and build the feedback loops that keep engineering teams accountable to them
Own capacity planning and cost efficiency: model growth, right-size infrastructure, and write automation that prevents over-provisioning and resource exhaustion
Build robust, well-tested internal platforms and deployment tooling held to the same engineering standards as customer-facing code
Own and continuously improve CI/CD systems so engineering teams can ship safely and quickly
Embed with product engineering teams as a peer software engineer, contributing directly to production codebases and co-designing systems for reliability from the start
Partner on infrastructure security through threat modeling, hardening, and automated compliance tooling

Who You Are

5+ years of software development experience with a track record of writing, shipping, and maintaining production services, not just operating infrastructure
Production-grade proficiency in at least one systems or backend language: Go, Python, C++, or Rust
Proven experience as a Production Engineer, SRE, or software engineer with a deep infrastructure focus, comfortable owning services end-to-end across the full stack
Deep understanding of distributed systems
Container orchestration expertise and hands-on experience debugging complex distributed failures in production
Working knowledge of OS-level concepts
Cloud platform fluency (AWS preferred)
Experience in building and maintaining observability stacks
Strong CI/CD pipeline expertise and a track record of improving developer velocity without sacrificing safety
Background at a company with a Production Engineering or software-focused SRE culture is a strong plus
Experience building platforms for AI/ML workloads or high-throughput document processing pipelines is a plus

Compensation

The salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate’s experience and qualifications. Adjustments outside of this range may be considered for candidates whose qualifications significantly differ from those outlined in the job description.

Life @ Hebbia

PTO: Unlimited
Insurance: Medical + Dental + Vision + 401(k)
Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late
Parental leave policy: 3 months for non-birthing parent, 4 months for birthing parent
Fertility benefits: $15k lifetime benefit
New hire equity grant: competitive equity package with unmatched upside potential



Vacancy posted 14 hours ago

