Member of Technical Staff, AI Reliability & Monitoring Engineering Lead
$256k
Full-time
Postman
WHO ARE WE?
Postman is the world’s leading API platform, used by more than 45 million developers and 500,000 organizations, including 98% of the Fortune 500. Postman is helping developers and professionals across the globe build the API-first world by simplifying each step of the API lifecycle and streamlining collaboration, enabling users to create better APIs, faster. The company is headquartered in San Francisco and has offices in Boston, New York, Austin, Tokyo, London, and Bangalore, where Postman was founded. Postman is privately held, with funding from Battery Ventures, BOND, Coatue, CRV, Insight Partners, and Nexus Venture Partners. Learn more at postman.com or connect with Postman on X via @getpostman.
P.S.: We highly recommend reading The "API-First World" graphic novel to understand the bigger picture and our vision at Postman.
THE OPPORTUNITY
Postman is seeking an experienced AI Systems Reliability Engineer to help define, build, and maintain the infrastructure and processes that ensure the reliability, scalability, and performance of Postman’s AI-powered API and agentic systems in production. This role focuses on monitoring, availability, incident response, and automation to support AI services and tools trusted by millions of developers globally.
WHAT YOU’LL DO
- Develop and manage reliability metrics (SLOs) for AI-driven API services and agentic AI platform features
- Implement comprehensive observability and monitoring systems for real-time performance and fault detection
- Design and drive automated failover, recovery, and incident response strategies for high-availability AI infrastructure
- Optimize resource utilization, particularly GPU/accelerator efficiency, ensuring cost-effective AI system operation
- Collaborate closely with engineering, platform, and product teams to align reliability efforts with broader organizational goals
- Lead efforts to build internal tooling and automation focused on AI system stability and operational excellence
- Drive continuous improvement in deployment practices, monitoring approaches, and incident management processes
ABOUT YOU
- Have a strong background in AI reliability engineering, SRE, or DevOps for distributed systems
- Understand the unique challenges of maintaining large-scale AI systems and integrating AI-specific metrics into reliability frameworks
- Are experienced with cloud platforms, monitoring tools, and incident response automation
- Are comfortable collaborating across teams to influence best practices for AI system reliability and operational health
- Thrive in dynamic, fast-paced environments focusing on delivering reliable, safe AI-powered services
Bonus Skills and Experiences
- Hands-on experience with AI/ML infrastructure, including GPU/xPU optimization and scaling
- Familiarity with API platform operations and large-scale distributed services
- Prior experience building or operating observability tools tailored for AI
WHAT ELSE?
In addition to Postman's pay-on-performance philosophy and a flexible schedule working with a fun, collaborative team, Postman offers a comprehensive set of benefits, including full medical coverage, flexible PTO, wellness reimbursement, and a monthly lunch stipend. Our wellness programs will help you stay in the best of physical and mental health. Our frequent and fascinating team-building events will keep you connected, while our donation-matching program can support the causes you care about. We’re building a long-term company with an inclusive culture where everyone can be the best version of themselves.
At Postman we value in-person collaboration. We are in the office five days a week for all roles based out of our hubs in the San Francisco Bay Area, Boston, Austin, Tokyo, and London. For roles based in Bangalore, employees currently work in the office three days a week and will transition to five days per week by the end of the year. We were thoughtful in our approach, which is based on collaboration and grounded in feedback from our workforce, leadership team, and peers. The benefits of our in-office model will be shared knowledge, brainstorming sessions, communication, and building trust in person that cannot be replicated via Zoom.
OUR VALUES
At Postman, we create with the same curiosity that we see in our users. We value transparency and honest communication about not only successes, but also failures. In our work, we focus on specific goals that add up to a larger vision. Our inclusive work culture ensures that everyone is valued equally as an important part of our final product. We are dedicated to delivering the best products we can.
EQUAL OPPORTUNITY
Postman is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.
Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. Postman does not accept unsolicited headhunter and agency resumes. Postman will not pay fees to any third-party agency or company that does not have a signed agreement with Postman.
Vacancy posted 1 day ago

