SRE Infrastructure Engineer

OJUS LLC

Title: SRE Infrastructure Engineer

Location: San Francisco, CA (5 days onsite)

Job Description:

We are seeking an SRE Infrastructure Engineer with 8+ years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, and implementing SLIs/SLOs while minimizing manual toil.

This role requires close collaboration with cross-functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.

Key Responsibilities

· GCP Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.

· Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources, and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools that streamline deployments and operations.

· Observability & Incident Management: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana). Act as primary on-call to troubleshoot production incidents.

· Incident Management: Serve as a first responder for system outages and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.

· CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, Argo CD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.

· Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) covering latency, traffic, errors, and saturation.

· Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.

· AI Workload SRE (Google Compute Engine): Focus specifically on AI workload health and GCE visibility.

Mandatory Technical Skills & Competencies

· Experience: 8+ years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.

· Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.

· Programming: Proficiency in scripting/programming languages like Python, Go, or Shell.

· Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction).

· Strong troubleshooting skills across the full stack (network, OS, application).

· Ability to balance system stability with the need for rapid deployment.

· Observability Tools: Experience implementing monitoring and logging stacks like Prometheus, Grafana, or the Google Cloud Operations Suite.

· Excellent collaboration skills for working with development teams on service ownership.

Soft Skills

· Strong problem-solving and analytical skills

· Clear communication with technical and non-technical stakeholders

· Ownership mindset and production-grade engineering discipline

· Ability to work independently and within cross-functional teams


Vacancy posted 6 hours ago
