SRE Infrastructure Engineer

OJUS LLC

Title: SRE Infrastructure Engineer

Location: San Francisco, CA (5 days onsite)

Job Description:

We are seeking an SRE Infrastructure Engineer with 8+ years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, implementing SLIs/SLOs, and reducing manual toil.

This role requires close collaboration with cross-functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.

Key Responsibilities

· GCP Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.

· Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources, and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools that streamline deployments and operations.

· Observability & Incident Management: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana). Act as primary on-call to troubleshoot production incidents.

· Incident Management: Serve as a first responder for system outages and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.

· CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, ArgoCD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.

· Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) covering latency, traffic, errors, and saturation.

· Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.

· AI Workload Reliability: Focus specifically on AI workload health and Google Compute Engine (GCE) visibility.
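
For candidates less familiar with the SLI/SLO responsibility listed above, the following is a minimal sketch of how an availability SLI and its remaining error budget might be computed. It is illustrative only and not taken from the employer's tooling: the `SloWindow` type, the function names, and the 99.9% target used in the note below are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def availability_sli(window: SloWindow) -> float:
    """Fraction of requests that succeeded; 1.0 when there was no traffic."""
    if window.total_requests == 0:
        return 1.0
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Share of the error budget still unspent.

    The budget is the number of failures the SLO tolerates in the window.
    The result goes negative once the SLO has been breached.
    """
    allowed_failures = (1 - slo_target) * window.total_requests
    if allowed_failures == 0:
        # No failures are tolerated (100% target or no traffic).
        return 0.0 if window.failed_requests else 1.0
    return 1 - window.failed_requests / allowed_failures
```

With an assumed target of 99.9% and a window of 1,000,000 requests, roughly 1,000 failures are tolerated, so 250 failed requests would leave about three quarters of the budget unspent. In practice the counts would come from a monitoring stack such as Prometheus or Cloud Monitoring rather than being passed in by hand.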

Mandatory Technical Skills & Competencies

· Experience: 8+ years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.

· Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.

· Programming: Proficiency in scripting/programming languages like Python, Go, or Shell.

· Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction)

· Strong troubleshooting skills across the full stack (network, OS, application).

· Ability to balance system stability with the need for rapid deployment.

· Observability Tools: Experience implementing monitoring and logging stacks like Prometheus, Grafana, or the Google Cloud Operations Suite

· Excellent collaboration skills for working with development teams on service ownership

Soft Skills

· Strong problem-solving and analytical skills

· Clear communication with technical and non-technical stakeholders

· Ownership mindset and production-grade engineering discipline

· Ability to work independently and within cross-functional teams

Vacancy posted 6 hours ago