SRE Infrastructure Engineer
OJUS LLC
Title: SRE Infrastructure Engineer
Location: San Francisco, CA (5 days onsite)
Job Description:
We are seeking an SRE Infrastructure Engineer with 8+ years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, implementing SLIs/SLOs, and minimizing manual toil.
This role requires close collaboration with cross-functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.
Key Responsibilities
· GCP Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.
· Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools that streamline deployments and operations.
· Observability: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana).
· Incident Management: Act as primary on-call and first responder for production incidents and system outages, troubleshoot them, and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.
· CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, ArgoCD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.
· Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) covering the four golden signals: latency, traffic, errors, and saturation.
· Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.
· AI Workload Reliability: Focus specifically on AI workload health and Google Compute Engine (GCE) visibility.
Mandatory Technical Skills & Competencies
· Experience: 8+ years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.
· Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.
· Programming: Proficiency in scripting/programming languages such as Python, Go, or shell.
· Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction).
· Strong troubleshooting skills across the full stack (network, OS, application).
· Ability to balance system stability with the need for rapid deployment.
· Observability Tools: Experience implementing monitoring and logging stacks such as Prometheus, Grafana, or the Google Cloud Operations Suite.
· Excellent collaboration skills for working with development teams on shared service ownership.
Soft Skills
· Strong problem-solving and analytical skills
· Clear communication with technical and non-technical stakeholders
· Ownership mindset and production-grade engineering discipline
· Ability to work independently and within cross-functional teams



