Sr. SRE Platform Software Engineer

Full-time

Bitdeer Technologies Group

About Bitdeer Technologies Group

Bitdeer is a world-leading technology company for AI and Bitcoin mining infrastructure.

Bitdeer is committed to providing comprehensive Bitcoin mining solutions for its customers and building AI computational infrastructure to support the AI revolution. Bitdeer handles complex processes involved in computing such as equipment procurement, transport logistics, data center design and construction, equipment management, and daily operations. Bitdeer also offers advanced cloud capabilities to customers with high demand for artificial intelligence.

Headquartered in Singapore, Bitdeer has deployed data centers across multiple countries, including the United States, Norway, Bhutan, and Ethiopia.

To learn more, visit  (

Position Overview

Build and operate one or more bounded contexts of the NeoCloud SRE platform, the multi-region substrate that observes, protects, and operates a GPU rental fleet across self-built and OEM-rented data centers. You take an architect-approved design and turn it into production code that ships through GitOps and the CI/CD release pipeline, follows the Plugin Framework conventions, meets declared SLOs, and stays drift-free.

This is the build + run role. You don't only ship code; you ship a service that other squads, cloud-service teams, and tenants depend on. You take the on-call pager for what you build.

Key Responsibilities
You will own 1-2 of these:

  • Collection & Storage: collection-agent, customer-sdk-gateway, metrics-store, logs-store, traces-store, profiles-store, analytics-lake, enrichment-service, collection-monitor.
  • Alert, Correlation & SLO: alert-engine-framework, alert-correlation, slo-framework, default M-series alert rules.
  • Topology, Cluster-Health & Cluster Platform Services: topology-service, cluster-health-rollup, OSS-SRE-tool collection plugins for K8s, Slurm, Ray, Volcano, Kueue, and KubeRay.
  • Fault-Prediction: prediction-engine-framework and built-in predictors (GPU, Link, Disk, XPA, Straggler, SDC, Stranded GPU).
  • Remediation, Workflow, Inspection & Jobs: remediation-actuator, orchestration-substrate (workflow engine), inspection-orchestrator, job-scheduler, NCCL-baseline inspection probe.
  • Hardware Lifecycle & DC Ops: hardware-lifecycle, dc-operations, boot-provisioning, rolling-upgrade, bare-metal-bmc-service, auto-discovery, ZTP D0–D5 pipeline, IPMI bare-metal management.
  • Identity, Secrets, Tenant-Config & CMDB: iam-service, secrets-service, tenant-sre-config, cmdb-cache, schema registry.
  • Customer-Bridge, Ticketing & SRE Platform Portal: customer-bridge, customer-ticketing, sre-operation-system, Customer Console BFF, SRE Console BFF.
  • Backup, DR & Meta-Monitor: backup-orchestrator, meta-monitor, external-watcher integration (Datadog or equivalent).
  • CI/CD, GitOps, Plugin Framework & SRE Image Registry: cicd-pipeline, gitops-sync, plugin-registry, sre-image-registry.
  • Self-Improving Agent: agent-control-plane, agent-discovery, agent-codegen, agent-sandbox, per-Region LLM gateway.
  • Global SRE Management: maintenance-window-orchestrator, change-management, capacity-planner, cost-optimizer, gpu-efficiency-dashboard, network-stability-dashboard, patching-orchestrator, artifact-management, compat-matrix-service, security-platform.

Qualifications

  • Software Engineering Experience: 7+ years of production software engineering experience, including 2 or more years operating what you built (real on-call experience, not just shipping code).
  • Programming Languages: Production-depth mastery of at least one systems-grade language—Go (preferred), Rust, or Java. Proficiency in Python for tooling and SDK work.
  • Distributed Systems Fundamentals: Strong grasp of at-least-once vs. exactly-once trade-offs, idempotency, back-pressure, leader election, consistent hashing, gossip, and fan-out. Ability to evaluate CRDT vs. Raft vs. Paxos and select the right tool for the job.
  • Multi-Region Observability Stack: Experience at production scale with Prometheus, VictoriaMetrics, Mimir, Thanos, Loki, Elasticsearch, Tempo, Jaeger, or OpenTelemetry. Must have built or substantively contributed to the ingest, query, or storage paths of these systems.
  • GitOps & CI/CD: Hands-on experience with Argo, Flux, Helm, Kustomize, Cosign signing, signed-bundle promotion, and blast-radius-aware rollouts.
  • Kubernetes Operator Pattern: Proven experience writing a controller or CRD handling real production traffic, with a deep understanding of watch-cache mechanics, leader election, and reconcile loops.
  • mTLS & Secrets Management: Experience executing end-to-end mTLS bootstrap with certificate rotation. Hands-on experience with HashiCorp Vault or cloud KMS (AWS KMS / GCP KMS).
  • SQL & Time-Series Data: Ability to read a Prometheus query plan, build a recording-rule strategy, and write SQL that joins per-tenant telemetry against analytics-lake tables.
  • Testing Discipline: Rigorous approach to unit, integration, contract, chaos, and soak testing. Experience writing and maintaining your own comprehensive tests.
  • Technical Writing Fluency: Ability to author clear design docs that align with existing platform architecture, create runbooks optimized for 3 AM on-call responses, and write intent-driven PR descriptions.

Preferred Qualifications (GPU / AI-Infra Context)
Experience in at least one of the following areas is a strong plus:

  • NVIDIA Internals: Deep understanding of DCGM and NVIDIA driver internals, including XID semantics and MIG / vGPU partitioning.
  • Networking & Fabrics: Experience with InfiniBand or RoCE fabrics, including subnet managers, partitioning, optical health, and NCCL collective tracing.
  • HPC Storage: Experience managing Lustre, NetApp, Pure, DDN, VAST, or NVMe-oF under multi-tenant loads.
  • Hardware Management: Hands-on experience with BMC, IPMI, and Redfish at OEM scale (Supermicro, Dell, HPE, Lenovo).
  • Cluster Platform Internals: Familiarity with Kubernetes GPU Operator, Slurm controller, or Ray GCS.
  • BS/MS in Computer Science or similar
  • Hyperscale or NeoCloud experience

--------------------------------------------------------------------

Bitdeer is committed to providing equal employment opportunities in accordance with country, state, and local laws. Bitdeer does not discriminate against employees or applicants based on conditions such as race, color, gender identity and/or expression, sexual orientation, marital and/or parental status, religion, political opinion, nationality, ethnic background or social origin, social status, disability, age, indigenous status, and union membership.

Vacancy posted 29 days ago