Sr. SRE Platform Software Engineer
Bitdeer Technologies Group
About Bitdeer Technologies Group
Bitdeer is a world-leading technology company for AI and Bitcoin mining infrastructure.
Bitdeer is committed to providing comprehensive Bitcoin mining solutions for its customers and building AI computational infrastructure to support the AI revolution. Bitdeer handles complex processes involved in computing such as equipment procurement, transport logistics, data center design and construction, equipment management, and daily operations. Bitdeer also offers advanced cloud capabilities to customers with high demand for artificial intelligence.
Headquartered in Singapore, Bitdeer has deployed data centers across multiple countries, including the United States, Norway, Bhutan, and Ethiopia.
Position Overview
Build and operate one or more bounded contexts of the NeoCloud SRE platform, the multi-region substrate that observes, protects, and operates a GPU rental fleet across self-built and OEM-rented data centers. You take an architect-approved design and turn it into production code that ships through GitOps and the CI/CD release pipeline, follows the Plugin Framework conventions, meets its declared SLOs, and stays drift-free.
This is a build-and-run role. You don't only ship code; you ship a service that other squads, cloud-service teams, and tenants depend on, and you carry the on-call pager for what you build.
Key Responsibilities
You will own one or two of the following bounded contexts:
- Collection & Storage: collection-agent, customer-sdk-gateway, metrics-store, logs-store, traces-store, profiles-store, analytics-lake, enrichment-service, collection-monitor.
- Alert, Correlation & SLO: alert-engine-framework, alert-correlation, slo-framework, default M-series alert rules.
- Topology, Cluster-Health & Cluster Platform Services: topology-service, cluster-health-rollup, OSS-SRE-tool collection plugins for K8s, Slurm, Ray, Volcano, Kueue, and KubeRay.
- Fault-Prediction: prediction-engine-framework and built-in predictors (GPU, Link, Disk, XPA, Straggler, SDC, Stranded GPU).
- Remediation, Workflow, Inspection & Jobs: remediation-actuator, orchestration-substrate (workflow engine), inspection-orchestrator, job-scheduler, NCCL-baseline inspection probe.
- Hardware Lifecycle & DC Ops: hardware-lifecycle, dc-operations, boot-provisioning, rolling-upgrade, bare-metal-bmc-service, auto-discovery, ZTP D0–D5 pipeline, IPMI bare-metal management.
- Identity, Secrets, Tenant-Config & CMDB: iam-service, secrets-service, tenant-sre-config, cmdb-cache, schema registry.
- Customer-Bridge, Ticketing & SRE Platform Portal: customer-bridge, customer-ticketing, sre-operation-system, Customer Console BFF, SRE Console BFF.
- Backup, DR & Meta-Monitor: backup-orchestrator, meta-monitor, external-watcher integration (Datadog or equivalent).
- CI/CD, GitOps, Plugin Framework & SRE Image Registry: cicd-pipeline, gitops-sync, plugin-registry, sre-image-registry.
- Self-Improving Agent: agent-control-plane, agent-discovery, agent-codegen, agent-sandbox, per-Region LLM gateway.
- Global SRE Management: maintenance-window-orchestrator, change-management, capacity-planner, cost-optimizer, gpu-efficiency-dashboard, network-stability-dashboard, patching-orchestrator, artifact-management, compat-matrix-service, security-platform.
Qualifications
- Software Engineering Experience: 7+ years of production software engineering experience, including 2 or more years operating what you built (real on-call experience, not just shipping code).
- Programming Languages: Production-depth mastery of at least one systems-grade language—Go (preferred), Rust, or Java. Proficiency in Python for tooling and SDK work.
- Distributed Systems Fundamentals: Strong grasp of at-least-once vs. exactly-once trade-offs, idempotency, back-pressure, leader election, consistent hashing, gossip, and fan-out. Ability to evaluate CRDT vs. Raft vs. Paxos and select the right tool for the job.
- Multi-Region Observability Stack: Experience at production scale with Prometheus, VictoriaMetrics, Mimir, Thanos, Loki, Elasticsearch, Tempo, Jaeger, or OpenTelemetry. Must have built or substantively contributed to the ingest, query, or storage paths of these systems.
- GitOps & CI/CD: Hands-on experience with Argo, Flux, Helm, Kustomize, Cosign signing, signed-bundle promotion, and blast-radius-aware rollouts.
- Kubernetes Operator Pattern: Proven experience writing a controller or CRD handling real production traffic, with a deep understanding of watch-cache mechanics, leader election, and reconcile loops.
- mTLS & Secrets Management: Experience executing end-to-end mTLS bootstrap with certificate rotation. Hands-on experience with HashiCorp Vault or cloud KMS (AWS KMS / GCP KMS).
- SQL & Time-Series Data: Ability to read a Prometheus query plan, build a recording-rule strategy, and write SQL that joins per-tenant telemetry against analytics-lake tables.
- Testing Discipline: Rigorous approach to unit, integration, contract, chaos, and soak testing. Experience writing and maintaining your own comprehensive tests.
- Technical Writing Fluency: Ability to author clear design docs that align with existing platform architecture, create runbooks optimized for 3 AM on-call responses, and write intent-driven PR descriptions.
Preferred Qualifications (GPU / AI-Infra Context)
Experience in at least one of the following areas is a strong plus:
- NVIDIA Internals: Deep understanding of DCGM and NVIDIA driver internals, including XID semantics and MIG / vGPU partitioning.
- Networking & Fabrics: Experience with InfiniBand or RoCE fabrics, including subnet managers, partitioning, optical health, and NCCL collective tracing.
- HPC Storage: Experience managing Lustre, NetApp, Pure, DDN, VAST, or NVMe-oF under multi-tenant loads.
- Hardware Management: Hands-on experience with BMC, IPMI, and Redfish at OEM scale (Supermicro, Dell, HPE, Lenovo).
- Cluster Platform Internals: Familiarity with Kubernetes GPU Operator, Slurm controller, or Ray GCS.
- BS/MS in Computer Science or similar
- Hyperscale or NeoCloud experience
--------------------------------------------------------------------
Bitdeer is committed to providing equal employment opportunities in accordance with country, state, and local laws. Bitdeer does not discriminate against employees or applicants based on conditions such as race, color, gender identity and/or expression, sexual orientation, marital and/or parental status, religion, political opinion, nationality, ethnic background or social origin, social status, disability, age, indigenous status, and union membership.

