Tech Lead - Network Observability
$180k - $260k
Clockwork.io · Palo Alto, CA
Job Description
About Clockwork Systems
Clockwork.io: Software-Driven Fabrics to Increase GPU Cluster Utilization
Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet the demands of performance, reliability, and precise coordination. Clockwork is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion.
To learn more, visit Clockwork.io.
About the Role
We are seeking an experienced Tech Lead to own the architecture, development, and scaling of a high-performance network monitoring and observability platform. This role will focus on building systems that provide deep visibility into RDMA, RoCE, InfiniBand, and TCP/IP networks. The ideal candidate has strong experience in distributed systems, Linux networking, and modern observability stacks (e.g., Grafana/Prometheus).
What You Will Do
- Lead architecture, design, and development of scalable network monitoring platforms for high-performance RDMA, RoCE, InfiniBand, and TCP/IP infrastructure.
- Build backend telemetry services, observability dashboards, alerts, diagnostics, anomaly detection, SLA monitoring, and traffic analysis workflows.
- Troubleshoot complex production issues across application, OS, server, RDMA, and network layers, and optimize telemetry collection, aggregation, and alerting for low latency.
- Establish engineering standards, drive automation, define technical roadmaps with cross-functional teams, and mentor engineers on distributed systems and high-performance networking best practices.
Qualifications
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.
- Strong hands-on programming experience in C++, Go, Rust, Python, or similar languages.
- Proven experience leading engineering teams, major technical initiatives, or complex infrastructure projects.
- Experience building distributed systems, backend services, telemetry pipelines, or observability platforms.
- Hands-on experience with RDMA, RoCE, InfiniBand, or other high-performance network fabrics.
- Familiarity with libibverbs, RDMA verbs, RDMA CM, queue pairs, completion queues, memory registration, and related RDMA concepts.
- Strong knowledge of Linux networking, TCP/IP, DNS, routing, MTU, congestion control, packet loss, latency, and performance tuning.
- Experience with traceroute-style diagnostics, path discovery, network reachability checks, synthetic probes, or active network measurements.
- Experience with monitoring and visualization platforms such as Prometheus, Grafana, Datadog, Splunk, OpenTelemetry, or similar tools.
- Strong debugging skills across software, operating system, server, and network layers.
- Experience operating production systems in Linux-based environments.
- Strong architectural judgment and ability to design systems for reliability, scalability, and operational simplicity.
- Experience supporting AI/ML, HPC, storage, or GPU cluster infrastructure workloads.
- Experience with large-scale RoCE or InfiniBand deployments.
- Experience with NCCL, distributed training infrastructure, or AI cluster diagnostics.
- Experience with eBPF, XDP, DPDK, perf, tcpdump, Wireshark, ethtool, iproute2, rdma-core, or Linux kernel networking tools.
- Experience with cloud infrastructure on AWS, GCP, or Azure.
- Experience with Kubernetes, service discovery, configuration management, and infrastructure automation.
- Knowledge of security, compliance, and infrastructure best practices.
- Experience designing time-series data systems, alerting pipelines, or high-cardinality telemetry platforms.
Enjoy
- Challenging projects.
- A friendly and inclusive workplace culture.
- Competitive compensation.
- A great benefits package.
- Catered lunch.
Compensation for this position will vary based on the skills and experience you bring, as well as internal equity considerations. For candidates hired at the posted level, the expected base salary range is $180,000 - $260,000.
In addition to cash compensation, this role is eligible to participate in Clockwork's equity program, which may include stock options or other equity awards granted in accordance with the company's equity plan and subject to approval and applicable vesting schedules.
Clockwork Systems is an equal opportunity employer. We are committed to building world-class teams by welcoming bright, passionate individuals from all backgrounds. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, religion, age, sex, sexual orientation, gender identity or expression, national origin, disability, or protected veteran status. We believe diversity drives innovation, and we grow stronger together.