Tech Lead - Network Observability
$180k - $260k | Clockwork Inc | Palo Alto, CA
About Clockwork Systems
Clockwork.io - Software-Driven Fabrics to Increase GPU Cluster Utilization
Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet their demands for performance, reliability, and precise coordination. Clockwork is pioneering a software-driven approach to AI fabrics, delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion. To learn more, visit
About the Role
We are seeking an experienced Tech Lead to drive the architecture, development, and scaling of a high-performance network monitoring and observability platform. This role focuses on building systems that provide deep visibility into RDMA, RoCE, InfiniBand, and TCP/IP networks. The ideal candidate has strong experience in distributed systems, Linux networking, and modern observability stacks (e.g., Grafana/Prometheus).
What You Will Do
- Lead architecture, design, and development of scalable network monitoring platforms for high-performance RDMA, RoCE, InfiniBand, and TCP/IP infrastructure.
- Build backend telemetry services, observability dashboards, alerts, diagnostics, anomaly detection, SLA monitoring, and traffic analysis workflows.
- Troubleshoot complex production issues across application, OS, server, RDMA, and network layers while optimizing low-latency collection, aggregation, and alerting.
- Establish engineering standards, drive automation, define technical roadmaps with cross-functional teams, and mentor engineers on distributed systems and high-performance networking best practices.
Qualifications
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.
- Strong hands-on programming experience in C++, Go, Python, Rust, or similar systems programming languages.
- Proven experience leading engineering teams, major technical initiatives, or complex infrastructure projects.
- Experience building distributed systems, backend services, telemetry pipelines, or observability platforms.
- Hands-on experience with RDMA, RoCE, InfiniBand, or other high-performance network fabrics.
- Familiarity with libibverbs, RDMA verbs, RDMA CM, queue pairs, completion queues, memory registration, and related RDMA concepts.
- Strong knowledge of Linux networking, TCP/IP, DNS, routing, MTU, congestion control, packet loss, latency, and performance tuning.
- Experience with traceroute-style diagnostics, path discovery, network reachability checks, synthetic probes, or active network measurements.
- Experience with monitoring and visualization platforms such as Prometheus, Grafana, Datadog, Splunk, OpenTelemetry, or similar tools.
- Strong debugging skills across software, operating system, server, and network layers.
- Experience operating production systems in Linux-based environments.
- Strong architectural judgment and ability to design systems for reliability, scalability, and operational simplicity.
Preferred Qualifications
- Experience supporting AI/ML, HPC, storage, or GPU cluster infrastructure workloads.
- Experience with large-scale RoCE or InfiniBand deployments.
- Experience with NCCL, distributed training infrastructure, or AI cluster diagnostics.
- Experience with eBPF, XDP, DPDK, perf, tcpdump, Wireshark, ethtool, iproute2, rdma-core, or Linux kernel networking tools.
- Experience with cloud infrastructure on AWS, GCP, or Azure.
- Experience with Kubernetes, service discovery, configuration management, and infrastructure automation.
- Knowledge of security, compliance, and infrastructure best practices.
- Experience designing time-series data systems, alerting pipelines, or high-cardinality telemetry platforms.
What We Offer
- Challenging projects.
- A friendly and inclusive workplace culture.
- Competitive compensation.
- A great benefits package.
- Catered lunch.
Compensation for this position will vary based on the skills and experience you bring, as well as internal equity considerations. For candidates hired at the posted level, the expected base salary range is $180,000 - $260,000. In addition to cash compensation, this role is eligible to participate in the company's equity program. The offered package may include stock options or other equity awards, granted in accordance with Clockwork's equity plan and subject to approval and applicable vesting schedules.
Clockwork Systems is an equal opportunity employer. We are committed to building world-class teams by welcoming bright, passionate individuals from all backgrounds. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, religion, age, sex, sexual orientation, gender identity or expression, national origin, disability, or protected veteran status. We believe diversity drives innovation, and we grow stronger together.
Vacancy posted 4 days ago