Staff Observability Engineer: Scale Metrics & Reliability
Pantera Capital
A technology company is seeking engineers to join their observability team in Palo Alto. This role involves designing and implementing scalable observability infrastructure, developing high-performance telemetry pipelines, and ensuring the reliability and performance of the observability stack. Ideal candidates should have production-level proficiency in programming languages like Go, Rust, or Scala, and a strong understanding of distributed systems. Competitive salary range and various benefits are included.
$235k - $295k
...data and AI infrastructure company is seeking a Sr. Staff Software Engineer to join their Observability team in Mountain View, California. In this role,... ...develop key observability solutions and ensure product reliability across cloud regions. Candidates should have 15+...
A leading technology company is seeking a Staff Software Engineer focusing on fault management to enhance server reliability and influence team designs. The ideal candidate... ...extensive experience in C++ programming and large-scale systems development. Key responsibilities...
$126k - $204.5k
...maintaining a large‑scale GCP environment, including... ...of our comprehensive observability systems. To meet the... ...high cardinality metrics, implemented tracing,... ...collaborate closely with our engineering teams to develop... ...and ensure the reliability and availability of our...
$150k - $180k
...onsite energy infrastructure (i.e. large scale BESS) through our proprietary... ...serve as software-focused Senior Site Reliability Engineer at Verrus. This is a full‑time position... ...allocation for workloads. Reliability & Observability: design and implement comprehensive monitoring...
Full time, Work at office, Local area, Flexible hours
$135k - $179k
...organization of scientists, engineers, and physicians... ...(NGS), population‑scale clinical studies,... ...companies. As a Staff Network Engineer... ...to ensure reliable, predictable network... ...Logs, CloudWatch metrics/logs, and Route 53... ...scalability, reliability, observability, and security....
Full time, Local area, Flexible hours
...Staff Network Engineer, GEICO... ...Staff... ..., security, and reliability · Implement and maintain observability for the network platform, including metrics, alerts, and dashboards... ...large-scale IP fabrics, including...
Hourly pay, Work experience placement, Local area, Flexible hours
$190k - $300k
...with their user base. AI Engineers, Data Science, and... ...applications at production scale across industries have... ...in the field of AI Observability and has received... ...challenges in AI safety and reliability. Working on exciting... ...agentic observability metrics (e.g., response relevancy...
Work at office, 3 days per week
$169k - $224k
...organization of scientists, engineers, and physicians and we... ...(NGS), population-scale clinical studies, and state... ...com GRAIL is seeking a Staff Site Reliability / DevOps Engineer to... ...Establish and evolve observability platforms (metrics, logs, traces) and define...
Full time, Work at office, Local area, Flexible hours, Shift work
$227k - $290k
...industry-leading Reasoning Engine that uses a combination... ...Will Do As a site reliability engineer, you will be an... ...teams to rapidly deploy and scale Moveworks infrastructure... ...maintain monitoring, metrics, and reporting systems for observability and actionable alerting....
Full time, Immediate start
$214k - $289.5k
Senior Staff Machine Learning Engineer Category: Software Engineering... ...business value at Intuit scale. In this role, you... ...for adaptability, observability, and secure... ...continuous improvement of reliability, fairness, and... ...hypotheses, success metrics, and iterative validation...
Worldwide
$220k - $240k
...Staff Data Engineer We're ALSO, an electric mobility company... ..., and large-scale data processing, ensuring... ...telemetry flows are reliable, scalable, cost-efficient... ...telemetry data (events, metrics, time-series) with... ...Develop fault-tolerant, observable, and debuggable...
Local area, Flexible hours
$220k - $255k
...enjoyable and 10-50x more efficient. ALSO is looking for a Reliability Engineer to play a key role in developing and leading the reliability... ...What You Will Do: Establish reliability targets and metrics for new product development that include actuators, batteries...
Work at office, Local area, Remote work, Flexible hours, 1 day per week
$206.5k - $258.1k
...contributor to build and scale our AI solutions... ...Technology teams. As a Staff AI Engineer, you will design,... ...instrument offline and online metrics & telemetry to ensure... ..., CI/CD, testing, observability); familiarity with... ...and contributions to reliability/SLOs and operational...
Full time, Contract work, Temporary work, Part time, Local area, Shift work
...company in California seeks a Member of Technical Staff - Training to design and optimize large-scale distributed training systems for frontier AI models... ...involves collaborating with researchers and improving the reliability of long-running training jobs. Competitive...
$200k - $240k
...and 10-50x more efficient. ALSO is looking for a Field Reliability Engineer to play a key role in tracking and improving the reliability... ...identify any gaps in reliability test plan. Develop novel damage metrics to more accurately model failure mechanisms. Work with...
Work at office, Local area, Remote work, Flexible hours, 1 day per week
$186k - $232.5k
...Summary Are you a Staff or Lead-level Platform Engineer passionate about developer... ...teams to reliably and securely ship products... ...loop. DevEx Metrics & Advocacy: Track... ...standards for quality, observability, compliance,... ..., supporting large-scale software engineering...
Full time, Contract work, Temporary work, Part time, Local area, Shift work
$165k - $242k
...innovators to build and scale AI with confidence.... ...We're looking for a Staff Storage Engineer to play a key role... ...systems by building reliable, scalable, and high-throughput... ..., durability, and observability of our storage stack.... ...using telemetry, metrics, and dashboards to improve...
Permanent employment, Temporary work, Casual work, Work at office, Flexible hours
$180k
xAI in Palo Alto, California, is seeking a talented engineer for the X Search team. This team focuses on building the core search engine... ...candidates have experience with vector databases and large-scale search systems, with a proven track record in production ML systems...
$190k - $240k
...weekly software builds · Establish processes and metrics to measure software quality, performance, and... ...efficiently · Collaborate with software engineering teams on architecture, observability, infrastructure, and reliability needs · Support production readiness reviews,...
Hourly pay, Local area, Flexible hours
$181k - $262k
Hardware Engineering. Mountain View, California. Staff Hardware Reliability Engineer - Sensors. Who we are: Aurora’s mission is to deliver the benefits of self-driving... ...Aumovio (formerly Continental) to bring a robust, scaled product to market. In this role you will: Lead...
Contract work, Work at office, Local area, 3 days per week
...California in 2004 when a visionary engineer, Fred Luddy, saw the... ...hybrid indexing technology at scale across large clusters,... ...performance, scalability, and observability of search, including query latency... ...Drive reliability and operability across the platform...
Full time, Work at office, Remote work, Flexible hours, Shift work
$180k
...xAI is seeking a Software Engineer in Palo Alto, California, to join their small, innovative... ...design to ensure scalability and reliability for applications used by millions. The ideal... ...least 2 years of experience with large scale applications, and strong collaboration skills...
...week) We are seeking a Staff Software Engineer to join the Wallet –... ...most critical and high-scale engineering domains. You will... ...engineering quality, security, and reliability. You’ll collaborate... ...Lead initiatives around observability, alert hygiene, capacity planning...
$180k - $260k
...integration into customers’ logistics operations. About the role: We are seeking an experienced Senior/Staff Site Reliability Engineer to support the operation, monitoring, and scaling of our growing fleet of autonomous vehicles. In this role, you will work closely with our...
Odd job, Work at office, Remote work
$126k - $203.5k
...Summary The Production Engineering team is responsible for building, scaling, and operating the cloud... ...As a Senior Staff Production Engineer, Platform... ...infrastructure, and production reliability, you will develop... ...Design and implement observability, monitoring, and telemetry...
...About the Role As a Senior Staff Software Engineer at Hippocratic AI, you’ll define... ...systems that power reliable, testable, and incrementally... ...pipelines, feature flag strategy, observability, and developer tooling... ...who have built and scaled software systems across multiple...
Work at office, Local area
$152k - $248k
...Job Description Position: Staff Network Engineer – Data Center & Core Network Engineering Location... ...network performance, capacity, reliability, and observability. Responsibilities Review... ...Design, deploy, and operate large-scale network infrastructure for multiple...
Work at office
$152k - $248k
...Center & Core Network Engineering team is responsible for... ..., security, and reliability within campus and across... ...hypergrowth. As a Staff Network Engineer, you'... ...capacity, reliability, and observability. Design, deploy, and operate a large-scale network for data...
For contractors, Work at office, Flexible hours
$198.9k - $304.8k
...of transportation on a global scale. Role As a Technical Lead you... ...align multiple teams to ship reliable, scalable autonomy... ...technical reviews and drive software engineering best practices across the team... ...features and defining useful metrics for analyzing performance. Mentor...
Work experience placement, Local area, Relocation package, Flexible hours
$189k - $300k
...of transportation on a global scale. The Data Scaling team... ...collaborative, high-impact team of AI/ML engineers, data scientists and... ...Contribute to the safety, reliability, and scalability of next-generation... ...autonomous vehicles. As a Staff AI/ML Engineer in the...
Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours