Wireless Reliability Engineer (AP SRE) - San Jose, CA
Nile Global Inc
Position: Wireless Reliability Engineer (AP SRE)
Location: San Jose, CA
Mission: Help eliminate bad WiFi experiences by making Nile's access point platform measurably more reliable before it reaches production.
Nile delivers Connectivity as a Service for enterprise campuses. Instead of one-off hardware and manual break/fix testing, we operate a service with strong reliability and security guarantees. This role sits at the intersection of wireless, systems, and software engineering to make that possible.
Role Overview
As a Wireless Reliability Engineer on the AP SRE team, you will own the reliability of Nile's access point platform across performance, correctness, and security, primarily in pre-production environments. You will:
- Design and evolve the automation, validation, and chaos frameworks that exercise our APs in CI/CD and in the lab.
- Drive deep L1-L7 investigations when complex wireless issues appear and convert those learnings into durable tests and platform changes.
- Partner closely with firmware, cloud SRE, and security to ensure reliability is engineered in, not bolted on.
This is an individual contributor role at the Senior/Staff level, with high technical ownership and visibility.
What You'll Do
Build the Machine That Tests the Machine
- Architect Python-based automation scripts that integrate into CI/CD to validate AP firmware, drivers, and wireless features at scale.
- Continuously increase automated coverage of functional, performance, and longevity tests for WiFi features (11ax/11be), roaming, QoS, and management-plane behavior.
- Define test strategy and guardrails: what must be covered on every change, on nightly runs, and in long-running stress suites.
Wireless Reliability & Chaos Engineering (Pre-Production)
- Design and run chaos and stress scenarios against WiFi and datapath stacks: RF impairments, load patterns, roaming storms, congestion, and failure injections.
- Characterize and harden the system under realistic client and traffic mixes using tools like IxChariot/IxANVL, Spirent, or Alethea (or equivalent).
- Turn intermittent or "rare" failures into reproducible automated tests that block regressions.
Security & Zero Trust Validation
- Validate 802.1X, WPA3, and Zero Trust campus behavior, including onboarding flows, policy enforcement, and failure/attack scenarios.
- Work with security engineering to translate threat models into repeatable test plans and automation (e.g., auth storms, misbehaving supplicants, rogue AP/client scenarios).
Hardware, SoC, and Telemetry
- Use silicon-level telemetry and debug hooks on Qualcomm (or similar) AP SoCs to understand RF performance, power, and error behavior under load.
- Collaborate with silicon/firmware teams to correlate lab findings with driver/firmware changes and influence roadmap and design decisions.
Deep-Dive Debugging & RCAs
- Lead deep technical investigations across L1-L7 when Nile or customer scenarios expose weird or hard-to-reproduce behavior.
- Produce clear RCAs that tie symptoms to root cause and result in concrete fixes in firmware, cloud controllers, or test systems.
- Feed every critical RCA back into automation, metrics, or specifications so the same class of issue does not recur.
How You'll Work
- Collaboration: Work day-to-day with AP firmware, wireless systems, cloud SRE, security, and product teams. You will often be the bridge between RF realities, protocol behavior, and software implementation.
- Scope: Primary focus is pre-production reliability and validation. You will also engage with selected high-severity production incidents when deep wireless expertise is needed, and then codify those learnings back into tests.
- On-call: This role is not a traditional 24×7 production on-call rotation, but you may be pulled into critical incident investigations where AP or wireless expertise is required.
In Your First 6-12 Months, You Will
- Design and roll out a next-generation suite of AP reliability test scripts in CI/CD for the AP features you own.
- Build and stabilize a set of chaos/stress scenarios that uncover new issues in WiFi performance, roaming, and security under load.
- Lead multiple complex RCAs on wireless issues (lab or field) that directly result in:
  - Firmware or SoC configuration changes.
  - New automated tests or monitors.
  - Updated best practices for deployment or configuration.
What You Bring
Must-Have Experience
- Experience: ~5+ years in one or more of: wireless systems engineering, AP or client firmware validation, WiFi performance/reliability, or SRE for networking systems.
- Wireless proficiency:
  - Hands-on work with 802.11ac/ax (11ax required; 11be familiarity a plus).
  - Comfortable reading and interpreting packet captures (e.g., Wireshark) and RF measurements; you know what a "clean" RF environment looks like and how to recognize common impairments.
- Software & Automation:
  - Strong Python skills building test frameworks, harnesses, or tooling (not just one-off scripts).
  - Experience integrating tests into CI/CD pipelines (GitLab CI, Jenkins, etc.).
- Debugging mindset:
  - Proven track record debugging complex multi-component systems (AP + clients + backbone + cloud).
  - Ability to design experiments, isolate variables, and turn qualitative "it seems flaky" reports into measurable hypotheses.
- Security fundamentals:
  - Working knowledge of 802.1X, EAP, WPA2/WPA3, and robust onboarding flows.
  - Comfort validating security behavior (e.g., PMF, key management, misconfigurations, and downgrade/failure scenarios).
Nice to Have
- Deep familiarity with 11be/EHT, OFDMA, MU-MIMO, and multi-band/multi-link operation.
- Hands-on experience with Qualcomm or similar WiFi SoC platforms and their debug/telemetry interfaces.
- Experience with one or more WiFi test tools/stacks:
  - Ixia, IxChariot, IxANVL, Spirent, Alethea, or similar.
  - RF impairers, channel emulators, or shielded chambers.
- Prior work in SRE-style roles for networking/wireless services: SLIs/SLOs, error budgets, on-call, and incident management.
- Experience using AI-assisted development tools (Cursor, Copilot, etc.) as part of your daily workflow.
Why Nile
- Problem Space: Work on end-to-end campus WiFi and Zero Trust at scale, not just isolated AP features.
- Impact: Your work ships directly into Nile's service; every improvement in your lab shows up as higher reliability for customers.
- Environment: Small, senior team, low bureaucracy, and tight feedback loops between design, implementation, and validation.
- Ownership: You'll have genuine ownership over how we test and qualify our AP platform, and a strong voice in product and architecture decisions that affect reliability.
If you're a hands-on engineer who enjoys combining RF, protocols, and software to make systems robust, and you'd rather build reliable machines that test machines than run the same manual tests twice, we'd like to talk.
Vacancy posted more than 2 months ago
