Senior Site Reliability Engineer - Data Infrastructure (San Jose)
$212.8k - $387.6k
ByteDance
Location: San Jose
Team: Infrastructure
Employment Type: Regular
Job Code: A59871
Responsibilities
The Data Infrastructure SRE team is responsible for the reliability, scalability, and efficiency of the core data services that power our products. We manage a massive, distributed environment built on technologies like Kubernetes, Redis, MySQL, and Message Queue. Our work is not about building features, but about engineering the resilience and performance of the underlying platform that all product teams depend on. We are the guardians of production, ensuring our data systems run smoothly, nonstop.
This role includes participation in a rotational on-call schedule to ensure nonstop coverage for our critical data infrastructure. You will be expected to respond to, troubleshoot, and resolve production incidents. Our team collaborates across multiple time zones, and you will engage in rigorous change management and post-incident review processes to maintain system stability.
Role Summary
As a Site Reliability Engineer, you will be on the front lines of keeping our large-scale data systems running reliably and efficiently. You will focus on hands-on operational work, from responding to alerts and managing production changes to automating routine tasks. This role is an excellent opportunity to develop deep expertise in modern infrastructure technologies and SRE practices while working alongside senior engineers to solve challenging problems.
Responsibilities:
- Incident response and postmortems: Act as an incident commander for critical production issues, guiding the team through triage and resolution. Drive deep, blameless post-incident reviews and ensure that follow-up actions are implemented to prevent recurrence.
- SLO/SLA and error budgets: Define, negotiate, and maintain Service Level Objectives (SLOs) for critical data services. Champion the use of error budgets to balance reliability work with feature development.
- Capacity and cost optimization: Lead initiatives in capacity planning, performance tuning, and resource management. Develop strategies and automation to ensure our infrastructure scales efficiently and stays within budget.
- Pragmatic automation and AI orchestration: Design and build automation and leverage AI Agents to eliminate operational toil, improve deployment safety, and enhance overall operational efficiency. Focus on creating maintainable, robust tools and intelligent workflows that make the entire team more effective.
- Operational excellence and change management: Uphold and improve our standards for production operations, including runbooks, monitoring, and alerting. Vet complex changes and deployments to ensure they meet our bar for production readiness.
- Data Center and AI Infrastructure: Lead the construction, maintenance, and optimization of data centers and specialized AI infrastructure, ensuring high availability and peak performance for complex AI-driven workloads.
- Cross-team influence and mentorship: Act as a subject matter expert on reliability, consulting with application development and other infrastructure teams. Mentor junior SREs, helping them develop their technical and operational skills.
Qualifications
Minimum Qualifications:
- Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
- 5+ years of experience in a Site Reliability Engineering, Production Engineering, or similar role.
- Strong proficiency in a programming or scripting language (e.g., Go, Python, Bash) for automation and tool development.
- Deep understanding of Linux/Unix operating systems, networking fundamentals (TCP/IP, DNS), and distributed systems.
Preferred Qualifications:
- Extensive hands-on experience managing large-scale data infrastructure (e.g., MySQL, Redis, Kafka, Flink).
- Proven experience with container orchestration technologies, particularly Kubernetes, in a production environment.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership.
- Experience leading incident response for complex, high-impact events.
- Experience in the operation and construction of Data Centers is a big plus.
Job Information
For Pay Transparency
Compensation Description (Annually)
The base salary range for this position in the selected city is $212,800 - $387,600 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives, and restricted stock units. Benefits may vary depending on the nature of employment and the country work location. Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure). The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates: Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment: 1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues; 2. Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and 3. Exercising sound judgment.
About Us
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life - a mission we work towards every day. As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
Reasonable Accommodation
ByteDance is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at


