Reliability Lead, Common Services
$206k - $303k
CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at

What You’ll Do
The Common Services organization at CoreWeave is responsible for the shared platforms, APIs, and foundational services that power our AI cloud products and internal engineering teams. From authentication and authorization to core platform primitives and developer experience tooling, this organization ensures that the rest of CoreWeave can build, ship, and operate reliably at scale.

About the Role
As Reliability Lead, Common Services, you will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and for driving consistent, high‑quality operational practices across multiple teams. You’ll monitor production incidents within Common Services and work directly with your partner teams to design systems that are reliable, observable, and supportable. Your day‑to‑day work blends hands‑on technical work with cross‑functional leadership to drive continuous improvement of Common Services production operations.

In This Role, You Will
- Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on‑call, in partnership with the central Product Engineering organization.
- Develop an Operational Excellence strategy that focuses not only on improving system performance but also on monitoring and reducing operational toil.
- Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.
- Own and improve the incident management lifecycle for Common Services, including on‑call rotations, escalation paths, incident tooling, post‑incident reviews, and follow‑through on corrective actions.
- Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.
- Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.
- Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.
- Build strong, trust‑based relationships with partner teams and stakeholders, becoming a go‑to leader for production readiness and operational risk within Common Services.
- Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from incidents, and humane on‑call.
- Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.

Who You Are
- 7+ years of experience in Site Reliability Engineering, Production Engineering, or similar roles working on distributed systems or cloud/platform services.
- 2+ years of technical leadership experience (team lead, staff/principal engineer, or people manager) where you drove reliability and operational improvements across multiple services or teams.
- Strong background in Linux‑based production environments, containers, and orchestration technologies (e.g., Kubernetes), including debugging complex issues in live systems.
- Hands‑on experience with observability stacks (metrics, logging, tracing) and alerting systems, and a track record of designing meaningful SLIs/SLOs and alert strategies.
- Proven experience running on‑call rotations and incident response, including leading high‑severity incidents and driving high‑quality post‑incident reviews.
- Demonstrated ability to design for reliability (capacity planning, redundancy, failover, backoff, circuit breaking, graceful degradation, etc.) in large‑scale or mission‑critical systems.
- Comfortable working with infrastructure‑as‑code and automation tooling (e.g., Terraform, Ansible, Helm, CI/CD pipelines) to make operations repeatable, auditable, and safe.
- Strong cross‑functional communication skills: you can translate between engineering, product, and business stakeholders and influence without relying solely on authority.
- A bias toward data‑driven decision making, using production data, capacity signals, and incident trends to inform priorities and investments.

Preferred
- Background working with GPU workloads, high‑performance computing, or latency/throughput‑sensitive systems.
- Experience with multi‑tenant, multi‑region, or highly regulated environments, and the associated reliability considerations.
- Familiarity with service ownership models and strong opinions on how to align ownership, on‑call, and accountability in a scalable way.
- Experience mentoring or managing senior engineers and building high‑performing teams through coaching, feedback, and clear expectations.

Benefits
Base salary range: $206,000 to $303,000. The starting salary will be determined based on job‑related knowledge, skills, experience, and market location. Compensation also includes discretionary bonus, equity awards, and a comprehensive benefits program. The range posted reflects typical compensation for this role; actual compensation is determined by qualifications, experience, interview performance, and location.
- Medical, dental, and vision insurance – 100% paid for by CoreWeave.
- Company‑paid life insurance.
- Voluntary supplemental life insurance.
- Short‑term and long‑term disability insurance.
- Flexible Spending Account.
- Health Savings Account.
- Tuition reimbursement.
- Participation in Employee Stock Purchase Program (ESPP).
- Mental wellness benefits through Spring Health.
- Family‑forming support provided by Carrot.
- Paid parental leave.
- Flexible, full‑service childcare support with Kidsin.
- 401(k) with generous employer match.
- Flexible paid time off.
- Catered lunch each day at office and data center locations.
- Casual work environment.
- Work culture focused on innovative disruption.

Export Control Compliance
This position requires access to export‑controlled information. To conform to U.S. Government export regulations applicable to that information, the applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158; (B) eligible to access the export‑controlled information without a required export authorization; or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.

Equal Opportunity Employment Statement
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: View email address on click.appcast.io.