Site Reliability Engineering (SRE) Architect
RADIANT
Position Title: Site Reliability Engineering (SRE) Architect (Telecom OSS/BSS & Mainframe)
Location: Dallas, TX; Basking Ridge, NJ; NC; and Tampa, FL
Work Arrangement: Hybrid/Onsite
Interview Type: Video
Must have:
15+ years of progressive experience in enterprise IT and telecommunications environments, with extensive expertise in designing, implementing, and supporting complex OSS/BSS ecosystems that enable large-scale business and network operations.
8+ years of hands-on architecture experience across IBM Mainframe z/OS and midrange platforms (Linux/Solaris), delivering scalable, secure, and highly available enterprise solutions.
Demonstrated expertise in Site Reliability Engineering (SRE) principles, including defining and managing Service Level Objectives (SLOs), Service Level Indicators (SLIs), Error Budgets, reliability governance, and continuous service improvement.
Deep functional and technical knowledge of Telcordia OSS applications, including SWITCH, TIRKS, FACS, WFA, and SOAC, with experience integrating and optimizing telecom operational support systems.
Proven ability to design and implement high-availability, fault-tolerant, resilient, and disaster recovery architectures, ensuring business continuity and mission-critical system reliability.
Strong hands-on expertise with IBM Mainframe technologies, including z/OS internals, JCL, IMS, VSAM, DB2, CICS, system utilities, workload management, performance tuning, and production diagnostics.
Extensive experience implementing observability and monitoring solutions using industry-leading tools such as Splunk, Dynatrace, Instana, IBM NetCool, Grafana, and AppDynamics to improve operational visibility and proactive incident detection.
Proven success in driving automation, self-healing capabilities, infrastructure as code, CI/CD reliability practices, and DevOps/SRE transformation across hybrid cloud and on-premises enterprise environments.
Strong understanding of end-to-end telecommunications business processes, including service provisioning, inventory management, order management, activation, network fulfillment, service assurance, and lifecycle management.
Extensive experience leading major incident management, conducting Root Cause Analysis (RCA), problem management, and implementing preventive measures to significantly improve MTTD (Mean Time to Detect), MTTR (Mean Time to Resolve), system stability, and operational excellence.
Proven ability to collaborate with cross-functional teams including Enterprise Architecture, Infrastructure, Development, Operations, Network Engineering, and business stakeholders to deliver highly reliable, business-critical technology solutions.
Excellent leadership, stakeholder management, and communication skills, with a strong track record of mentoring technical teams, driving reliability engineering best practices, and supporting large-scale enterprise transformation initiatives.
About Us
At Radiant Digital, we provide IT solutions and consulting services to help government agencies and businesses in the USA, Canada, the Middle East, and Southeast Asia. On the federal side, we support agencies like NASA, the Department of State (DOS), the IRS, ACL, ACF, USDA, and many others, along with numerous state and local government agencies. We work with industries like telecom, healthcare, entertainment, and oil and gas, offering solutions designed to meet their specific needs. We focus on improving systems, making better use of data, and updating applications to keep up with changing markets.
Vacancy posted 5 hours ago

