SRE
TriOptus LLC
Site Reliability Engineer (SRE)
As part of the FRDC Site Reliability Engineer (SRE) team, you will help identify resilience challenges and build reusable, foundational software and infrastructure components that improve, influence, and validate the resilience and reliability of technologies that move trillions of dollars per day. Responsibilities include, but are not limited to:
- Participate in designing, building, and refactoring major software components that improve the availability, resilience, and performance of our system
- Design, code, test, and deliver software to automate manual operational work
- Support incident response and blameless postmortems; design and implement product improvements to prevent incidents from recurring
- Implement application patterns in support of better service level objectives
- Implement self-healing and resiliency patterns
- Exercise failure cases regularly to validate resilience assumptions
- Engage with development teams throughout the incident life cycle and ensure lessons learned are translated into automated responses or process adjustments, helping teams develop software for reliability and scale with minimal refactoring or changes
- Code, test and deliver software to automate manual operational work
- Troubleshoot incidents, participate in blameless post-incident evaluations and ensure permanent closure of incidents
- Identify application patterns and analytics in support of better service level objectives
- Analyze self-healing and resiliency patterns and contribute to software which can use these outcomes
- Implement best-in-class monitoring frameworks to achieve end-to-end flow monitoring and noiseless alerting
Requirements & Qualifications:
- Bachelor’s degree or equivalent experience in a software engineering discipline
- 2+ years of hands-on software engineering experience
- Curiosity about solving runtime resilience problems at scale
- Expertise in designing, coding, testing, and delivering software in at least one technology stack
- Knowledge of several infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Experience in cloud native, distributed application design and implementation
- Demonstrated communication and ownership skills
- Debugging and troubleshooting skills
- Collaboration with a diverse, high-performing, multi-location team
- Excellent analytical, interpersonal and communication skills
- Understanding of SRE methodologies/practices
Required Skills: 4+ years of technical expertise in the areas below and 6+ years of overall IT experience:
- Proficiency in Java / JVM based system design & implementation
- Infrastructure knowledge, including Unix, Windows, networking, and scripting (e.g. Perl / Python)
- Experience with orchestration tools like Jenkins CI/CD or Jules
- Experience following source control best practices (Git / Bitbucket)
- Experience with database development (MySQL / Oracle)
- Understanding of architecture and design across distributed systems
Preferred Skills:
- Knowledge of Spring Boot / microservices architecture
- Experience using Pivotal Cloud Foundry
- Experience with Public Cloud: AWS
- Experience with enterprise platforms using Big Data tools and technologies (e.g. Hadoop, Spark, Hive, Impala, Dremio, NiFi, Ignite)
- Experience setting up & building solutions for Containers e.
Vacancy posted 3 days ago


