Senior Site Reliability Engineer

Hard Rock Digital

What are we building?

Hard Rock Digital is a team focused on becoming the best online sportsbook, casino, and social gaming company in the world. We’re building a team that resonates passion for learning, operating, and building new products and technologies for millions of consumers. We care about each customer interaction, experience, behavior, and insight and strive to ensure we’re always acting authentically.

Rooted in the kindred spirits of Hard Rock and the Seminole Tribe of Florida, Hard Rock Digital taps a brand known the world over as the leader in gaming, entertainment, and hospitality. We’re taking that foundation of success and bringing it to the digital space - ready to join us?

What’s the position?

We are looking for a Senior Site Reliability Engineer who combines deep infrastructure expertise with a forward-thinking approach to AI-driven operations. In this role you will maintain and improve the reliability, scalability, and performance of our Java-based applications while pioneering the use of large language models (LLMs), agentic workflows, and intelligent automation to transform how we monitor, respond to, and prevent incidents.

You will design and build autonomous and semi-autonomous AI agents that consume observability data, triage alerts, generate runbooks, automate incident response steps, and surface actionable insights, reducing toil and accelerating mean time to resolution. This is a hands-on engineering role for someone who is equally comfortable tuning a JVM, writing PromQL, and prototyping an agentic pipeline with tool-calling LLMs.

Key Responsibilities

Application Reliability & Performance
  • Ensure the availability, reliability, and performance of high-traffic Java-based applications in a distributed environment.
  • Troubleshoot and resolve complex issues across production and non-production environments.
  • Participate in pre- and post-deployment performance testing and monitoring to continuously improve application performance.
  • Optimize Java application performance with a focus on JVM tuning, efficient resource utilization, and horizontal scaling.

Monitoring, Observability & AIOps
  • Deploy and manage the Grafana stack (Grafana, Prometheus, Loki, Mimir, Alloy) to deliver real-time monitoring, logging, and alerting.
  • Implement and refine observability strategies that enhance visibility into application and infrastructure health.
  • Create and maintain dashboards, alerts, and log queries for comprehensive system health monitoring.
  • Integrate AI/ML models into the observability pipeline for anomaly detection, predictive alerting, and intelligent alert correlation and noise reduction.

AI & Agentic Workflow Engineering
  • Design, build, and operate agentic AI workflows that automate operational tasks such as alert triage, root cause analysis, runbook execution, and incident summarization.
  • Develop tool-calling LLM agents that interact with infrastructure APIs (Kubernetes, Grafana, Jira, Slack, PagerDuty) to execute diagnostic and remediation actions autonomously or with human-in-the-loop approval.
  • Build and maintain MCP (Model Context Protocol) servers and integrations that expose internal systems as tool surfaces for AI agents.
  • Evaluate, select, and operationalize LLM frameworks and orchestration platforms (e.g., LangChain, LangGraph, CrewAI, n8n, or custom solutions) for production-grade agentic systems.
  • Implement guardrails, evaluation harnesses, and feedback loops to ensure AI agent outputs are accurate, safe, and continuously improving.
  • Champion the adoption of AI-assisted development and operations practices across the SRE and broader engineering organization.

Incident Management & Root Cause Analysis
  • Support the operations team’s incident response efforts, conduct post-mortems, and identify root causes to prevent recurrence.
  • Leverage AI tools to accelerate incident timelines, auto-generate post-mortem drafts, and surface patterns across historical incidents.
  • Document and share lessons learned, contributing to a culture of continuous improvement.

Automation & Toil Reduction
  • Identify repetitive operational workflows and engineer AI-augmented or fully automated replacements.
  • Build self-service tools and chatbot interfaces that allow engineering teams to query system status, retrieve logs, and execute standard operating procedures through natural language.
  • Measure and report on toil reduction metrics to quantify the impact of automation initiatives.

Collaboration & Cross-functional Support
  • Work closely with developers, architects, and data/ML engineers to design solutions that improve reliability and leverage AI capabilities.
  • Collaborate with DevOps and NOC teams to support the application platform.
  • Communicate SRE practices, AI/automation capabilities, and operational insights to technical and non-technical stakeholders.
  • Provide feedback on application performance, potential improvements, and observability metrics.

Why This Role Is Different

This is not a traditional SRE position with AI bolted on as an afterthought. We are building a team that treats AI and agentic automation as core competencies, on par with Kubernetes expertise or observability design. You will have the autonomy to experiment with cutting-edge AI tools, the backing of leadership to deploy them in production, and a mandate to measurably reduce operational toil through intelligent systems.

What are we looking for?

Core SRE & Infrastructure (Required)
  • Degree in Computer Science or a related field, or equivalent professional experience.
  • 5+ years in SRE, DevOps, or similar infrastructure roles with experience managing large-scale, high-availability production systems.
  • 3+ years hands-on experience managing production Kubernetes clusters, including deep understanding of architecture, networking, storage, and security.
  • Experience with cluster autoscaling (Karpenter), upgrades, and multi-cluster management.
  • Proficiency with kubectl, Helm, Kubernetes operators, and container orchestration troubleshooting.
  • Advanced expertise with the Grafana observability stack: dashboards, alerting, visualization, and Grafana Alloy for telemetry collection.
  • Proficiency in PromQL and experience with Loki for log aggregation and analysis.
  • Hands-on experience managing Java-based applications in distributed environments, including JVM tuning and optimization.
  • Cloud platform expertise (AWS preferred; GCP or Azure also valued).
  • Familiarity with Infrastructure as Code tools such as Terraform/Terragrunt or Ansible.
  • ArgoCD proficiency for GitOps workflows and continuous deployment.
  • Strong scripting abilities in Python, Bash, or Go, with experience building CI/CD pipelines and deployment automation.
  • Proven track record with on-call rotations, incident response, and root cause analysis.

AI, Automation & Agentic Systems (Required)
  • 1+ years of practical experience building or operating AI/LLM-powered tools, agents, or workflows in a production or production-adjacent context.
  • Demonstrated ability to design agentic systems that use tool calling, retrieval-augmented generation (RAG), or multi-step reasoning to accomplish operational tasks.
  • Experience integrating LLM APIs (e.g., Anthropic Claude, OpenAI, or open-source models) into backend services or automation pipelines.
  • Familiarity with at least one agentic orchestration framework or workflow engine (LangChain, LangGraph, CrewAI, n8n, Temporal, or equivalent).
  • Understanding of prompt engineering best practices, including structured outputs, system prompts, and few-shot examples.
  • Familiarity with AI-assisted coding tools (Claude Code, Codex, Cursor) and their integration into engineering workflows.
  • Experience building or consuming MCP (Model Context Protocol) servers to expose internal tools to AI agents.
  • Awareness of AI safety, hallucination mitigation, and human-in-the-loop design patterns for autonomous systems.
Preferred / Bonus
  • Hands-on experience with vector databases (Pinecone, Weaviate, pgvector) for RAG-based knowledge retrieval.
  • Experience with LLM evaluation frameworks (e.g., Galileo, LangSmith, Braintrust) for monitoring agent quality in production.
  • Contributions to open-source AI/ML or SRE tooling projects.
  • Background in data engineering or ML pipelines that complements SRE responsibilities.

Soft Skills
  • Strong communication skills (written and verbal) with the ability to translate complex AI and infrastructure concepts for diverse audiences.
  • Proactive problem-solver with a bias toward automation and continuous improvement.
  • Ability to mentor junior team members on both traditional SRE practices and emerging AI-driven approaches.
  • Positive attitude and openness to constructive feedback.

What’s in it for you?

We offer our employees more than just competitive compensation. Our team benefits include:
  • Competitive pay and benefits
  • Flexible vacation allowance
  • A hybrid / remote working environment
  • Startup culture backed by a secure, global brand

Roster of Uniques

We care deeply about every interaction our customers have with us, and trust and empower our staff to own and drive their experience. Our vision for our business and customers is built on fostering a diverse and inclusive work environment where regardless of background or beliefs you feel able to be authentic and bring all your talent into play. We want to celebrate you being you (we are an equal opportunity employer).

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Senior Site Reliability Engineer in Florida, NY vacancy
  • $135.2k - $181.2k

     ...design, build, and support development pipelines; automate infrastructure and operations; create telemetry for monitoring; engineer high reliability and reinforce best practices to secure company data. Perform systems administration on Linux, Docker containers, and AWS... 
    Senior
    Work experience placement
    Worldwide

    The Walt Disney Company

    Florida, NY
    3 days ago
  •  ...demands extensive knowledge of AWS, Linux, and scripting languages like Python and Bash. Candidates must have a passion for designing reliable systems and an automation mindset to reduce operational load. The ideal applicant will have at least 8 years of experience in a... 
    Senior

    Success Tech Solutions Inc.

    Florida, NY
    4 days ago
  • $111.9k - $150k

     ...Disney Cruise Line - The Walt Disney Company is seeking a skilled Senior Platform Engineer to manage core platform capabilities that enhance magical Disney experiences across various sectors. This role involves designing and implementing data pipelines, DevOps automation... 
    Senior

    Disney Cruise Line

    Florida, NY
    20 hours ago
  • A leading construction firm is seeking a Project Principal to oversee projects in Miami and Palm Beach. This role demands strong leadership and project management skills, ensuring successful client and architect experiences. Candidates must have a Bachelor's degree in ...
    Senior

    Dowbuilt

    Florida, NY
    4 days ago
  • $130k - $160k

     ...Disney is seeking a skilled Senior Software Engineer for the Payments and Accounting Technology team in New York, Town of Florida. This role focuses on building and maintaining APIs and modern commerce platforms to enhance transaction experiences. Ideal candidates will... 
    Senior

    Disney

    Florida, NY
    3 days ago
  • SPACE EXPLORATION TECHNOLOGIES CORP, located in the Town of Florida, NY, is seeking a Sr. Launch Reliability Engineer. The role involves ensuring the reliability of launch systems, collaborating with various teams, and driving improvements in operational processes. Candidates... 
    Senior

    SPACE EXPLORATION TECHNOLOGIES CORP

    Florida, NY
    5 days ago
  • MissionHires is seeking an experienced developer in the Town of Florida, NY. The role involves designing and maintaining software solutions for the drayage and transportation industry. Ideal candidates will have over 5 years of experience in software development, particularly...
    Senior

    MissionHires

    Florida, NY
    3 days ago
  •  ...Snyk Ltd. is seeking a Senior Solutions Engineer for LATAM to own technical strategies and support enterprise sales. The ideal candidate will have 5–7 years of experience, multilingual fluency in Spanish, Portuguese, and English, and proven ability to define and present... 
    Senior
    Full time
    Remote work

    Snyk Ltd.

    Florida, NY
    4 days ago
  • $106.61k - $284.28k

     ...The Hispanic Alliance for Career Enhancement is seeking a seasoned Cloud Platform Engineer to lead the evolution and maintenance of our high-availability cloud infrastructure. This role requires expertise in Kubernetes and a holistic understanding of cloud engineering... 
    Senior
    Full time

    Hispanic Alliance for Career Enhancement

    Florida, NY
    3 days ago
  •  ...modern workplace can be. Come join us in building something that's already changing how the world works. Seeking Top-Notch Go and Rust Engineers: We are looking for exceptional Go and Rust engineers with 6+ years of hands-on software development experience, ideally with a... 
    Senior

    Capitolis

    Florida, NY
    3 days ago
  • $111.7k - $167.5k

    A leading aerospace and defense company is seeking a Sr. Principal Engineer Quality in New York. This role involves developing quality systems, collaborating on standards, and leading initiatives for continuous improvement. The ideal candidate has a Bachelor’s in STEM... 
    Senior
    Relocation package

    Northrop Grumman Corp. (JP)

    Florida, NY
    2 days ago
  • Fairygodboss is seeking a Senior Staff Database Reliability Engineer to set technical direction and ensure the reliability of SQL Server and PostgreSQL platforms across cloud and hybrid deployments. As a key contributor, you will influence architecture decisions and lead... 
    Senior

    Fairygodboss

    Florida, NY
    2 days ago
  •  ...LEN Lennar Corporation is seeking a Sr Software Engineer specialized in Salesforce solutions. This role entails designing and implementing technology solutions while collaborating with the Salesforce Engineering team to ensure quality and security. The ideal candidate... 
    Senior
    Remote work

    LEN Lennar Corporation

    Florida, NY
    4 days ago
  • Overview Speechify expands and our Platform team seeks a Senior Software Engineer. This role is central to ensuring our success at Speechify by working on key features like Payments, Analytics, Subscriptions, and our API. If you are passionate about strategizing, enjoy... 
    Senior

    Letit

    Florida, NY
    3 days ago
  • Letit is seeking a Senior Software Engineer to join their Platform team in New York. This role is vital for developing key features including Payments, Analytics, and Subscriptions. The ideal candidate will manage backend APIs and enhance performance while focusing on... 
    Senior

    Letit

    Florida, NY
    2 days ago
  •  ...Fairygodboss is seeking a Senior Manager of Software Engineering to lead multiple teams within the Payroll domain. This role requires a strategic leader with extensive experience in .NET and C#, guiding engineering teams delivering scalable payroll solutions. The successful... 
    Senior

    Fairygodboss

    Florida, NY
    3 days ago
  • $89.3k - $148.8k

     ...It's Time to Join Stryker! Are you looking for an opportunity to apply your software engineering talent in a domain that is shaping the future of surgery? As a Senior Software Engineer, Applications on Stryker’s Mako SmartRobotics team, you will help design and develop... 
    Senior

    Stryker Group LLC

    Florida, NY
    19 hours ago
  •  ...A leading tech company is looking for a Senior Front-End Engineer to join their team remotely. The role involves upgrading and maintaining a multifaceted SaaS platform. Ideal candidates will have over 7 years of experience in software engineering, particularly with AngularJS... 
    Senior
    Remote work

    SherlockTalent

    Florida, NY
    3 days ago
  •  ...Handtevy-Pediatric Emergency Standards is hiring a Senior Software Engineer to focus on Enterprise API Integrations. This role involves designing...  ...integration platform, ensuring contracts, security, and reliability. Candidates should have a bachelor's degree in Computer... 
    Senior

    Handtevy-Pediatric Emergency Standards

    Florida, NY
    3 days ago
  • A software development company in New York is seeking an experienced iOS Developer to design, develop, and maintain high-quality mobile applications for Apple devices. The ideal candidate must possess a strong understanding of Swift and Objective-C, with at least 6 years...
    Senior

    STRATIS Cloud Tech Solutions INC

    Florida, NY
    4 days ago
  • Cervin is seeking a Senior Solutions Engineer to act as a trusted technical advisor to key Enterprise customers. This role involves driving technical discovery and collaborating closely with sales to land and expand customer accounts. The ideal candidate will have over... 
    Senior

    Cervin

    Goshen, NY
    2 days ago
  • $130k - $160k

    Disney Cruise Line - The Walt Disney Company is inviting applications for a Senior Software Engineer to join their Payments and Accounting Technology team. This role focuses on enhancing digital solutions integrated with core financial operations across platforms. The... 
    Senior

    Disney Cruise Line - The Walt Disney Company

    Florida, NY
    1 day ago
  •  ...A defense technology company is seeking a Data Engineer III to design and implement medallion architectures using Databricks. The role involves developing automated data pipelines and implementing MLOps best practices. Candidates should have strong data engineering skills... 
    Senior

    Agile Defense

    Florida, NY
    3 days ago
  • $106.61k - $284.28k

    The Hispanic Alliance for Career Enhancement is seeking a seasoned Cloud Platform Engineer to lead the evolution and maintenance of our high-availability cloud infrastructure. This role requires expertise in Kubernetes and a holistic understanding of cloud engineering.... 
    Senior
    Full time

    Hispanic Alliance for Career Enhancement

    Florida, NY
    2 days ago
  • $192k

    Carrier is seeking a Project Developer to direct the development of complex projects. This role includes providing technical and financial support for sales activities, building financial models, and preparing customer presentations. The ideal candidate has at least 5 ...
    Senior

    Carrier

    Florida, NY
    1 day ago
  •  ...Stryker Group is seeking a Senior Software Engineer, Applications to join their Mako SmartRobotics team in Florida, USA. This role involves designing and developing software for robotic-assisted surgery products, requiring hands-on interaction with robotic systems and... 
    Senior

    Stryker Group LLC

    Florida, NY
    20 hours ago
  • $135k - $165k

    Nucleus Security in Florida is seeking a Senior Software Engineer to lead the lifecycle of AI-driven features. This role focuses on evolving their semantic data layer for scalable AI and analytics use cases while collaborating across teams. The ideal candidate will have... 
    Senior
    Remote job

    Nucleus Security

    Florida, NY
    5 days ago
  • $135k - $145k

    PAE Government Services Inc. is seeking a skilled software engineer based in New York. The role involves designing, developing, and troubleshooting software programs, alongside leading testing efforts and managing technical proposals. A Bachelor's in Computer Science and... 
    Senior

    PAE Government Services Inc.

    Florida, NY
    2 days ago
  • Itlearn360 is seeking a Salesforce Developer located in Juno Beach, FL. The role requires over 12 years of Salesforce development experience, with a focus on Field Service Lightning in a utilities context. Proficiency in Apex, Lightning Web Components, and Salesforce APIs...
    Senior

    Itlearn360

    Florida, NY
    4 days ago
  •  ...Speechify Inc: Senior Software Engineer, Platform. As Speechify expands, our Platform team seeks a Senior Software Engineer. This role is central to ensuring our success at Speechify by working on key features like Payments, Analytics, Subscriptions... 
    Senior
    Work at office

    The10minutecareersolution

    Florida, NY
    4 days ago
