Senior Infrastructure & Reliability Engineer - AI Platform

$157.7k - $277.8k

Writer

Location: New York City, NY
Employment Type: Full time
Location Type: Hybrid
Department: Engineering, product & design
Compensation: SF & NYC Base Compensation $157.7K – $277.8K • Offers Equity

WRITER is committed to transparent, market-based compensation practices. Compensation offered will be determined by multiple factors such as role scope and complexity, location, experience, knowledge, and skills. Cash compensation is only one part of WRITER’s competitive Total Rewards package, which can also include equity, an immersive purpose-driven culture, career development, and thoughtfully designed benefits and well-being offerings.

About WRITER

WRITER is where the world's leading enterprises orchestrate AI-powered work. Our vision is to expand human capacity through superintelligence. And we're proving it's possible – through powerful, trustworthy AI that unites IT and business teams together to unlock enterprise-wide transformation. With WRITER's end-to-end platform, hundreds of companies like Mars, Marriott, Uber, and Vanguard are building and deploying AI agents that are grounded in their company's data and fueled by WRITER's enterprise-grade LLMs. Valued at $1.9B and backed by industry-leading investors including Premji Invest, Radical Ventures, and ICONIQ Growth, WRITER is rapidly cementing its position as the leader in enterprise generative AI.

Founded in 2020 with office hubs in San Francisco, New York City, Austin, Chicago, and London, our team thinks big and moves fast, and we're looking for smart, hardworking builders and scalers to join us on our journey to create a better future of work with AI.

About the role

At WRITER, our mission to expand human capacity with superintelligence relies on a foundational truth: our platform must be available, performant, and reliable, 24/7. As an Infrastructure engineer, you'll be at the heart of making this a reality, impacting every enterprise customer who trusts us with their AI-powered workflows.

This isn't just about keeping the lights on; it's about pushing the boundaries of what's possible, proactively identifying and solving complex systemic challenges, and laying the groundwork for our rapid growth and the evolving demands of enterprise generative AI. You'll build resilient systems, automate across the stack, and champion reliability best practices, directly enabling our ambitious product roadmap and ensuring our customers always have access to the powerful tools they need.

This is a hybrid position, based out of our New York City hub. You'll report to our director of engineering.

What you'll do

  • Use and build AI native approaches for operational tasks and infrastructure management and platforms using Python, Go, or similar languages, significantly reducing manual toil across our production environment
  • Design and implement scalable, fault-tolerant infrastructure AI solutions on public cloud providers (AWS, GCP, Azure) to support WRITER's rapidly expanding, high-traffic AI platform
  • Own the reliability, performance, and efficiency of WRITER’s core services, defining and upholding stringent Service Level Objectives (SLOs) and Error Budgets
  • Own the observability stack for monitoring, logging, and alerting systems to ensure rapid detection of issues across our complex distributed systems
  • Lead incident response, post-mortems, and root cause analyses, applying learnings to proactively prevent future outages and build a more resilient system architecture
  • Collaborate closely with product and engineering teams, providing expert guidance on system design for reliability, performance, and scalability from conception through launch

⭐️ What you need

  • A solid 7+ years of experience in Infrastructure engineering, DevOps, Production engineering, Cloud platform or a similar role focused on building and operating large-scale, high-availability production systems
  • Deep expertise with cloud platforms (AWS strongly preferred), containerization technologies like Docker and Kubernetes, and Infrastructure-as-Code tools such as Terraform
  • Strong proficiency in programming languages such as Python, Java, Go for automation and monitoring
  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
  • Demonstrated ability to Challenge the status quo, proactively identify systemic weaknesses, and propose innovative solutions to complex reliability problems
  • Excellent communication, collaboration, and problem-solving skills, with a talent for building strong relationships and Connecting with cross-functional teams
  • A strong sense of ownership and accountability, eager to Own mission-critical systems and drive them toward peak performance and unparalleled reliability

Benefits & perks (US Full-time employees)

  • Generous PTO, plus company holidays
  • Medical, dental, and vision coverage for you and your family
  • Paid parental leave for all parents (16 weeks)
  • Fertility and family planning support
  • Early-detection cancer testing through Galleri
  • Flexible spending account and dependent FSA options
  • Health savings account for eligible plans with company contribution
  • Annual work-life stipends for:
    • Wellness stipend for gym, massage/chiropractor, personal training, etc.
    • Learning and development stipend
  • Company-wide off-sites and team off-sites
  • Competitive compensation, company stock options and 401k

WRITER is an equal-opportunity employer and is committed to diversity. We don't make hiring or employment decisions based on race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other basis protected by applicable local, state or federal law. Under the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

By submitting your application on the application page, you acknowledge and agree to WRITER's Global Candidate Privacy Notice.

Compensation Range: $157.7K - $277.8K

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Senior Infrastructure & Reliability Engineer - AI Platform in San Francisco, CA vacancy
  • $232k - $319k

     ...Every Identity, from AI to Human Identity is...  ...the trusted, neutral infrastructure that enables organizations...  ...The Infrastructure Platform and Shared Services Team...  ...with great people and reliable, cost-effective, and efficient...  ...and product engineering Build a world-class... 
    Senior
    Permanent employment
    Local area
    Worldwide
    Flexible hours

    Okta, Inc.

    San Francisco, CA
    2 days ago
  •  ...builds, and operates critical infrastructure that enables research at...  ...our workloads, while remaining reliable and easy to use. About the Role...  ...experienced Site Reliability Engineer to own production-critical...  ...About OpenAI OpenAI is an AI research and deployment company... 
    Suggested

    OpenAI

    San Francisco, CA
    2 days ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda Andromeda Cluster was founded by Nat...  ...deliver compute when and where it’s needed most. Our platform routes training and inference jobs across global... 
    Senior
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    2 days ago
  • A cutting-edge AI startup in San Francisco is seeking a Senior Infrastructure Engineer to build platforms for AI agents. Your role will involve creating systems that other engineers rely on, ensuring reliability and fast deployment. You'll work with technologies like Python... 
    Senior

    Giga

    San Francisco, CA
    16 hours ago
  • $196k - $245k

     ...everyone does on our platform: play video games. Over...  ...games. Our Platform Infrastructure teams are responsible...  ...ensuring Discord remains reliable, efficient, and scalable. As a Senior Software Engineer on these teams, you...  ...Experience utilizing AI tools like Claude Code... 
    Senior
    Full time
    Relocation
    Relocation package

    Discord

    San Francisco, CA
    2 days ago
  • $100k - $250k

    A leading AI software company in San Francisco is seeking a Senior Infrastructure Engineer to build the infrastructure for AI software development. You will work on components like AI agents and app hosting while ensuring scalable services. The ideal candidate has strong... 
    Senior

    Hercules

    San Francisco, CA
    16 hours ago
  • $250k - $350k

    Senior Software Engineer - Infrastructure/Platform — AfterQuery Location: San Francisco, CA (Onsite) Compensation: $250...  ...AfterQuery AfterQuery is a frontier AI research lab focused on pushing...  ...shared engineering infrastructure Reliability, monitoring, observability, and fault... 
    Senior
    Full time
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    4 days ago
  • Senior Software Engineer, Infrastructure & Platform Role Overview: As a Senior Software Engineer, Infrastructure & Platform...  ...pipelines used to train frontier AI models. This is a highly technical...  ..., ensuring systems are scalable, reliable, and capable of supporting... 
    Senior

    AfterQuery

    San Francisco, CA
    3 days ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure team, you will build the "Operating...  ...-performance training platform that handles the immense...  ...orchestration, networking, and reliability challenges that emerge...  ...into breakthrough AI. You will: Architect... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    6 days ago
  • Monaco is building an AI-native revenue platform that replaces the fragmented GTM stack (CRM, sequencing...  ...design across both product and infrastructure, software supply chain security, and...  ...least resistance for a fast-moving engineering team handling sensitive revenue data... 
    Senior
    Work at office
    Shift work

    Monaco

    San Francisco, CA
    16 hours ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting-edge research...  ...Scale, our mission is to develop reliable AI systems for the world's most important... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    6 days ago
  •  ...in San Francisco is seeking a Site Reliability Engineer to join its Platform Engineering team. This position focuses...  ...reliability and performance of an AI-powered code review platform. The...  ..., strong knowledge of GCP and infrastructure as code using Terraform. It offers... 
    Senior

    CodeRabbit

    San Francisco, CA
    16 hours ago
  • OpenArt AI in San Francisco is seeking a Senior Platform & Reliability Engineer to design and improve the reliability of its infrastructure. The role emphasizes building and operating production systems while collaborating with product engineers to ensure platform scalability... 
    Senior

    OpenArt AI

    San Francisco, CA
    1 day ago
  • $200k - $250k

     ...Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. Our stack...  ...Kubernetes-based production infrastructure. We’re hiring a senior owner of stability and infrastructure...  ...to ensure the platform is reliable, fast, and resilient as we... 
    Senior
    Permanent employment

    Vizcom

    San Francisco, CA
    2 days ago
  • Overview Senior Platform & Reliability Engineer OpenArt is an AI Storytelling and Visual Creation Platform used by millions worldwide. We’re building the...  ...design, scale, and improve the reliability of our infrastructure, from architectural decisions to hands-on implementation... 
    Senior
    Remote work
    Worldwide
    Visa sponsorship

    OpenArt AI

    San Francisco, CA
    1 day ago
  •  ...identity security, delivering an AI-powered platform that governs and secures...  .... As a Staff Platform Engineer, you will play a critical...  ...leadership role. You will own reliability for major platform domains...  ...maintaining the shared infrastructure services and platforms... 
    Senior

    Saviynt

    San Francisco, CA
    2 days ago
  •  ...This is a job that Jill, our AI Recruiter, is recruiting for on behalf of one of our customers. She will pick...  ...network The next step is to speak to Jack. Job Title: Senior Platform and Infrastructure Engineer Company Description: Context - Lux Capital and... 
    Senior
    Live in

    Jack and Jill AI

    San Francisco, CA
    4 days ago
  • Rippling is hiring a Senior Staff Software Engineer in San Francisco to lead the development of large-scale distributed systems and platform initiatives. This role requires at least 10 years...  ...expertise in building scalable infrastructure and a strong understanding of... 
    Senior

    Rippling

    San Francisco, CA
    4 days ago
  • $200k - $265k

     ...leading healthcare technology firm is seeking a Senior Software Engineer to design and maintain the infrastructure that empowers healthcare providers. This role involves...  ...software engineering and experience with cloud platforms, containers, and databases. The position offers... 
    Senior

    Ambience Healthcare, Inc.

    San Francisco, CA
    16 hours ago
  • A healthcare technology company is seeking a Senior Software Engineer to design and maintain core platform infrastructure. This role involves significant responsibility in ensuring system scalability and resilience while leading platform initiatives. Candidates should... 
    Senior
    Remote work

    Ambience Healthcare

    San Francisco, CA
    2 days ago
  • Monaco, based in San Francisco, seeks a Security Engineer to own the security posture of their AI-native platform. In this unique role, you will manage compliance, design secure systems, and ensure the integrity of software supply chains in a fast-paced environment. The... 
    Senior

    Monaco

    San Francisco, CA
    16 hours ago
  • A leading tech company is seeking an Infrastructure Engineer to build and scale its core platform powering AI systems. The role involves designing Kubernetes and Terraform...  ...for security and performance, and ensuring reliability. Ideal candidates will have over 5 years of... 
    Senior

    Brain Co.

    San Francisco, CA
    4 days ago
  • The Role You'll own the internal developer platform that every engineer at Monaco builds on - the systems, environments, and tooling that turn an idea...  ...and ship, and you'll get to do it at a company that treats AI-assisted development as a first-class part of the workflow,... 
    Senior
    Work at office
    Local area
    Remote work

    Monaco

    San Francisco, CA
    4 days ago
  • A rapidly growing data company in San Francisco is seeking a Senior Engineer specializing in data infrastructure to drive the technical direction of their data platform. In this role, you'll design robust systems for data ingestion and transformation while partnering closely... 
    Senior

    Middesk

    San Francisco, CA
    16 hours ago
  • $180k - $220k

     ...Senior Software Engineer – Infrastructure/Platform — AfterQuery Location: San Francisco, CA (Onsite) Compensation...  ...AfterQuery is a high-growth AI research company building training...  ..., fault tolerance, and production reliability Long-term platform architecture... 
    Senior
    Full time
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    19 days ago
  •  ...Title : SRE Infrastructure Engineer Location : SFO, CA (5 Days Onsite) Job...  ...experience ensuring the reliability, scalability, and performance...  ...Engineer, Google Cloud Engine AI SRE at Google: Focus...  ...specifically with Google Cloud Platform. · Technical Skills: Deep... 

    OJUS LLC

    San Francisco, CA
    2 days ago
  • $174k - $252k

    Senior Software Engineer, Infrastructure, Google Cloud Platforms Google - Sunnyvale, CA, USA; San Francisco, CA, USA Requirements Bachelor’s degree or equivalent practical experience. 5 years of experience with software development in one or more programming languages... 
    Senior
    Full time

    Google Inc.

    San Francisco, CA
    3 days ago
  • $163k - $203k

    GoTo Meeting is looking for a Senior Site Reliability Engineer in San Francisco. You will be responsible for...  ..., and security of Prosper’s Cloud Platform portfolio. This role requires expertise...  ...mentor junior engineers and implement AI-driven operations. Benefits include a... 
    Senior

    GoTo Meeting

    San Francisco, CA
    2 days ago
  •  ...The TeamPlatform Engineering is the department within...  ...a range of critical infrastructure and operational functions...  ...that ensure cluster reliability and security (e.g.,...  ...cloud infrastructure platforms, including AWS, GCP,...  ...the database for the AI era, enabling innovators... 
    Senior
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    1 day ago
  • A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates... 
    Senior

    OpenAI

    San Francisco, CA
    3 days ago
