Senior AI/ML Infra & SRE Engineer

AI Chopping Block, Inc.

Senior Infrastructure Engineer – Bland
As a Senior Infrastructure Engineer at Bland, responsibilities include contributing to the design of scalable architecture by building distributed systems using Kubernetes that handle high-volume, real-time voice processing with strict latency and reliability requirements; building and supporting machine learning infrastructure including training pipelines and real-time inference serving across multiple regions; maintaining robust integrations with enterprise telephony systems, SIP trunks, and VoIP infrastructure; identifying architectural flaws and solving them; ensuring platform reliability through monitoring, alerting, and incident response systems to maintain enterprise-grade uptime; anticipating and solving scaling challenges related to exponential call volume growth; and implementing security best practices and compliance requirements for enterprise customers in regulated industries.

Lead – AI/ML Stack Infrastructure
Lead the team responsible for the infrastructure supporting AI/ML Stack, focusing on scalability and efficiency of the Machine Learning Operations platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate new solution deployment. Architect compute and storage pipelines for ML Engineers to manage large datasets and artifacts efficiently. Modernize the AI product inference stack for significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct assessments for technology refresh and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.

Infrastructure Engineer – AI/ML Workflows
The Infrastructure Engineer is responsible for building robust, secure, and scalable cloud infrastructure to support AI and machine learning workflows. This includes designing, building, and deploying cloud infrastructure, partnering with technical and non-technical stakeholders from idea generation through implementation and shipping, enabling Machine Learning Engineers and Data Scientists by contributing to internal best practices, standards, and reusable code repositories, proactively identifying and recommending ways customers can leverage cloud infrastructure to solve key challenges, creating and maintaining reusable, company-wide libraries and infrastructure-as-code, and researching and integrating the best open-source technologies to enhance Faculty's infrastructure capabilities.

Staff DevOps Engineer – AI Workloads
The Staff DevOps Engineer will design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud and hybrid environments. They will build and maintain production-grade Infrastructure as Code using tools like Terraform, Ansible, or Pulumi, managing over 100 resources with GitOps workflows and automated validation. The role includes designing and operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization. They will implement secure CI/CD pipelines with integrated security controls and automated deployment workflows for containerized AI models. The engineer will lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift.
Responsibilities also include designing comprehensive observability and monitoring solutions using tools like Prometheus, Grafana, ELK, or Datadog with distributed tracing, application performance monitoring, and real-time alerting. They will implement security best practices such as least-privilege access, encryption at rest and in transit, network segmentation, and automated compliance validation. The engineer will lead incident response and reliability initiatives, participate in on-call rotation, conduct post-mortems, and drive continuous improvement for system reliability. Architecting disaster recovery and business continuity strategies with automated backup, failover, and recovery processes is required. They will develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns. Mentoring mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance is part of the role. They will also drive technical documentation and knowledge sharing including runbooks, architecture decision records, and infrastructure standards.

Site Reliability Engineer, Inference Infrastructure
As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.
Location: San Francisco or New York, United States

Vacancy posted 8 hours ago
Similar jobs that could be interesting for you
Based on the Senior AI/ML Infra & SRE Engineer in San Francisco, CA vacancy
  • Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing...  ...in storage engineering and a strong background in AI/ML workloads. The position offers competitive salary and... 
    Senior
    Remote job

    Hamilton Barnes Associates Limited

    San Francisco, CA
    17 hours ago
  • TRM Labs is looking for a Senior or Staff ML Systems Engineer to focus on building and scaling the technical infrastructure for AI/ML systems in San Francisco. This position involves developing reusable CI/CD workflows and automating model versioning to ensure compliance... 
    Senior

    TRM Labs

    San Francisco, CA
    17 hours ago
  • $250k - $280k

     ...Staff / Principal Founding Engineer (Backend-Leaning) - AI Systems Platform San Francisco (in-office) $...  ...(Python, TypeScript, APIs, AWS, cloud infra) Background in 0→1 or early-stage startup...  ...AI systems, agent frameworks, or data/ML pipelines, come from top 5 CS... 
    Senior
    Work at office
    Immediate start
    Flexible hours

    Xpertalent

    San Francisco, CA
    1 day ago
  • $200k - $275k

     ...through data pipelines, and powers ML models that clinicians rely on...  ...our data scientists and ML engineers to build and operate the infrastructure...  .... Why You You've built AI infrastructure at a startup...  ...-stage startup where you owned infra end-to-end This Role Is... 
    Suggested
    Home office
    Day shift

    Healthleap AI

    San Francisco, CA
    1 day ago
  • $148.5k - $260.1k

     ...re looking for a seasoned DevOps Database Engineer who thrives at the intersection of database...  ...modern cloud infrastructure, and applied AI. You’ll own the reliability, performance,...  ...our database systems while leveraging AI/ML tools to drive automation, intelligent monitoring... 
    Senior

    Centaur Labs

    San Francisco, CA
    3 days ago
  •  ...interact with the web by building AI agents that can reliably do...  ...Responsibilities: Scale infra for post-training of multimodal...  ...agent Work closely with product engineers to translate cutting-edge AI...  ...Looking For: Experience with ML infrastructure (GPU clusters)... 
    Work at office
    Relocation
    Visa sponsorship

    Yutori

    San Francisco, CA
    22 days ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda...  .... The Role This is not a generalist SRE role. You will design, operate, and...  ..., networking, orchestration, and ML frameworks. Drive blameless post‑... 
    Senior
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    3 days ago
  •  ...role offers significant autonomy and the chance to influence investment strategies through innovative systems. You'll collaborate directly with the team on cutting-edge projects involving machine learning and AI. Competitive compensation is provided... 
    Senior
    Remote work

    One Concern

    San Francisco, CA
    17 hours ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 
    Senior

    Hyperbolic Labs

    San Francisco, CA
    17 hours ago
  •  ...Gravity Engineering Services Pvt Ltd. is looking for a Product Manager to own the AI platform product roadmap and drive the strategy for AI operationalization. You will collaborate...  ...of experience in product management within ML/AI or cloud services and possesses a strong... 
    Senior

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    1 day ago
  • $150k - $250k

    Foundry Robotics Inc. is looking for a Senior Software Engineer to join their team in San Francisco. This vital role focuses on building cloud-based backend systems, infrastructure, and ensuring data integrity in advanced robotics manufacturing. The successful candidate... 
    Senior

    Foundry Robotics Inc.

    San Francisco, CA
    4 days ago
  • A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes...  .... Join a dynamic team aiming at advancements in AI and ML infrastructure.

    Perplexity

    San Francisco, CA
    1 day ago
  • $160k - $235k

     ...Senior AI Engineer, AI Platform San Francisco, CA; USA (Remote) Affinity stitches together billions...  ...Partner with cross-functional (product, infra, data engineering, and software...  ...underlie all of our data processing and ML Operations. Qualifications Don't... 
    Senior
    Work at office
    Remote work
    Worldwide
    Flexible hours
    2 days per week
    3 days per week

    Affinity

    San Francisco, CA
    2 days ago
  • $159k - $278.25k

     ...'ll play the role of a backend engineer working across the stack (backend...  ...to contribute to more ML specific tasks. Successful individuals...  ...Work with other teams to build AI & data‑driven GTM products Monitor...  ...of the stack (foundational infra, backend, UX). You should be able... 
    Senior
    Work at office
    3 days per week

    Rippling

    San Francisco, CA
    3 days ago
  •  ...achieve net zero by 2050. The role As an AI Engineer, you will play a pivotal role in integrating...  ...models (LLMs) and machine learning (ML) solutions into our platform and internal...  ...pipelines and vector databases. Our Tech Stack Infra: AWS, Fargate, Redis, PostgreSQL, SQS,... 
    Senior

    WithClutch

    San Francisco, CA
    4 days ago
  • $131.4k - $235.95k

    Autodesk, Inc. is seeking a Senior Machine Learning Engineer for MLOps in San Francisco. You will ensure AI-powered experiences meet high standards for reliability and scalability. Key responsibilities include automating model testing, managing inference services, and integrating... 
    Senior

    Autodesk, Inc.

    San Francisco, CA
    1 day ago
  •  ...Senior AI Engineer Disney Entertainment and ESPN Product & Technology Technology is at the heart...  ...~5+ years of backend or applied AI/ML engineering with a track record of building...  ...skills; able to work across infra, data, security, and product teams. ~... 
    Senior

    The Walt Disney Studios

    San Francisco, CA
    2 days ago
  •  ...technology company based in Seattle is looking for a Senior Machine Learning Engineer who will design and implement AI-driven solutions for optimizing their...  ...research. Ideal candidates will bring over 8 years of ML experience, proficiency in PyTorch or TensorFlow,... 
    Senior

    DocuSign, Inc.

    San Francisco, CA
    1 day ago
  • $180k - $247.5k

    A leading data and AI company located in California is looking for an experienced Product...  ...the product roadmap, collaborate with engineering teams, and engage directly with enterprise...  ...product management, and familiarity with ML/AI infrastructure. This position offers a... 
    Senior

    Databricks

    San Francisco, CA
    4 days ago
  •  ...Senior AI/ML Engineer — LLM & Agent Stack Every production AI system, whether it's powering customer support, writing code, analyzing financial...  ...with substantial experience building distributed systems, infra, or ML platforms. ~ Deep practical experience integrating... 
    Senior

    TrueFoundry

    San Francisco, CA
    2 days ago
  • $202k

    About the Role (Sr AI/ML Engineer : Not Data Scientist) Core Security Engineering’s mission is to make the Uber production environment secure...  ..., Authorization). Collaborate across Security, Risk, and Infra teams to deliver scalable, production-ready solutions. Provide... 
    Senior
    Full time

    Uber

    San Francisco, CA
    3 days ago
  • $125k - $225k

    A leading AI infrastructure company is seeking a Senior Backend Engineer to build core backend services for a high-scale observability platform. The ideal candidate...  ...This role involves designing APIs specifically for ML workflows and handling real-time data pipelines. Embracing... 
    Senior
    Remote work

    Space Executive

    Berkeley, CA
    17 hours ago
  • OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... 
    Senior
    Flexible hours

    OutSystems, Inc.

    San Francisco, CA
    3 days ago
  • $139.2k - $174k

    A leading cloud services provider is looking for a Senior Engineer 2 to join their AI Infrastructure Control Plane team. This role involves architecting...  ..., along with significant experience in building AI/ML products. The position offers a compensation range of $139... 
    Senior
    Remote work

    DigitalOcean

    San Francisco, CA
    17 hours ago
  • $178.6k - $268k

     ...Data Experiences Product Team. This role involves partnering with engineering to enhance user experiences, defining success metrics, and...  ...experience in Product Management, strong communication skills, and AI/ML product expertise. The role offers hybrid work options and... 

    Stripe

    San Francisco, CA
    4 days ago
  • Epoch Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and improve...  ...decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of SRE experience, deep knowledge of Kubernetes,... 
    Senior

    Epoch Biodesign

    San Francisco, CA
    2 days ago
  • $300k

     ...mode startup building out their AI and cloud platform, powered by...  ..., or inference. As a Platform Engineer/Senior Site Reliability Engineer, you...  .... Collaborate with ML, networking, and platform teams...  ...~7+ years of experience in SRE, DevOps, or Infrastructure Engineering... 
    Senior
    Permanent employment
    San Francisco, CA
    more than 2 months ago
  • A cutting-edge AI firm in San Francisco is seeking a talented engineer to design and implement robust CI/CD pipelines for machine learning workflows. The ideal candidate will have a bachelor's degree in Computer Science or a related field, with at least 3 years of experience... 

    Pantera Capital

    San Francisco, CA
    3 days ago
  • A forward-thinking AI startup in healthcare is seeking a Senior Forward Deployed Engineer in San Francisco. The role involves deploying AI solutions within critical healthcare...  .... Candidates should have a strong background in ML/AI product development, excellent communication... 
    Senior

    Slope

    San Francisco, CA
    4 days ago
  •  ...A tech company specializing in AI infrastructure is seeking a Senior Product Manager for Runpod Anywhere. The role involves leading the entry into a new market and building a product line from scratch, along with engaging customers to refine product strategy. Applicants... 
    Senior
    Remote work

    Runpod

    San Francisco, CA
    17 hours ago
