Senior AI/ML Infra & SRE Engineer
AI Chopping Block, Inc.
Senior Infrastructure Engineer – Bland

As a Senior Infrastructure Engineer at Bland, responsibilities include contributing to the design of scalable architecture by building distributed systems using Kubernetes that handle high-volume, real-time voice processing with strict latency and reliability requirements; building and supporting machine learning infrastructure including training pipelines and real-time inference serving across multiple regions; maintaining robust integrations with enterprise telephony systems, SIP trunks, and VoIP infrastructure; identifying architectural flaws and solving them; ensuring platform reliability through monitoring, alerting, and incident response systems to maintain enterprise-grade uptime; anticipating and solving scaling challenges related to exponential call volume growth; and implementing security best practices and compliance requirements for enterprise customers in regulated industries.

Lead – AI/ML Stack Infrastructure

Lead the team responsible for the infrastructure supporting AI/ML Stack, focusing on scalability and efficiency of the Machine Learning Operations platform. Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment across business units, balancing short-term tactical deliveries with long-term architectural transformation. Manage and mentor a team of 6-7+ engineers, allocating resources strategically to support existing services and execute key strategic initiatives. Collaborate cross-functionally with leaders in machine learning, data science, product engineering, and infrastructure to identify pain points, remove bottlenecks, and facilitate new solution deployment. Architect compute and storage pipelines for ML Engineers to manage large datasets and artifacts efficiently. Modernize the AI product inference stack for significant growth in global deployments. Work with Site Reliability Engineering to establish comprehensive system observability metrics. Conduct assessments for technology refresh and benchmark proprietary tools against commercial and open-source alternatives to meet future needs.

Infrastructure Engineer – AI/ML Workflows

The Infrastructure Engineer is responsible for building robust, secure, and scalable cloud infrastructure to support AI and machine learning workflows. This includes designing, building, and deploying cloud infrastructure, partnering with technical and non-technical stakeholders from idea generation through implementation and shipping, enabling Machine Learning Engineers and Data Scientists by contributing to internal best practices, standards, and reusable code repositories, proactively identifying and recommending ways customers can leverage cloud infrastructure to solve key challenges, creating and maintaining reusable, company-wide libraries and infrastructure-as-code, and researching and integrating the best open-source technologies to enhance Faculty's infrastructure capabilities.

Staff DevOps Engineer – AI Workloads

The Staff DevOps Engineer will design and architect secure, scalable cloud and edge infrastructure for deploying AI workloads across multi-cloud and hybrid environments. They will build and maintain production-grade Infrastructure as Code using tools like Terraform, Ansible, or Pulumi, managing over 100 resources with GitOps workflows and automated validation. The role includes designing and operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing container security, multi-tenancy, and resource optimization. They will implement secure CI/CD pipelines with integrated security controls and automated deployment workflows for containerized AI models. The engineer will lead MLOps infrastructure initiatives including model deployment pipelines, versioning, feature stores, experiment tracking, and monitoring for model performance and drift.

Responsibilities also include designing comprehensive observability and monitoring solutions using tools like Prometheus, Grafana, ELK, or Datadog with distributed tracing, application performance monitoring, and real-time alerting. They will implement security best practices such as least-privilege access, encryption at rest and in transit, network segmentation, and automated compliance validation. The engineer will lead incident response and reliability initiatives, participate in on-call rotation, conduct post-mortems, and drive continuous improvement for system reliability. Architecting disaster recovery and business continuity strategies with automated backup, failover, and recovery processes is required. They will develop reusable infrastructure modules and templates to accelerate environment provisioning and standardize deployment patterns. Mentoring mid-level and senior engineers on cloud architecture, DevOps best practices, and platform reliability through design reviews and technical guidance is part of the role. They will also drive technical documentation and knowledge sharing including runbooks, architecture decision records, and infrastructure standards.

Site Reliability Engineer, Inference Infrastructure

As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.
Location: San Francisco or New York, United States


