Member of Technical Staff (AI Infrastructure Engineer)

Perplexity

We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, and PyTorch, primarily on AWS. As an AI Infrastructure Engineer, you will be partnering closely with our Inference and Research teams to build, deploy, and optimize our large-scale AI training and inference clusters.

Responsibilities
  • Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training workloads
  • Manage and optimize Slurm-based HPC environments for distributed training of large language models
  • Develop robust APIs and orchestration systems for both training pipelines and inference services
  • Implement resource scheduling and job management systems across heterogeneous compute environments
  • Benchmark system performance, diagnose bottlenecks, and implement improvements across both training and inference infrastructure
  • Build monitoring, alerting, and observability solutions tailored to ML workloads running on Kubernetes and Slurm
  • Respond swiftly to system outages and collaborate across teams to maintain high uptime for critical training runs and inference services
  • Optimize cluster utilization and implement autoscaling strategies for dynamic workload demands

Qualifications
  • Strong expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management
  • Hands-on experience with Slurm workload management, including job scheduling, resource allocation, and cluster optimization
  • Experience with deploying and managing distributed training systems at scale
  • Deep understanding of container orchestration and distributed systems architecture
  • High-level familiarity with LLM architecture and training processes (Multi-Head Attention, Multi/Grouped-Query, distributed training strategies)
  • Experience managing GPU clusters and optimizing compute resource utilization

Required Skills
  • Expert-level Kubernetes administration and YAML configuration management
  • Proficiency with Slurm job scheduling, resource management, and cluster configuration
  • Python and C++ programming with a focus on systems and infrastructure automation
  • Hands-on experience with ML frameworks such as PyTorch in distributed training contexts
  • Strong understanding of networking, storage, and compute resource management for ML workloads
  • Experience developing APIs and managing distributed systems for both batch and real-time workloads
  • Solid debugging and monitoring skills with expertise in observability tools for containerized environments

Preferred Skills
  • Experience with Kubernetes operators and custom controllers for ML workloads
  • Advanced Slurm administration including multi-cluster federation and advanced scheduling policies
  • Familiarity with GPU cluster management and CUDA optimization
  • Experience with other ML frameworks like TensorFlow or distributed training libraries
  • Background in HPC environments, parallel computing, and high-performance networking
  • Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices
  • Experience with container registries, image optimization, and multi-stage builds for ML workloads

Required Experience
  • Demonstrated experience managing large-scale Kubernetes deployments in production environments
  • Proven track record with Slurm cluster administration and HPC workload management
  • Previous roles in SRE, DevOps, or Platform Engineering with a focus on ML infrastructure
  • Experience supporting both long-running training jobs and high-availability inference services
  • Ideally, 3-5 years of relevant experience in ML systems deployment with a specific focus on cluster orchestration and resource management

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Member of Technical Staff (AI Infrastructure Engineer) in San Francisco, CA vacancy
  • $100k - $300k

     ...Cogent Security Cogent is an Applied AI Lab building the next generation of AI...  ...are looking for talented, ambitious AI/ML Engineers who are excited to build in the Applied AI...  ...Onboard, support and uplevel future team members Mentor and grow future junior team members... 
    Suggested

    Cogent Security

    San Francisco, CA
    2 days ago
  • $150k - $250k

     ...servicing with the industry’s most advanced AI credit-servicing agents. We are backed...  ...Product Hunt), Charlie Songhurst (Board Member, Meta), and Michael Jones (Former Chair,...  ...the United Nations, UChicago, and Oxford engineers and researchers. Our omnichannel... 
    Suggested
    Full time
    Internship
    Worldwide

    Krew

    San Francisco, CA
    17 days ago
  • Member of Technical Staff - Applied AI Engineer Valthos | Posted Mar 3 Full-time Negotiable Advanced (5-10 yrs) Valthos Inc. Valthos is an applied...  ...build, deploy, and scale model training and evaluation infrastructure Visualize and communicate results within Valthos... 
    Suggested
    Full time
    Work at office

    Valthos

    San Francisco, CA
    12 hours ago
  • Member of Technical Staff: AI Research & Engineering in Media Integrity About Synhawk Synhawk builds omnimodal foundation models for communication integrity, aimed at infrastructure-side deployment in telco and banking sectors. Our platform analyzes the integrity of audio... 
    Suggested
    Immediate start
    Shift work

    Synhawk

    San Francisco, CA
    2 days ago
  • $150k - $250k

    Founding Member of Technical Staff (Platform Engineering) I’m currently partnered with a well‑funded, early‑stage applied AI company building at the frontier of reinforcement learning...  ...and scaling RL training + inference infrastructure for real‑world usage Designing... 
    Suggested
    Visa sponsorship
    Relocation package

    DeepRec.ai

    San Francisco, CA
    12 hours ago
  • $200k - $400k

     ...Infrastructure Engineer Opportunity We are looking for an Infrastructure Engineer who thrives on...  ...resource allocation to ensure our real-time AI features hit their latency targets....  ...: Ability to write clear technical specs for both internal teams and external... 
    Flexible hours

    Simile

    San Francisco, CA
    1 day ago
  • $200k - $350k

     ...Edison Scientific builds and commercializes AI agents for science. Scientific discovery...  ...assembling a team of top researchers and engineers across AI and biology to build an AI...  ...reliability and adoption, and be the go-to technical contact for AI within the client organization... 
    Work at office
    Remote work

    Edison Scientific Inc.

    San Francisco, CA
    4 days ago
  • $220k

    We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency...  ..., text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling and KV-cache... 

    Perplexity

    San Francisco, CA
    2 days ago
  • $220k - $405k

    Overview Perplexity is seeking an energetic engineer to join our highly driven Agents engineering team. The Agents team consists of AI/ML, backend, and full-stack engineers who...  ...and leverage cutting‑edge AI models, infrastructure, and browser technologies to advance the... 
    Flexible hours

    Perplexity

    San Francisco, CA
    4 days ago
  • $100k - $300k

     ...Founding- and Staff-level Engineers We are looking for Founding- and Staff-level Engineers to design...  ...strategies that enable Applied AI use cases like semantic search and retrieval...  ...experience as a hands-on engineer and technical leader leading multiple projects ~... 

    Cogent Security, Inc.

    San Francisco, CA
    4 days ago
  •  ...Cloud Security Engineer Perplexity is seeking a highly experienced and hands-on Cloud...  ...to build and maintain secure, scalable infrastructure that empowers engineers to innovate quickly...  ...languages ~ Bonus: Experience with AI/ML infrastructure and multi-cloud environments... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  • $220k - $405k

    Perplexity is seeking an experienced Software Engineer focusing on building the next-gen AI Foundation & Platform to help revolutionize the way people...  ...end-to-end AI data, evaluation and personalization infrastructure and flywheel which powers almost all agent products.... 
    Worldwide

    Perplexity

    San Francisco, CA
    1 day ago
  •  ...AI Infrastructure Specialist As vCluster's AI Infrastructure Specialist, you...  ...will be one of the first team members a neocloud or AI Factory engages with at a technical depth, and the playbooks you...  ...Feedback Loop: Collaborate with Engineering and Product to surface... 
    Remote work
    Flexible hours

    vCluster

    San Francisco, CA
    2 days ago
  • $180k - $300k

     ...time Location Type Hybrid Department Platform & Infrastructure Compensation $180K - $300K • Offers Equity Perplexity...  ...queries to enterprise-scale integrations. As a Staff Backend Engineer, you will shape the technical foundation of Perplexity’s external platform. You’... 
    Full time
    Worldwide

    Pantera Capital

    San Francisco, CA
    4 days ago
  • About Liquid AI Spun out of MIT CSAIL, we build general-purpose AI systems that run...  ...we sell, but the internal data and agent infrastructure that makes Liquid run at the speed of a...  ...models. You need real experience with prompt engineering, tool use, evals, and the practical... 

    AI Chopping Block, Inc.

    San Francisco, CA
    12 hours ago
  • $200k - $350k

     ...Scientific builds and commercializes AI agents for science....  ...team of top researchers and engineers across AI and biology to build...  ...operating the core platform infrastructure that powers autonomous scientific...  ...at the senior level is about technical ownership and leverage—... 
    Work at office

    Edison Scientific Inc.

    San Francisco, CA
    4 days ago
  • $160k - $235k

     ...Senior AI Engineer, AI Platform San Francisco, CA; USA (Remote) Affinity stitches together billions of data points from massive datasets...  ...flexibility with meaningful in-person collaboration. Team members within commuting distance are expected in-office 2–3 days per... 
    Work at office
    Remote work
    Worldwide
    Flexible hours
    2 days per week
    3 days per week

    Affinity

    San Francisco, CA
    4 days ago
  •  ...heterogeneous neocloud for AI workloads. As AI systems...  ...homogeneous, vertically integrated infrastructure. Gimlet addresses this by...  ...Gimlet Labs is seeking a Member of Staff focused on AI Research (Intern...  ...degree in computer science, engineering, or comparable area of study... 
    Internship

    Gimlet Labs

    San Francisco, CA
    2 days ago
  • $200k

     ...Join to apply for the Member of Technical Staff role at Listen Labs TL;DR: We...  ..., so we are expanding our engineering team. We're looking for someone...  ...: Listen Labs is an AI-powered research platform...  ...across the LLM pipeline, infrastructure, backend, and UX. You have... 
    Flexible hours

    Listen Labs

    San Francisco, CA
    2 days ago
  • $200k - $350k

     ...Member of ML Technical Staff Title of Role: Member of ML Technical Staff Location: San Francisco...  ...Stage of Funding: Venture-Backed - AI Office Type: Onsite Salary: $20...  ...to the continuous improvement of engineering practices. Analyze model performance... 
    Work at office
    Visa sponsorship

    Recruiting from Scratch

    San Francisco, CA
    4 days ago
  •  ...Staff AI Platform Engineer Laurel is on a mission to return time. As the leading AI Time platform...  ...ownership. We empower every team member to understand the business levers behind...  .... Experience designing AI infrastructure, not just models (REQUIRED) You think... 
    Work at office
    Remote work
    Visa sponsorship
    2 days per week

    Laurel Property Services

    San Francisco, CA
    4 days ago
  • Member of Technical Staff - Software Engineer Valthos | Posted Mar 3 Full-time Negotiable Advanced (5-10 yrs)...  ...and deploy software and biological AI systems to safeguard humanity. The...  ...managing container-based scalable infrastructure, building REST APIs, integrating open... 
    Full time
    Work at office

    Valthos

    San Francisco, CA
    2 days ago
  •  ...Perplexity is AI for people who expect more. This role brings...  ...great data scientist, analytics engineer, or data engineer - the kind...  ...notices. You'll build the infrastructure that turns a small data team...  ...the output of every data team member and every stakeholder who needs... 

    Perplexity

    San Francisco, CA
    1 day ago
  •  ...Job Description What we are looking for? Seeking a Member of Technical Staff - Backend with 5+ years of experience. We are looking for...  ...Seniority 5+ years of experience in backend software engineering, with a focus on Python in well-established engineering teams... 
    Work experience placement

    RST Recruitment

    San Francisco, CA
    23 days ago
  • $130k - $240k

     ...SketchPro SketchPro is building the first AI junior architect. We integrate deeply...  ...of architecture. We’re a team of AI engineers and seasoned architects, bridging...  ...frontier technology. The Role Being a Member of Technical Staff at SketchPro means the problem in... 
    Work at office
    Shift work

    SketchPro AI

    San Francisco, CA
    16 days ago
  • $150k - $250k

     ...servicing with the industry’s most advanced AI credit-servicing agents. We are backed...  ...Product Hunt), Charlie Songhurst (Board Member, Meta), and Michael Jones (Former Chair,...  ...the United Nations, UChicago, and Oxford engineers and researchers. Our omnichannel... 
    Full time
    Work experience placement
    Internship
    Worldwide

    Krew

    San Francisco, CA
    12 days ago
  • $150k - $350k

     ...Description Job Description Member of Technical Staff, Applied Research — Sieve...  ...Sieve Sieve is the only AI research company exclusively focused on video data infrastructure and video intelligence....  ...technical applied research engineering role sitting between research... 
    Full time
    H1b
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    16 days ago
  •  ...Member of Technical Staff, Model Efficiency Who are we? Our mission is to scale intelligence to serve humanity...  ...and enterprises who are building AI systems to power magical experiences...  ...customers. Cohere is a team of researchers, engineers, designers, and more, who are... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    2 days ago
  • $185k - $250k

     ...Activant, 1984 Ventures and Page One. The Role We’re hiring a Member of Technical Staff - Fullstack to design, build, and scale end-to-end systems...  ...solutions. You’ll play a critical role in shaping Stuut’s engineering culture and product experience, ensuring our full stack... 
    Full time
    Flexible hours

    Stuut

    San Francisco, CA
    3 days ago
  • $220k - $405k

     ...Employment Type Full time Department AI Compensation $220K - $405K...  ...energetic researchers and engineers to join our Secure...  ...broader AI ecosystem. As a member of SII, you'll conduct original...  ...privacy threats across AI systems, infrastructure, and user-facing products.... 
    Full time
    Local area

    Pantera Capital

    San Francisco, CA
    3 days ago
