Member of Technical Staff (AI Infrastructure Engineer)

Perplexity

We are looking for an AI Infrastructure Engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, and PyTorch, primarily on AWS. As an AI Infrastructure Engineer, you will partner closely with our Inference and Research teams to build, deploy, and optimize our large-scale AI training and inference clusters.

Responsibilities
- Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training workloads
- Manage and optimize Slurm-based HPC environments for distributed training of large language models
- Develop robust APIs and orchestration systems for both training pipelines and inference services
- Implement resource scheduling and job management systems across heterogeneous compute environments
- Benchmark system performance, diagnose bottlenecks, and implement improvements across both training and inference infrastructure
- Build monitoring, alerting, and observability solutions tailored to ML workloads running on Kubernetes and Slurm
- Respond swiftly to system outages and collaborate across teams to maintain high uptime for critical training runs and inference services
- Optimize cluster utilization and implement autoscaling strategies for dynamic workload demands

Qualifications
- Strong expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management
- Hands-on experience with Slurm workload management, including job scheduling, resource allocation, and cluster optimization
- Experience deploying and managing distributed training systems at scale
- Deep understanding of container orchestration and distributed systems architecture
- High-level familiarity with LLM architecture and training processes (Multi-Head Attention, Multi-/Grouped-Query Attention, distributed training strategies)
- Experience managing GPU clusters and optimizing compute resource utilization

Required Skills
- Expert-level Kubernetes administration and YAML configuration management
- Proficiency with Slurm job scheduling, resource management, and cluster configuration
- Python and C++ programming with a focus on systems and infrastructure automation
- Hands-on experience with ML frameworks such as PyTorch in distributed training contexts
- Strong understanding of networking, storage, and compute resource management for ML workloads
- Experience developing APIs and managing distributed systems for both batch and real-time workloads
- Solid debugging and monitoring skills, with expertise in observability tools for containerized environments

Preferred Skills
- Experience with Kubernetes operators and custom controllers for ML workloads
- Advanced Slurm administration, including multi-cluster federation and advanced scheduling policies
- Familiarity with GPU cluster management and CUDA optimization
- Experience with other ML frameworks such as TensorFlow, or with distributed training libraries
- Background in HPC environments, parallel computing, and high-performance networking
- Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices
- Experience with container registries, image optimization, and multi-stage builds for ML workloads

Required Experience
- Demonstrated experience managing large-scale Kubernetes deployments in production environments
- Proven track record with Slurm cluster administration and HPC workload management
- Previous roles in SRE, DevOps, or Platform Engineering with a focus on ML infrastructure
- Experience supporting both long-running training jobs and high-availability inference services
- Ideally, 3-5 years of relevant experience in ML systems deployment, with a specific focus on cluster orchestration and resource management


Vacancy posted 2 days ago

Similar jobs that could be interesting for you
Based on the Member of Technical Staff (AI Infrastructure Engineer) in San Francisco, CA vacancy

Member of Technical Staff - Applied AI Engineer
Member of Technical Staff - Applied AI Engineer Valthos | Posted Mar 3 Full-time Negotiable Advanced (5-10 yrs) Valthos Inc. Valthos is an applied... ...build, deploy, and scale model training and evaluation infrastructure Visualize and communicate results within Valthos...
Suggested
Full time
Work at office
Valthos
San Francisco, CA
4 days ago
Member of Technical Staff: AI Research & Engineering
Member of Technical Staff: AI Research & Engineering in Media Integrity About Synhawk Synhawk builds omnimodal foundation models for communication integrity, aimed at infrastructure-side deployment in telco and banking sectors. Our platform analyzes the integrity of audio...
Suggested
Immediate start
Shift work
Synhawk
San Francisco, CA
6 days ago
Member of Technical Staff (Software Engineer, API Platform)
$220k - $405k
...innovates at the frontier of AI infrastructure, search, and orchestration... ...Perplexity is seeking strong engineers with a passion for delivering... ...these interfaces. As a member of our team, you’ll work on... ...experience alike. You’ll also define technical strategy for how we scale to...
Suggested
Perplexity
San Francisco, CA
6 days ago
Member of Technical Staff (AI Software Engineer, Multimodal)
$220k - $405k
...builders to join our Multimodal AI group, an industry-leading... ...we have yet to invent. As an engineer on the Multimodal AI team, you... ...evaluation systems, backend infrastructure, and supporting libraries and... ..., from problem definition to technical design, implementation, and launch...
Suggested
Perplexity
San Francisco, CA
5 days ago
Member of Technical Staff (AI Inference Engineer)
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency... ..., text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling and KV-cache...
Suggested
Perplexity
San Francisco, CA
6 days ago
Member of Technical Staff (AI Software Engineer, Agents)
Perplexity is seeking energetic engineers to join our highly driven Agents engineering team... ...consists of backend, full-stack, and AI/ML engineers who collaborate to build delightful... ...and leverage cutting-edge AI models, infrastructure, and browser technologies to advance the...
Flexible hours
Perplexity
San Francisco, CA
6 days ago
Member of Technical Staff (AI Engineering)
$150k - $250k
...servicing with the industry’s most advanced AI credit-servicing agents. We are backed... ...Product Hunt), Charlie Songhurst (Board Member, Meta), and Michael Jones (Former Chair,... ...the United Nations, UChicago, and Oxford engineers and researchers. Our omnichannel...
Full time
Internship
Worldwide
Krew
San Francisco, CA
22 days ago
Member of Technical Staff (Software Engineer, Cloud Infrastructure)
About the Role The Cloud Infrastructure team owns the foundational cloud... ...Own the roadmap and technical strategy for agent-driven cloud... ...low-latency, high-throughput AI workloads. Architect and scale... ...Terraform) and strong software engineering skills in at least one of...
United States Digital Space LLC
San Francisco, CA
6 days ago
Member of Technical Staff (Software Engineer, Storage Platform)
...Role The Storage Platform team owns the infrastructure that powers how the company persists, retrieves... ...cost-efficiency for every product and AI workload. This foundational, high-... ...excellence around storage, the team enables engineers across the company to focus on product...
United States Digital Space LLC
San Francisco, CA
6 days ago
Member of Technical Staff (Software Engineer, Data Platform)
About Perplexity AI Perplexity is an AI-powered answer engine built to serve the world’s curiosity... .... In this senior/staff role, you will shape architecture... ...and drive the long-term technical direction of Perplexity’... ...technical bar for data infrastructure through thoughtful...
Perplexity
San Francisco, CA
6 days ago
Member of Technical Staff (Software Engineer, Backend Platform)
...enabling every product and AI team to build with... ...maintains critical infrastructure, including backend systems... ...‑in‑depth. Set the technical bar for backend platform... ...area, mentoring other engineers and making long‑term... ..., more for senior and staff). Strong system design...
Perplexity
San Francisco, CA
6 days ago
Founding AI Infrastructure Engineer - Own & Ship
Touchdown Labs, Inc. seeks a Founding Member of Technical Staff for AI Infrastructure in San Francisco/Bay Area or exceptional remote candidates. Responsibilities... .... The ideal candidate will have strong systems engineering experience and use AI tools effectively for...
Full time
Remote work
Touchdown Labs, Inc.
San Francisco, CA
5 days ago
Senior or Staff AI Infrastructure Engineer - San Francisco Only
$200k - $240k
...blockchain analytics and AI solutions to help... ...for all. The AI Engineering Team is chartered... ...high‑performance infrastructure, and operational... .... As a Senior or Staff AI Infrastructure... ...building and scaling the technical infrastructure for... ..., mentors team members, and enhances...
Remote work
Worldwide
Dormont Manufacturing Co
San Francisco, CA
1 day ago
Member of Technical Staff
...precedents to copy from. About the Role Members of Technical Staff (MTS) are the senior engineers who build the platform that... ...at its core. Multi‑tenant data infrastructure across very different portcos.... ...compounding growth. How We Use AI in Our Hiring Process To ensure...
BEACON SOFTWARE COMPANY
San Francisco, CA
5 days ago
Member of Technical Staff
$225k - $300k
...Member of Technical Staff Location: San Francisco, CA Onsite Policy: Full-time onsite Comp... ...consumer underwriting infrastructure from the ground up using AI-powered systems across document... ...This is not a narrowly scoped engineering role inside a large organization...
Full time
Trades Workforce Solutions
San Francisco, CA
5 days ago
Member of Technical Staff
...Description We’re looking for a Member of Technical Staff to build and deploy production-grade AI systems. In this role, you’ll... ...-world applications Systems Engineering: Design scalable pipelines... ...reliability of systems Data & Infrastructure: Work with large-scale datasets...
ERAGON
San Francisco, CA
2 days ago
Member of Technical Staff
$130k - $200k
...SketchPro SketchPro is building the first AI junior architect. We integrate deeply... ...of architecture. We’re a team of AI engineers and seasoned architects, bridging... ...frontier technology. The Role Being a Member of Technical Staff at SketchPro means the problem in front...
Work at office
Shift work
SketchPro.ai
San Francisco, CA
1 day ago
Member of Technical Staff
...We’re an AI platform out to redefine knowledge work. The... .... About the Role As a Member of Technical Staff, you will be part of the team... ...vigorously on the underlying infrastructure, core features, agent... ...the most leverage. Shape engineering culture and practices at an...
Work experience placement
H1b
Work at office
Visa sponsorship
Ersilia
San Francisco, CA
4 days ago
Member of Technical Staff
...Tomo is building this generation's most important consumer AI product. We have been working quietly on a SOTA personal agent... ...equally strong obligations to both 1) choose good and 2) to win. think that this role should be renamed "member of tomo staff"...
Immediate start
Tomo
San Francisco, CA
1 day ago
Member of Technical Staff
$200k
...Join to apply for the Member of Technical Staff role at Listen Labs .... ..., so we are expanding our engineering team. We're looking for someone... ...Background: Listen Labs is an AI‑powered research platform... ...across the LLM pipeline, infrastructure, backend, and UX. You...
Flexible hours
Listen Labs
San Francisco, CA
15 hours ago
Member of Technical Staff
...Catalog is building the commerce layer for AI - the missing infrastructure that lets agents not just search the web,... ...discover and buy online. Role As a Member of Technical Staff, you will ship core systems, set engineering culture, and move the mission from prototype...
Work at office
Getcatalog
San Francisco, CA
1 day ago
Member of Technical Staff
...Member of Technical Staff @ Lotus AI Lotus AI is a groundbreaking primary care app that integrates your... ...Our team includes ex-founders and engineers who have built and scaled consumer... ..., schema migrations, and data infrastructure simplification Familiarity with...
Lotus Health
San Francisco, CA
3 days ago
Member of Technical Staff
...Member Of Technical Staff Humans& is a human-centric frontier AI lab. We believe AI can be reimagined, centering around people and their relationships with each other. We are looking for researchers and engineers who have done exceptional work at the frontier of...
Humans&
San Francisco, CA
8 days ago
Member of Technical Staff
$140k - $200k
...Member of Technical Staff Harper is an AI-native commercial insurance company in San Francisco. We're not... ...by instinct (frontend, backend, infrastructure). ~ You've shipped AI to production... ...: at most companies a junior engineer waits in line behind layers of process...
Work at office
Relocation
Harper Group
San Francisco, CA
4 days ago
Member of Technical Staff
$70k - $110k
...Service Technician to join our dynamic engineering team. As a key member of our team, you will be responsible... ...individuals looking to apply their technical skills and knowledge in a challenging... ...this job, you agree to receive calls, AI-generated calls, text messages, or emails...
Temporary work
Local area
Jobot
San Francisco, CA
5 days ago
Member of Technical Staff
...Artificial Analysis is the leading independent AI benchmarking company. We support labs, engineers and enterprises to understand AI capabilities and... ...define what cutting edge means. We're hiring Members of Technical Staff to design the evaluations that set the standard...
Artificial Analysis, Inc.
San Francisco, CA
4 days ago
Member of Technical Staff
...information about us visit Core Roles Data & Integration Build AI systems that automatically connect to any legacy data source - ERPs... ...: Embed with factory operators, iterate and validate fast. Meritocracy: Any problem can be solved by any team member...
Complement
San Francisco, CA
5 days ago
Member of Technical Staff
...Quantum superintelligence is an AI that uses quantum computers... ...; it is who builds the infrastructure to make that convergence... ...most of the world's software engineers. AI is already generating quantum... .... Role Overview As a Member of Technical Staff you will shape Conductor's...
Conductor Quantum
San Francisco, CA
1 day ago
Member of Technical Staff
...building the best way to talk to AI and humans together — where AI... ...day, and everyone talks to users. Member of Technical Staff is the title we use for engineers who own hard problems end to end... ...backends at scale Realtime infrastructure (WebRTC, WebSockets, streaming)...
Shapes
San Francisco, CA
1 day ago
Member of Technical Staff
$250k
...an enterprise-grade AI platform that lets companies... ...The team is small, technical, and moving fast,... ...AI Tools. The Role Member of Technical Staff who can handle... ...stack: Python; modern engineering / ML frameworks; AWS... ...pipelines, APIs, and cloud infrastructure (AWS, GCP)...
Full time
David Joseph & Company
San Francisco, CA
1 day ago
