Solutions Architect - AI Inference Specialist

FriendliAI

About the job

FriendliAI is seeking a Solutions Architect to help enterprises deploy, scale, and operate generative and agentic AI workloads on FriendliAI infrastructure. You will work directly with customers to design and implement production-grade applications using our products, such as Serverless Endpoints, Dedicated Endpoints, and Friendli Container. Friendli Container is our service that lets customers download our inference engine as Docker images and deploy it in their chosen environment, such as a private cloud or on-premises. It can also be deployed directly to AWS EKS clusters using our EKS add-on product.

You will work on our customers’ projects, collaborating with their engineering teams to solve AI inference challenges like scaling, orchestration, and monitoring. This is a hands-on, customer-embedded role. If you have worked in DevOps, platform engineering, or SRE for AI applications, this is your ideal position.

Key Responsibilities

- Design and implement large-scale deployment architectures for LLM and multimodal inference
- Deploy and manage containerized workloads across Kubernetes clusters
- Diagnose production issues, such as performance bottlenecks, and implement temporary fixes as needed
- Collaborate with customers’ DevOps teams to integrate FriendliAI’s infrastructure into their CI/CD workflows
- Develop scripts, Helm charts, and Terraform modules that simplify repeated deployments
- Contribute field insights to shape our platform reliability, observability, and scaling strategies
- Lead workshops, technical sessions, or webinars to help customers master infrastructure best practices

Qualifications

- 3+ years of experience in cloud infrastructure, DevOps, or reliability engineering
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
- Proficiency with Kubernetes, Docker, Terraform, and Helm
- Strong foundation in distributed systems, networking, and performance tuning
- Experience with GPU-based computing and generative AI model serving workloads
- Strong technical background in backend systems or AI tooling
- Experience operating workloads on AWS, GCP, or OCI
- Excellent problem-solving and debugging skills in real-world environments

Preferred Experience

- Experience deploying large models (LLMs, diffusion models) on GPUs or clusters
- Familiarity with inference frameworks (Triton, vLLM, TensorRT, DeepSpeed-Inference)
- Familiarity with observability stacks (Prometheus, Grafana, Loki, ELK, OTEL)
- Understanding of network security and compliance frameworks (e.g., SOC 2)
- Experience supporting on-prem or hybrid-cloud deployments

Benefits

- A front-row seat to the generative AI infrastructure revolution
- Competitive compensation and benefits package
- Daily lunch and dinner provided; unlimited snacks and beverages
- Health check-up and top-tier hardware support
- Flexible working hours and a highly collaborative environment

About us

FriendliAI is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure powers high-throughput, low-latency workloads for global organizations and integrates directly with Hugging Face, providing instant access to over 500,000 open-source models. We are on a mission to deliver the world’s best platform for AI inference.


Vacancy posted 5 days ago

