
Solutions Architect - AI Inference Specialist

FriendliAI

About the job

FriendliAI is seeking a Solutions Architect to help enterprises deploy, scale, and operate generative and agentic AI workloads on FriendliAI infrastructure. You will work directly with customers to design and implement production-grade applications using our products, such as Serverless Endpoints, Dedicated Endpoints, or Container.

Friendli Container is our service that lets customers download our inference engine as Docker images and deploy it in an environment of their choice, such as a private cloud or on-premises. Friendli Container can also be deployed directly to AWS EKS clusters using our EKS add-on product.

You will work directly on our customers’ projects, collaborating with their engineering teams to solve AI inference challenges such as scaling, orchestration, and monitoring. This is a hands-on, customer-embedded role. If you have worked in DevOps, platform engineering, or SRE for AI applications, this is your ideal position.

Key Responsibilities
  • Design and implement large-scale deployment architectures for LLM and multimodal inference
  • Deploy and manage containerized workloads across Kubernetes clusters
  • Diagnose production issues, such as performance bottlenecks, and implement temporary fixes as needed
  • Collaborate with customers’ DevOps teams to integrate FriendliAI’s infrastructure into their CI/CD workflows
  • Develop scripts, Helm charts, and Terraform modules that simplify repeated deployments
  • Contribute field insights to shape our platform reliability, observability, and scaling strategies
  • Lead workshops, technical sessions, or webinars to help customers master infrastructure best practices
Qualifications
  • 3+ years of experience in cloud infrastructure, DevOps, or reliability engineering
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
  • Proficiency with Kubernetes, Docker, Terraform, and Helm
  • Strong foundation in distributed systems, networking, and performance tuning
  • Experience with GPU-based computing and generative AI model serving workloads
  • Strong technical background in backend systems or AI tooling
  • Experience operating workloads on AWS, GCP, or OCI
  • Excellent problem-solving and debugging skills in real-world environments

Preferred Experience
  • Experience deploying large models (LLMs, diffusion models) on GPUs or clusters
  • Familiarity with inference frameworks (Triton, vLLM, TensorRT, DeepSpeed-Inference)
  • Familiarity with observability stacks (Prometheus, Grafana, Loki, ELK, OTEL)
  • Understanding of network security and compliance frameworks (e.g., SOC 2)
  • Experience supporting on-prem or hybrid-cloud deployments

Benefits
  • A front-row seat to the generative AI infrastructure revolution
  • Competitive compensation and benefits package
  • Daily lunch and dinner provided; unlimited snacks and beverages
  • Health check-up and top-tier hardware support
  • Flexible working hours and a highly collaborative environment

About us

FriendliAI is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure powers high-throughput, low-latency workloads for global organizations and integrates directly with Hugging Face, providing instant access to over 500,000 open-source models. We are on a mission to deliver the world’s best platform for AI inference.

Vacancy posted 5 days ago
