Solutions Architect - AI Inference Specialist
FriendliAI
About the job

FriendliAI is seeking a Solutions Architect to help enterprises deploy, scale, and operate generative and agentic AI workloads on FriendliAI infrastructure. You will work directly with customers to design and implement production-grade applications using our products, such as Serverless Endpoints, Dedicated Endpoints, or Friendli Container. Friendli Container lets customers download our inference engine as Docker images and deploy it in their chosen environment, such as a private cloud or on-premises. It can also be deployed directly on AWS EKS clusters using our EKS add-on product.

You will work directly on our customers’ projects, collaborating with their engineering teams to solve AI inference challenges like scaling, orchestration, and monitoring. This is a hands-on, customer-embedded role. If you have worked in DevOps, platform engineering, or SRE for AI applications, this is your ideal position.

Key Responsibilities

- Design and implement large-scale deployment architectures for LLM and multimodal inference
- Deploy and manage containerized workloads across Kubernetes clusters
- Diagnose production issues, such as performance bottlenecks, and implement temporary fixes as needed
- Collaborate with customers’ DevOps teams to integrate FriendliAI’s infrastructure into their CI/CD workflows
- Develop scripts, Helm charts, and Terraform modules that simplify repeated deployments
- Contribute field insights to shape our platform reliability, observability, and scaling strategies
- Lead workshops, technical sessions, or webinars to help customers master infrastructure best practices

Qualifications

- 3+ years of experience in cloud infrastructure, DevOps, or reliability engineering
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
- Proficiency with Kubernetes, Docker, Terraform, and Helm
- Strong foundation in distributed systems, networking, and performance tuning
- Experience with GPU-based computing and generative AI model serving workloads
- Strong technical background in backend systems or AI tooling
- Experience operating workloads on AWS, GCP, or OCI
- Excellent problem-solving and debugging skills in real-world environments

Preferred Experience

- Experience deploying large models (LLMs, diffusion models) on GPUs or clusters
- Familiarity with inference frameworks (Triton, vLLM, TensorRT, DeepSpeed-Inference)
- Familiarity with observability stacks (Prometheus, Grafana, Loki, ELK, OTEL)
- Understanding of networking security and compliance frameworks (e.g., SOC 2)
- Experience supporting on-prem or hybrid-cloud deployments

Benefits

- A front-row seat to the generative AI infrastructure revolution
- Competitive compensation and benefits package
- Daily lunch and dinner provided; unlimited snacks and beverages
- Health check-up and top-tier hardware support
- Flexible working hours and a highly collaborative environment

About us

FriendliAI is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure powers high-throughput, low-latency workloads for global organizations and integrates directly with Hugging Face, providing instant access to over 500,000 open-source models. We are on a mission to deliver the world’s best platform for AI inference.


