Senior AI Model Serving Engineer Low-Latency Inference
Jobleads-US
A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates will have a strong foundation in algorithms and system design, along with a passion for mentoring others. The position offers a competitive salary and generous benefits.
- A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal...Senior
$220k - $320k
ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models...Suggested3 days per week$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate has over 5 years of software engineering experience, strong familiarity with ML architectures, and experience...Senior$166k - $225k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides... ...models. It offers real-time, low-latency inference, governance, monitoring, and... ...and cost efficiency. As a Senior Engineer, you’ll play a critical role...SeniorLocal areaWorldwide
- MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability... ...in production-grade serving infrastructure, be fluent in Python...Senior
$167.2k - $209k
...DigitalOcean is expanding its AI Infrastructure layer to... .... We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role... ...can deploy and scale their models with industry-leading performance... ...distributed inference serving frameworks such as llm‑d,...SeniorLocal areaRemote workWorldwideFlexible hours$220k - $320k
...Help us make inference blazingly fast. If you... ...specialized language models for companies that... ...frontier-quality AI at a fraction of... ...ten-person team of engineers who work in-person... ...with the goal of serving models faster and... ...inference performance: latency, throughput, cost...SeniorWork at office$200k
Plaud is seeking skilled AI engineers to join their core SpeechLLM lab in San Francisco. You will play a crucial role in building high-throughput inference engines for conversational AI and optimizing GPU performance while collaborating with various teams. The position...Work at office
- Sciforium is an AI infrastructure... ...multimodal AI models and a proprietary... ...-efficiency serving platform. Backed... ...support from AMD engineers the team is... ...internal performance, latency, and efficiency... ...with low‑level performance... ...model training or inference systems. Contributions...Flexible hours
$225k
...generation to improve models and solve alignment more... ...ultra‑long context, and inference‑time compute to achieve... ...The Role As a Software Engineer on the Inference & RL... ...distributed systems that serve our models in... ...that determine inference latency, throughput, stability,...SeniorRelocationVisa sponsorship$142.7k - $270.95k
...is seeking a Senior researcher - Machine... ...& Efficiency Engineer to join our R&... ...in inference performance, latency, and cost efficiency... ...intersection of model architecture,... ...Intelligence (AI), ML systems,... ...Responsibilities Inference & Serving Optimization:... ...-throughput, low-latency...SeniorFull timeTemporary workLocal areaWorldwide
- A leading AI platform company in San Francisco is looking for a Senior Infrastructure Engineer to design and operate production infrastructure for high-scale, low-latency systems. Your focus will be on critical services, improving reliability, and enhancing developer velocity...Senior
- ...Role Our generative AI-powered products are... ...of medicine—and the inference systems that power them... ...We’re looking for an Engineering Manager to lead and grow our Model Inference team. The... ...of how our models are served: from architecting low-latency, high-throughput infrastructure...Hourly payFull timeFlexible hours
- ...research, design, and engineering. Our organization... ...will lead the Model Routing & Inference team at Cursor,... ...that powers every AI interaction in... ...calls that balance latency, cost, reliability... ...high‑throughput, low‑latency... ...especially in inference serving, traffic routing,...
$250.8k - $286.2k
...responsible and reliable AI systems, changing... ...science and engineering teams to deliver our... ...reimagine how we serve our customers and... ...customers. Our AI models and platforms empower... ...language model inference, similarity search... ...scalability, cost, latency, throughput — of...SeniorFull timePart timeLocal area
- ...BASETEN Baseten powers inference for the world's most dynamic AI companies, like... ...bring cutting-edge models into production. With... ...systems, model serving, and developer experience... ...record of owning low‑latency, reliable backend... ...open-source inference engines (vLLM, TensorRT-LLM...Flexible hours
- Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer... ...with strict latency and reliability requirements... ...and real-time inference serving across multiple regions... ...industries. Lead - AI/ML Stack Infrastructure... ...containerized AI models. The engineer will lead...SeniorTemporary work
- YO IT Consulting is seeking an experienced Senior Civil Engineer specializing in evaluating AI-generated content. This remote role involves ensuring technical accuracy, challenging AI models with real-world engineering scenarios, and shaping AI communication standards....SeniorRemote job
$166.9k - $225.9k
...operates as both a central engineering function and an embedded reliability... ...with peers—while also serving as the dedicated reliability... ...Experience with AIOps—using AI/ML‑based tooling for anomaly... ...‑backed services (e.g., LLM inference latency, non‑determinism, prompt...SeniorFlexible hours
- Xcede is looking for a Member of Technical Staff focused on AI Safety to lead red-teaming efforts and ensure the robustness of next... ...should have deep expertise in LLM safety, strong software engineering skills, and relevant academic qualifications in AI or related fields...Senior
- ...technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional teams. Ideal...
$325k
About the Team Our Inference team brings OpenAI's most capable research... ...access our state-of-the-art AI models, allowing them to do things... ...Role We are looking for an engineer who wants to take the world'... ...for use in a high-volume, low-latency, and high-availability...
- ...Company: Sequen AI is leading the... ...building frontier ranking models for search and... ...frontier AI models serve production traffic at... ...for a Core Systems Engineer with a deep mastery... ...and compute complex inference logic with ultra-low latency. You will replace high...Senior
- A technology startup in San Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience...
- A tech startup focused on AI workloads is seeking a Member of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and... ...Ideal candidates should have strong software engineering skills and experience with ML inference...Senior
$192k - $260k
...world’s best data and AI infrastructure platform... ...business. Foundation Model Serving is the API Product for... ...frontier AI model inference for open source models... ...necessary. We’re looking for engineers who have owned high‑... ...high‑throughput, low‑latency inference on GPU workloads...Local areaWorldwide$192k - $260k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides enterprises... .... It offers real-time, low-latency inference, governance, monitoring, and... ...cost efficiency. As a Staff Engineer, you’ll play a critical role...Local areaWorldwide$250k
...Ready to architect AI infrastructure... ...a serverless inference platform, beginning... ...expanding into low-latency, real-time inference and custom model hosting. This is... ...chance to join as a Senior Inference Platform Engineer at an early... ...latest models, serving frameworks, and...SeniorPermanent employment
- Pantera Capital is looking for a Model Performance Engineer in San Francisco, California to optimize model inference speed, cost, and reliability.... ...infrastructure that accelerates the AI team’s processes. The role covers optimizing serving frameworks and ensuring...
$217k - $312.2k
...the world's best data and AI infrastructure platform so... ...their business. Databricks’ Model Serving product provides... ...models. It offers real‑time, low‑latency inference, governance, monitoring, and... ...and cost efficiency. As a Senior Engineering Manager, you will lead the...SeniorLocal areaWorldwide