Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open-source technologies and contributing to community projects. Candidates should have a solid understanding of distributed systems and familiarity with deep learning frameworks, ideally with experience in PyTorch and Ray. Anyscale offers competitive compensation and extensive benefits, including healthcare coverage and stock options.
- ...About Us Most AI is frozen in place... ...intelligence - the inference services that serve LLMs at scale and the data pipelines... ...Researchers and ML engineers will hand you... ...Design and operate distributed inference systems for... ...on experience with LLM inference engines (... Suggested, Flexible hours
- ...on a mission to democratize distributed computing and make it accessible... ...accelerate the progress of AI applications out into the... ...developer or data scientist can scale an ML application from their... ...the role As a Distributed LLM Inference Engineer, you will help with systems... Suggested, Work at office
- ...mission‑critical inference for the world's most dynamic AI companies, like... ...build the platform engineers turn to to ship... ...system for distributed, heterogeneous AI... ...believe that as LLM and multi‑modal workloads scale, the network is... ...operates at wire‑speed. In this role,... Suggested, Flexible hours
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About... ...the Role At Together.ai, we are building state-of... ..., develop, and optimize distributed inference engines that support... ...and language models at scale. This role will focus on... Suggested, Full time
- ...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing... ...infrastructure for large-scale multimodal models, focusing on high-... ...product teams to push the boundaries of AI technology, ensuring reliable production... Suggested
- A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development... Flexible hours
- ...looking for an individual to design and implement high-performance scheduling systems for AI inference processes. This role requires strong foundational knowledge in distributed systems and an eagerness to work closely with agent-based technologies. The ideal candidate...
$197.3k - $225.1k
...Lead AI Engineer (FM Hosting, LLM Inference) Overview At Capital One, we are creating responsible and reliable AI systems, changing banking for... ...performance - scalability, cost, latency, throughput - of large scale production AI systems. Contribute to the technical... Full time, Part time, Local area, $146.5k
...team: The ML Data Engineering team powers metadata extraction... ...operate at massive scale, supporting diverse... ...data engineering, and distributed systems, collaborating... ...deploy scalable ML and LLM-powered solutions in production... ...-edge generative AI and metadata enrichment... Local area, Worldwide, Home office, Flexible hours
- ...to enable enterprises to implement AI workloads effectively. The role involves designing large-scale deployment architectures, solving AI inference challenges, and collaborating closely... ...Terraform, and a solid understanding of distributed systems. Benefits include... Flexible hours
$165k
...partner with top AI labs,... ...compute at the speed of light. We’re... ...About the Role Inference is now the defining... ...customers run on it: LLM serving... ...intersection of distributed systems, model... ...supports them at scale. Profile and resolve... ...software engineering experience with... Local area, $180k - $275k
...this role matters now AI has dramatically lowered... ...producing abusive content at scale no longer requires... .... This means designing distributed systems for real-time... ...’ll collaborate across engineering, product, and design to... ...activity Leverage AI/LLM-based detection to stay... Full time, Work at office, Work from home, $146.5k - $228k
...the team: The ML Data Engineering team powers metadata extraction... ...operate at massive scale, supporting diverse... ...data engineering, and distributed systems, collaborating... ...deploy scalable ML and LLM-powered solutions in production... ...-edge generative AI and metadata enrichment... Temporary work, Local area, Worldwide, Home office, Flexible hours, $142.2k - $204.6k
...Role As a software engineer for GenAI inference, you will help design... ...language model (LLM) serving systems are... ...optimized for large-scale LLMs inference Collaborate... ...with federated, distributed inference infrastructure... ...is the data and AI company. More than 10... Local area, Worldwide
- ...Senior Software Engineer, LLM Performance SF Bay Area... ...Parasail is redefining AI infrastructure by enabling... ...deployment across a distributed network of GPUs,... ...efficiently at enterprise scale while driving continuous... .... Contributions to inference engines such as vLLM is...
- ...AI/ML Engineer (RL & Physical Systems) FLUIX is... ...systems to power distribution, where milliseconds... ...and real megawatt-scale infrastructure.... ...Support integration of LLM-based tools and... ...knowledge distillation, inference orchestration, etc... ...at startup speed. Bonus Points... Weekend work
- ...* **Move at Drata Speed (Precision & Velocity... ...as both a central engineering function and an... ...stack to help Drata scale reliably for a... ...SLO tracking, and distributed tracing* Experience... ...with AIOps - using AI/ML-based tooling for... ...services (e.g., LLM inference latency, non-determinism... Work at office, Immediate start, Worldwide, Monday to Friday, Flexible hours
$300k
...interpretable, and steerable AI systems. We want... ...researchers, engineers, policy experts,... ...the role Our Inference team is... ...We tackle complex, distributed systems challenges... ...performance, large-scale distributed systems... ...management systems LLM inference optimization... Work at office, Worldwide, Visa sponsorship, Flexible hours
- ...are hiring Software Engineers focused on AI Infrastructure to build... ...at production scale. This role exists because... ...orchestration, large-scale inference systems, performance... ...pipelines Distributed GPU infrastructure... ...and moves at lightning speed. You'll have the autonomy... Internship, Immediate start
- ...our platform delivers AI inference. Validating whether inference... ...for a dedicated QA engineer who can own the... ...strategies that account for LLM inference. Work... ...Strong experience testing distributed systems with multiple... ...in an early-stage or scaling environment.... Worldwide, Flexible hours
- ...—from edge devices to large-scale deployments. Our work spans... ...scalable training, efficient inference, and real-world deployment.... ...seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model... ..., implement, and optimize distributed training systems for large-scale...
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate... ...with ML architectures, and experience with distributed systems. This role involves collaboration with researchers...
- ...Langfuse Open Source LLM Engineering Platform that helps teams build useful AI applications via... ...mainly booth scans and swag distribution. It is not a pure... ...ClickHouse for tracing at scale, S3 for file storage,... ...develops at breakneck speed and our customers are at... Part time, Work at office, Remote work
$150k - $237.5k
...low-cost and large-scale energy storage and... ...Senior Software Engineer, Energy Storage... ...we build and the speed at which we build... ...over time * Apply AI tools to accelerate... ...scalable, and secure distributed systems *... ...information, and inferences drawn from your PI... Full time
- ...Software Engineer Opportunity at Abridge Abridge's services and engineering... ...by multiples. This is a distributed systems oriented role and is... ..., will be under tremendous scale, and presents many opportunities... ...and research as we pioneer new AI-first cloud-native-first...
$180k - $310k
...the role You'll build and scale the application and data... ...shipping velocity. As Software Engineer on the Platform team, you'll collaborate... ...and implement scalable APIs, distributed systems, and data... ...systems, event pipelines, or AI-powered applications (Nice to... Full time, Work at office, Work from home, $230k - $385k
...factors. We strive to seamlessly blend high-level AI capabilities with the constraints of physical systems... ...' lives. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal... Work at office, Relocation package
- ...Principal AI Engineer (LLM Agents & Orchestration) Job Title: Principal AI Engineer (... ...GenAI companies in the world. We've scaled faster than most funded startups - with... ...Latency & Reliability: Optimize inference pipelines for speed (streaming, token optimization) and...
$216k - $270k
...Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative... ...knowledge retrieval, inference, evaluation, and more... ...Senior Full-Stack Engineer to help us build, scale... ...Python, working with distributed systems, data pipelines, and ML/LLM components.... Full time, $166k - $225k
...running the world's best data and AI infrastructure platform so... ...their business. Founded by engineers — and customer obsessed — we... ...for interfacing with data to scaling our services and infrastructure... ...building the next generation distributed data storage and processing... Local area, Worldwide

