Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open-source technologies and contributing to community projects. Candidates should have a solid understanding of distributed systems and familiarity with deep learning frameworks, ideally with experience in PyTorch and Ray. Anyscale offers competitive compensation and extensive benefits, including healthcare coverage and stock options.
- Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...Suggested
- ...About Us Most AI is frozen in place... ...intelligence - the inference services that serve LLMs at scale and the data pipelines... ...Researchers and ML engineers will hand you... ...Design and operate distributed inference systems for... ...on experience with LLM inference engines (...SuggestedFlexible hours
- ...on a mission to democratize distributed computing and make it accessible... ...accelerate the progress of AI applications out into the... ...developer or data scientist can scale an ML application from their... ...the role As a Distributed LLM Inference Engineer, you will help with systems...SuggestedWork at office
- ...mission‑critical inference for the world's most dynamic AI companies, like... ...build the platform engineers turn to to ship... ...system for distributed, heterogeneous AI... ...believe that as LLM and multi‑modal workloads scale, the network is... ...operates at wire‑speed. In this role,...SuggestedFlexible hours
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About... ...the Role At Together.ai, we are building state-of... ..., develop, and optimize distributed inference engines that support... ...and language models at scale. This role will focus on...SuggestedFull time
- A dynamic AI company in San Francisco is looking for an Applied AI Inference Engineer to develop and deploy high-scale production AI applications. You will partner with customers to transform business goals into reliable services while engaging in software development...Flexible hours
$167.2k - $209k
...DigitalOcean is expanding its AI Infrastructure layer... ...seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In... ...and delivering high-scale, resilient data... ...the intersection of distributed systems and... ...frameworks such as llm‑d, NVIDIA Dynamo, or...Local areaRemote workWorldwideFlexible hours$146.5k
...team: The ML Data Engineering team powers metadata extraction... ...operate at massive scale, supporting diverse... ...data engineering, and distributed systems, collaborating... ...deploy scalable ML and LLM-powered solutions in production... ...-edge generative AI and metadata enrichment...For contractorsLocal areaWorldwideHome officeFlexible hours
- ...AI Systems Engineer Transluce is a fast-moving research... ...systems that can scale to thousands of... ...: Inference stacks that are as... ...Behavior elicitation: Distributed RL training and roll... ...internal tools to speed up the team Help... ...Bonus: can set up LLM pipelines, e.g. multiple...Flexible hours
- ...* **Move at Drata Speed (Precision & Velocity... ...as both a central engineering function and an... ...stack to help Drata scale reliably for a... ...SLO tracking, and distributed tracing* Experience... ...with AIOps - using AI/ML-based tooling for... ...services (e.g., LLM inference latency, non-determinism...Work at officeImmediate startWorldwideMonday to FridayFlexible hours
$142.2k - $204.6k
...Role As a software engineer for GenAI inference, you will help design... ...language model (LLM) serving systems are... ...optimized for large-scale LLMs inference Collaborate... ...with federated, distributed inference infrastructure... ...is the data and AI company. More than 10...Local areaWorldwide
- ...AI/ML Engineer (RL & Physical Systems) FLUIX is... ...systems to power distribution, where milliseconds... ...and real megawatt-scale infrastructure.... ...Support integration of LLM-based tools and... ...knowledge distillation, inference orchestration, etc... ...at startup speed. Bonus Points...Weekend work
- ...are hiring Software Engineers focused on AI Infrastructure to build... ...at production scale. This role exists because... ...orchestration, large-scale inference systems, performance... ...pipelines Distributed GPU infrastructure... ...and moves at lightning speed. You'll have the autonomy...InternshipImmediate start
- ...Staff+ Software Engineer, Inference Runtime Remote-Friendly... ...interpretable, and steerable AI systems. We want AI to... ...customers with the speed, reliability, and... ...efforts spanning serving, scaling, and accelerator teams... ..., large-scale distributed systems serving millions...Work at officeRemote workVisa sponsorshipFlexible hours
- ...our platform delivers AI inference. Validating whether inference... ...for a dedicated QA engineer who can own the... ...strategies that account for LLM inference. Work... ...Strong experience testing distributed systems with multiple... ...in an early-stage or scaling environment....WorldwideFlexible hours
- ...Langfuse Open Source LLM Engineering Platform that helps teams build useful AI applications via... ...mainly booth scans and swag distribution. It is not a pure... ...ClickHouse for tracing at scale, S3 for file storage,... ...develops at breakneck speed and our customers are at...Part timeWork at officeRemote work
$180k - $310k
...the role You'll build and scale the application and data... ...shipping velocity. As Software Engineer on the Platform team, you'll collaborate... ...and implement scalable APIs, distributed systems, and data... ...systems, event pipelines, or AI-powered applications (Nice to...Full timeWork at officeWork from home
- Health Harbor, located in San Francisco, is seeking experienced engineers to build and scale their Voice AI LLM and orchestration system. The role demands strong problem-solving skills and the ability to work under high pressure, with a commitment of about 70 hours a week...Flexible hours
$230k - $385k
...factors. We strive to seamlessly blend high-level AI capabilities with the constraints of physical systems... ...' lives. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large-scale multimodal...Work at officeRelocation package$150k - $237.5k
...low-cost and large-scale energy storage and... ...Senior Software Engineer, Energy Storage... ...we build and the speed at which we build... ...over time * Apply AI tools to accelerate... ...scalable, and secure distributed systems *... ...information, and inferences drawn from your PI...Full time$139.2k - $174k
...DigitalOcean is expanding its AI Infrastructure... ...seeking a Senior Engineer 2 to play a key... ...running AI workloads— inference, training, fine‑tuning— at scale. In this high‑... ...between high‑scale distributed systems and specialized... ...orchestration for LLM inference and...Local areaRemote workWorldwideFlexible hours$320k
...Staff + Sr. Software Engineer, Cloud Inference San Francisco, CA... ...interpretable, and steerable AI systems. We want AI... ...Cloud Inference team scales and optimizes Claude... ..., large-scale distributed systems serving millions... ...Are curious about LLM serving; prior inference...Work at officeVisa sponsorshipFlexible hours
- ...Principal AI Engineer (LLM Agents & Orchestration) Job Title: Principal AI Engineer (... ...GenAI companies in the world. We've scaled faster than most funded startups - with... ...Latency & Reliability: Optimize inference pipelines for speed (streaming, token optimization) and...
$216k - $270k
...Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative... ...knowledge retrieval, inference, evaluation, and more... ...Senior Full-Stack Engineer to help us build, scale... ...Python, working with distributed systems, data pipelines, and ML/LLM components....Full time$255k - $405k
...on integrating multimodal functionalities into our AI products, ensuring they are reliable, user‑friendly... ...broad societal benefit. About the Role As a Software Engineer, Distributed Data Systems, you will design and scale the infrastructure that powers large‑scale...Full timeWork at officeLocal areaRelocation packageFlexible hours
- ...Technical Lead for Inference & ML Performance... ...next generation of AI products. We build... ...production, and do it at scale without compromise... ...fal's inference engine and ensure our... ..., ML compilers, distributed inference) to build... ...enhancing inference speed and scalability....
$192k - $260k
...Databricks Databricks is the data and AI company. More than 10,000... ...: MS or PhD in databases, distributed systems. Comfortable working... ...system that combines the scale and cost-efficiency of data lakes... ...of real-world data engineering architecture. Delta Pipelines...Worldwide$250k - $380k
...Full time Department Scaling Compensation $250K -... ...and running OpenAI’s LLM training and inference infrastructure that powers... ...are looking for an engineer to design and... ...performance bottlenecks in distributed dataset loading (e.g.... ...OpenAI OpenAI is an AI research and deployment...Full timeWork at officeLocal areaRelocation packageFlexible hours
- ...Inference Engine Engineer We build and run the inference engine behind... ...dozens of model architectures at scale with tight latency and cost... ...You understand modern LLM architectures and are able to... ...built and operated production distributed systems under real load -...
$160k - $250k
...Senior Backend Engineer, Inference Platform San Francisco... ...the Role Together AI is building the Inference... ...and speech models at scale. If you get a thrill... ...reduce model compute and speed up responses.... ...scale, fault-tolerant, distributed systems and API microservices...Full timeLocal area

