LLM Inference Frameworks and Optimization Engineer
Gravity Engineering Services Pvt Ltd.
About the Role
At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference frameworks, algorithms, and infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency.

We are seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines that support multimodal and language models at scale. This role will focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design, ensuring efficient large-scale deployment of LLMs and vision models.

This role offers a unique opportunity to shape the future of LLM inference infrastructure, ensuring scalable, high-performance AI deployment across a diverse range of applications. If you're passionate about pushing the boundaries of AI inference, we'd love to hear from you!

Responsibilities

Inference Framework Development and Optimization
- Design and develop a fault-tolerant, high-concurrency distributed inference engine for text, image, and multimodal generation models.
- Implement and optimize distributed inference strategies, including Mixture of Experts (MoE) parallelism, tensor parallelism, and pipeline parallelism, for high-performance serving.
- Apply CUDA graph optimizations, TensorRT/TRT-LLM graph optimizations, PyTorch-based compilation (torch.compile), and speculative decoding to enhance efficiency and scalability.

Software-Hardware Co-Design and AI Infrastructure
- Collaborate with hardware teams on performance bottleneck analysis and co-optimize inference performance for GPUs, TPUs, or custom accelerators.
- Work closely with AI researchers and infrastructure engineers to develop efficient model execution plans and optimize end-to-end (E2E) model serving pipelines.

Requirements

Must-Have:
- Experience: 3+ years of experience in deep learning inference frameworks, distributed systems, or high-performance computing.
- Technical Skills: Familiarity with at least one LLM inference framework (e.g., TensorRT-LLM, vLLM, SGLang, or TGI (Text Generation Inference)). Background knowledge and experience in at least one of the following: GPU programming (CUDA/Triton/TensorRT), compilers, model quantization, or GPU cluster scheduling. Deep understanding of KV cache systems such as Mooncake, PagedAttention, or custom in-house variants.
- Programming: Proficient in Python and C++/CUDA for high-performance deep learning inference.
- Optimization Techniques: Deep understanding of Transformer architectures and LLM/VLM/diffusion model optimization. Knowledge of inference optimization techniques such as workload scheduling, CUDA graphs, compilation, and efficient kernels.
- Soft Skills: Strong analytical problem-solving skills with a performance-driven mindset. Excellent collaboration and communication skills across teams.

Nice-to-Have:
- Experience developing software systems for large-scale data center networks with RDMA/RoCE.
- Familiarity with distributed filesystems (e.g., 3FS, HDFS, Ceph).
- Familiarity with open-source distributed scheduling/orchestration frameworks such as Kubernetes (K8s).
- Contributions to open-source deep learning inference projects.
$160k - $230k


