Senior AI Inference Engineer llama.cpp specialist 100% Remote
Framework Ventures
About the job

You'll work on the C++ layer that powers local AI, porting and enhancing inference engines such as llama.cpp, ONNX, and similar engines to run efficiently on edge devices. Your focus is the runtime: making models load faster, run leaner, and perform well across different hardware. You'll ensure that the inference layer is stable, optimized, and ready for integration with the rest of the stack. This role is for engineers who want to work close to the metal, enabling private and fast on-device AI without relying on cloud infrastructure.

Responsibilities

- Deploy machine learning models to edge devices using llama.cpp, ggml, and ONNX.
- Collaborate closely with researchers on coding, training, and transitioning models from research to production environments.
- Integrate AI features into existing products, enriching them with the latest advancements in machine learning.

Qualifications

- Excellent programming skills in C++; experience in JavaScript is a bonus
- Strong experience with the llama.cpp and ggml inference engines, which facilitate the deployment of models to specific GPU architectures
- Good understanding of deep learning concepts and model architectures
- Experience with transformers and LLMs
- Demonstrated ability to rapidly assimilate new technologies and techniques
- A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI and R&D

- ...to edge devices using the frameworks: llama.cpp, ggml, onnx. Collaborate closely with researchers... ...to production environments. Integrate AI features into existing products,... ...experience with Llama.cpp and ggml inference engines, facilitating the deployment of models... Remote work, Senior
- ...Framework Ventures is seeking a C++ Engineer in Town of Norway, Wisconsin, specializing in AI development for edge devices. The role involves working on inference engines like llama.cpp and ONNX, ensuring optimized performance across hardware. You'll deploy machine learning... Suggested
- ...Framework Ventures is seeking a C++ Engineer to work on the AI layer that powers local AI on edge devices... ...models using frameworks like llama.cpp and ONNX, collaborating with researchers... ...strong C++ skills and experience with inference engines, as well as a relevant degree... Suggested, Local area
- ...Job We are looking for an experienced AI Model Engineer with deep expertise in kernel development... .... The engineer will extend the inference framework to support inference and fine... ...model architectures (e.g., Qwen, Gemma, LLaMA, Falcon, etc.). Experience implementing... Remote work, Senior
$199.7k - $254.6k
...Join Cisco's CX AI Incubation Team as a Senior AI/ML DevOps Engineer... ...collaborate with product and engineering teams to deploy... ..., including on-prem inference packaging, runtime optimization... ...,TensorRT-LLM, llama.cpp). ~... ...attainment between 75% and 100%; and ~ Once... Senior, Full time, Temporary work, Local area, Flexible hours
- ...A cloud technology company is looking for a Senior Engineer 2 to enhance their AI Inference Optimization team. In this role, you will drive architectural... ...position offers competitive compensation and is fully remote, promoting a collaborative and innovative work environment... Remote work, Senior
- ...TypeScript developer to build modern web applications and actively help shape AI features. The position offers 100% remote work, flexible working hours, and above-average compensation. Fluent communication in German is required, as the... Remote work, Senior, Flexible hours
$167.2k - $209k
A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... Remote work, Senior
$167.2k - $209k
A pioneering cloud service provider in Seattle seeks a Senior Engineer 2 for its AI Inference Data Plane team. This role requires designing and delivering... ...in GoLang or Python. Competitive salary range from $167,200 to $209,000 with remote work options. Remote work, Senior
$100k
...Senior Information Technology Purchasing Specialist Must have two years of information technology purchasing experience... ...experience Pay up to $100,000 Must be a United States citizen... ...citizen or Green Card holder Partial remote with at least Mondays and Fridays... Remote work, Senior, Permanent employment, Full time, Work at office, Relocation, Monday to Friday
- ...Owning the inference backbone for QVAC's local AI stack, the full-time AI Inference Engineer will work remotely to enhance C++ systems for efficient model deployment on edge devices... ...to edge devices using frameworks like llama.cpp, ggml, and ONNX Collaborate with... Remote work, Full time, Local area
$152k - $241.5k
...recently, GPU deep learning ignited modern AI - the next era of computing - with the... ...looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring software engineers for... ...our DLC has been the backbone of NVIDIA's inference engine, spanning across data centers,... Remote work, Senior
- ...Responsibilities Build AI applications using Llama (Llama 3 / Llama Stack / Llama API / local LLM inference). Fine-tune and evaluate... ...via quantization, prompt engineering, and latency reduction.... ...Experience with llama.cpp, vLLM, Ollama, or NVIDIA... Remote work, Local area
$152k - $241.5k
A leading technology company in Austin is seeking a Senior Compiler Engineer for their AI Inference Platforms team. The role involves analyzing deep learning networks and developing optimization algorithms, requiring expertise in compiler technologies. Ideal candidates... Remote job, Senior
- ...worldwide. Responsibilities: Design and execute causal inference and incrementality experiments (GEO experiments, matched markets... ...Partner with cross-functional stakeholders across regions and seniority levels Improve testing playbooks and ensure consistent... Remote work, Senior, Worldwide
- ...About the job We’re seeking experienced AI infrastructure Engineers to design and implement robust, scalable pipelines for massive data workloads... ...of data and model workflows from prototyping to inference. Qualifications Proficient in Python with strong programming... Remote work, Senior
$242k - $290k
...As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient... ...CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution... ...latency and maximize memory bandwidth on AI accelerators. Write production-level,... Remote work, Senior, Temporary work, Relocation package
- ...Senior Paid Ads Specialist / Manager (100% Remote) Greater Delhi Area We are looking for a Senior Paid Ads Specialist / Manager to join our team of high-performing paid marketing experts and strengthen our digital advertising efforts. If you thrive on running and... Remote work, Senior, Flexible hours
- ...a focused team of 3–5 senior engineers while remaining deeply... ...performance, privacy-preserving AI models that run... .... This is a Hybrid remote position located in a... ...optimized for on-device inference (Mac, iOS, Android,... ...CoreML, ONNX Runtime, or llama.cpp) ~ Experience... Remote work, Senior, Relocation package
$110k - $140k
...for enterprises and AI innovators around... ...company. Vultr Cares 100% company‑paid... ...year $500 stipend for remote office setup in first... ...AI Platform Engineer to own the strategy... ...experience deploying LLM inference infrastructure and... ...open‑source models — Llama, Mistral, Qwen,... Remote work, Senior, Work at office, Immediate start, Flexible hours
$160k - $190k
...Senior AI Engineer Paper is reimagining how schools support students so that every learner can... ...and tool-using agents. Build scalable inference systems with strict latency and cost... ...stipend to set-up your workspace and $100 monthly stipends to support with on-going... Remote work, Senior
- ...Senior Software Engineer (AI Engineer) Portugal, Remote Who We Are At Fluxon, we believe that how you build matters... ...ingestion, preprocessing, model inference, and output structuring... ...-tune open-source models (e.g., Llama, Mistral) for specific domain tasks... Remote work, Senior, Flexible hours
$175k - $225k
...led by veteran operators and engineers, alumni of Sonos, Paypal, Tesla... ...We're looking for an AI Inference Engineer who lives at the boundary... ...autonomous navigation. Exposure to remote logging, log ingestion, and... ...this role, but do not meet 100% of the qualifications... Remote work, Local area
- ...Senior AI Engineer WongDoody creates human experiences at 22 studios across... ..., we are team players and specialists - both in frontend and... ...Flexible working hours and 100% overtime compensation ~... ...ChatGPT, Claude, Gemini, or Llama into end-to-end workflow solutions... Remote work, Senior, Work at office, Local area, Flexible hours
$165k - $220k
...Description: DataRobot delivers AI that maximizes impact... ...in the future. As an AI Engineer on our Professional... ...you. This is a fully remote position with no requirement... ...as Langgraph, CrewAI, Llama Index * Generative AI:... ...to jobs when they meet 100% of the qualifications... Remote work, Senior, Full time, Work at office, Local area, Worldwide, Flexible hours
$140.4k
...Job Title: Senior AI Performance Engineer (CUDA / GPU / NVIDIA Stack) Duration:... ...Min 12+ Months Location: 100% Remote This is a hands-on engineering... ...tuning) Improve inference performance using... ...models such as YOLO, GPT, LLaMA, Transformers Strong... Remote work, Senior, Full time
- ...Our new initiative brings AI directly into this process... ...works. We're looking for a senior machine learning engineer to take the lead on this... ...now. Why join us? 100% remote based in the US Help shape... ...APIs and probabilistic inference reliably Work alongside... Remote work, Senior, Local area
$50 - $60 per hour
...Application Management Specialist NTT DATA strives... ...in New York/Dallas/Remote, New York (US-NY),... ...Build agentic AI systems: Design and... ...following MCP protocol. Engineer robust guardrails... ...., OpenAI, Gemini, Llama, Qwen, Claude). ~... ...the Fortune Global 100 and are committed... Remote work, Senior, Hourly pay
- ...Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence... ...us and help build the platform engineers turn to to ship AI products. THE... ..., including meaningful equity. ~100% coverage of medical, dental, and... Remote work, Work experience placement, Flexible hours
- ...Senior ML/AI Engineer We're Sweed, a product-driven company... ...Engineer to join our team remotely and help us build the... ...-end engineers, QA specialists, analysts, and... ...Design scalable APIs and inference services for AI-driven... ...with a US company) ~100% remote — we're a remote... Remote work, Senior, Contract work, Trial period, Flexible hours

