Senior AI Inference Engineer llama.cpp specialist 100% Remote

Framework Ventures

About the job

You'll work on the C++ layer that powers local AI, porting and enhancing inference engines such as llama.cpp, ONNX, and similar to run efficiently on edge devices. Your focus is the runtime: making models load faster, run leaner, and perform well across different hardware. You'll ensure that the inference layer is stable, optimized, and ready for integration with the rest of the stack. This role is for engineers who want to work close to the metal, enabling private and fast on-device AI without relying on cloud infrastructure.

Responsibilities
  • Deploy machine learning models to edge devices using llama.cpp, ggml, and ONNX.
  • Collaborate closely with researchers on coding, training, and transitioning models from research to production environments.
  • Integrate AI features into existing products, enriching them with the latest advancements in machine learning.

Qualifications
  • Excellent programming skills in C++; experience in JavaScript is a bonus.
  • Strong experience with the llama.cpp and ggml inference engines, which facilitate the deployment of models to specific GPU architectures.
  • Good understanding of deep learning concepts and model architectures.
  • Experience with transformers and LLMs.
  • Demonstrated ability to rapidly assimilate new technologies and techniques.
  • A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI and R&D.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Senior AI Inference Engineer llama.cpp specialist 100% Remote in United States vacancy
  •  ...to edge devices using the frameworks: llama.cpp, ggml, onnx. Collaborate closely with researchers...  ...to production environments. Integrate AI features into existing products,...  ...experience with Llama.cpp and ggml inference engines, facilitating the deployment of models... 
    Remote work
    Senior

    Framework Ventures

    United States
    1 day ago
  •  ...Framework Ventures is seeking a C++ Engineer in Town of Norway, Wisconsin, specializing in AI development for edge devices. The role involves working on inference engines like llama.cpp and ONNX, ensuring optimized performance across hardware. You'll deploy machine learning... 
    Suggested

    Framework Ventures

    Waterford, WI
    10 hours ago
  •  ...Framework Ventures is seeking a C++ Engineer to work on the AI layer that powers local AI on edge devices...  ...models using frameworks like llama.cpp and ONNX, collaborating with researchers...  ...strong C++ skills and experience with inference engines, as well as a relevant degree... 
    Suggested
    Local area

    Framework Ventures

    Italian Republic
    2 days ago
  •  ...Job We are looking for an experienced AI Model Engineer with deep expertise in kernel development...  .... The engineer will extend the inference framework to support inference and fine...  ...model architectures (e.g., Qwen, Gemma, LLaMA, Falcon, etc.). Experience implementing... 
    Remote work
    Senior

    Framework Ventures

    New York, NY
    2 days ago
  • $199.7k - $254.6k

     ...Join Cisco's CX AI Incubation Team as a Senior AI/ML DevOps Engineer...  ...collaborate with product and engineering teams to deploy...  ..., including on-prem inference packaging, runtime optimization...  ..., TensorRT-LLM, llama.cpp). ~...  ...attainment between 75% and 100%; and ~ Once... 
    Senior
    Full time
    Temporary work
    Local area
    Flexible hours

    Cisco

    San Jose, CA
    14 hours ago
  •  ...A cloud technology company is looking for a Senior Engineer 2 to enhance their AI Inference Optimization team. In this role, you will drive architectural...  ...position offers competitive compensation and is fully remote, promoting a collaborative and innovative work environment... 
    Remote work
    Senior

    DigitalOcean

    Seattle, WA
    2 days ago
  •  ...TypeScript developer to build modern web applications and actively help shape AI features. The position offers 100% remote work, flexible working hours, and above-average compensation. Fluent communication in German is required, as the... 
    Remote work
    Senior
    Flexible hours

    dreifach.ai

    United States
    2 days ago
  • $167.2k - $209k

    A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... 
    Remote work
    Senior

    DigitalOcean

    San Francisco, CA
    4 days ago
  • $167.2k - $209k

    A pioneering cloud service provider in Seattle seeks a Senior Engineer 2 for its AI Inference Data Plane team. This role requires designing and delivering...  ...in GoLang or Python. Competitive salary range from $167,200 to $209,000 with remote work options.
    Remote work
    Senior

    DigitalOcean

    Seattle, WA
    4 days ago
  • $100k

     ...Senior Information Technology Purchasing Specialist Must have two years of information technology purchasing experience...  ...experience Pay up to $100,000 Must be a United States citizen...  ...citizen or Green Card holder Partial remote with at least Mondays and Fridays... 
    Remote work
    Senior
    Permanent employment
    Full time
    Work at office
    Relocation
    Monday to Friday

    MRINetwork

    United States
    1 day ago
  •  ...Owning the inference backbone for QVAC's local AI stack, the full-time AI Inference Engineer will work remotely to enhance C++ systems for efficient model deployment on edge devices...  ...to edge devices using frameworks like llama.cpp, ggml, and ONNX Collaborate with... 
    Remote work
    Full time
    Local area

    Virtual Vocations Inc

    United States
    18 hours ago
  • $152k - $241.5k

     ...recently, GPU deep learning ignited modern AI - the next era of computing - with the...  ...looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring software engineers for...  ...our DLC has been the backbone of NVIDIA's inference engine, spanning across data centers,... 
    Remote work
    Senior

    NVIDIA

    United States
    4 days ago
  •  ...Responsibilities Build AI applications using Llama (Llama 3 / Llama Stack / Llama API / local LLM inference) . Fine-tune and evaluate...  ...via quantization, prompt engineering, and latency reduction....  ...Experience with llama.cpp , vLLM , Ollama , or NVIDIA... 
    Remote work
    Local area

    Tranzeal

    United States
    3 days ago
  • $152k - $241.5k

    A leading technology company in Austin is seeking a Senior Compiler Engineer for their AI Inference Platforms team. The role involves analyzing deep learning networks and developing optimization algorithms, requiring expertise in compiler technologies. Ideal candidates... 
    Remote job
    Senior

    NVIDIA Corporation

    Austin, TX
    4 days ago
  •  ...worldwide. Responsibilities: Design and execute causal inference and incrementality experiments (GEO experiments, matched markets...  ...Partner with cross-functional stakeholders across regions and seniority levels Improve testing playbooks and ensure consistent... 
    Remote work
    Senior
    Worldwide

    Varite

    United States
    1 day ago
  •  ...About the job We’re seeking experienced AI infrastructure Engineers to design and implement robust, scalable pipelines for massive data workloads...  ...of data and model workflows from prototyping to inference. Qualifications Proficient in Python with strong programming... 
    Remote work
    Senior

    Framework Ventures

    United States
    1 day ago
  • $242k - $290k

     ...As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient...  ...CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution...  ...latency and maximize memory bandwidth on AI accelerators. Write production-level,... 
    Remote work
    Senior
    Temporary work
    Relocation package

    Zoox

    San Diego, CA
    1 day ago
  •  ...Senior Paid Ads Specialist / Manager (100% Remote) Greater Delhi Area We are looking for a Senior Paid Ads Specialist / Manager to join our team of high-performing paid marketing experts and strengthen our digital advertising efforts. If you thrive on running and... 
    Remote work
    Senior
    Flexible hours

    WIN Home Inspection

    United States
    17 hours ago
  •  ...a focused team of 3–5 senior engineers while remaining deeply...  ...performance, privacy-preserving AI models that run...  .... This is a Hybrid remote position located in a...  ...optimized for on-device inference (Mac, iOS, Android,...  ...CoreML, ONNX Runtime, or llama.cpp) ~ Experience... 
    Remote work
    Senior
    Relocation package

    McAfee

    San Jose, CA
    14 hours ago
  • $110k - $140k

     ...for enterprises and AI innovators around...  ...company. Vultr Cares 100% company‑paid...  ...year $500 stipend for remote office setup in first...  ...AI Platform Engineer to own the strategy...  ...experience deploying LLM inference infrastructure and...  ...open‑source models — Llama, Mistral, Qwen,... 
    Remote work
    Senior
    Work at office
    Immediate start
    Flexible hours

    Vultr

    Richmond, VA
    2 days ago
  • $160k - $190k

     ...Senior AI Engineer Paper is reimagining how schools support students so that every learner can...  ...and tool-using agents. Build scalable inference systems with strict latency and cost...  ...stipend to set-up your workspace and $100 monthly stipends to support with on-going... 
    Remote work
    Senior

    Softbank Investment Advisers

    United States
    17 hours ago
  •  ...Senior Software Engineer (AI Engineer) Portugal, Remote Who We Are At Fluxon, we believe that how you build matters...  ...ingestion, preprocessing, model inference, and output structuring...  ...-tune open-source models (e.g., Llama, Mistral) for specific domain tasks... 
    Remote work
    Senior
    Flexible hours

    Fluxon

    United States
    14 hours ago
  • $175k - $225k

     ...led by veteran operators and engineers, alumni of Sonos, Paypal, Tesla...  ...We're looking for an AI Inference Engineer who lives at the boundary...  ...autonomous navigation. Exposure to remote logging, log ingestion, and...  ...this role, but do not meet 100% of the qualifications... 
    Remote work
    Local area

    Sauron

    San Francisco, CA
    5 days ago
  •  ...Senior AI Engineer WongDoody creates human experiences at 22 studios across...  ..., we are team players and specialists - both in frontend and...  ...Flexible working hours and 100% overtime compensation ~...  ...ChatGPT, Claude, Gemini, or Llama into end-to-end workflow solutions... 
    Remote work
    Senior
    Work at office
    Local area
    Flexible hours

    WONGDOODY

    United States
    4 days ago
  • $165k - $220k

     ...Description:**DataRobot delivers AI that maximizes impact...  ...in the future. As an AI Engineer on our Professional...  ...you.**This is a fully remote position with no requirement...  ...as Langgraph, CrewAI, Llama Index* Generative AI:...  ...to jobs when they meet 100% of the qualifications... 
    Remote work
    Senior
    Full time
    Work at office
    Local area
    Worldwide
    Flexible hours

    DataRobot

    Oklahoma City, OK
    4 days ago
  • $140.4k

     ...Job Title: Senior AI Performance Engineer (CUDA / GPU / NVIDIA Stack) Duration:...  ...Min 12+ Months Location: 100% Remote This is a hands-on engineering...  ...tuning) Improve inference performance using...  ...models such as YOLO, GPT, LLaMA, Transformers Strong... 
    Remote work
    Senior
    Full time

    Brillfy Technology Inc

    United States
    1 day ago
  •  ...Our new initiative brings AI directly into this process...  ...works. We're looking for a senior machine learning engineer to take the lead on this...  ...now. Why join us? 100% remote based in the US Help shape...  ...APIs and probabilistic inference reliably Work alongside... 
    Remote work
    Senior
    Local area

    Jobot

    McLean, VA
    4 days ago
  • $50 - $60 per hour

     ...Application Management Specialist NTT DATA strives...  ...in New York/Dallas/Remote, New York (US-NY),...  ...Build agentic AI systems: Design and...  ...following MCP protocol. Engineer robust guardrails...  ...., OpenAI, Gemini, Llama, Qwen, Claude). ~...  ...the Fortune Global 100 and are committed... 
    Remote work
    Senior
    Hourly pay

    NTT DATA

    United States
    4 hours ago
  •  ...Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence...  ...us and help build the platform engineers turn to to ship AI products. THE...  ..., including meaningful equity. ~100% coverage of medical, dental, and... 
    Remote work
    Work experience placement
    Flexible hours

    Baseten

    United States
    1 day ago
  •  ...Senior ML/AI Engineer We're Sweed, a product-driven company...  ...Engineer to join our team remotely and help us build the...  ...-end engineers, QA specialists, analysts, and...  ...Design scalable APIs and inference services for AI-driven...  ...with a US company) ~100% remote — we're a remote... 
    Remote work
    Senior
    Contract work
    Trial period
    Flexible hours

    Sweed

    United States
    3 days ago
