llama.cpp
The C++ inference engine under most of the local stack
Georgi Gerganov's llama.cpp is the lingua franca of local inference: pure C/C++ with CUDA, Metal, OpenCL, and Vulkan backends, the GGUF model format, and an HTTP server mode that is wire-compatible with the OpenAI API. Most desktop wrappers (Ollama, LM Studio, KoboldCpp, etc.) ship it under the hood.
Report an issue with llama.cpp
Posts to your status feed
Pick the closest match below, edit the body, and post. Your report carries the #llama-cpp tag automatically, so it surfaces both here and in the trending-tags rail.
