Inference runtime

Ollama

One-line local model serving with smart quant manifests

macOS · Linux · Windows · MIT

Ollama bundles popular open-weight models behind a single CLI: run `ollama run llama3` and the model is downloaded as a pre-quantized GGUF build, loaded, and served on a local HTTP API (port 11434 by default). It is built atop llama.cpp, with a manifest format that maps each model tag to a specific GGUF variant.
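Once a model has been pulled, other programs can talk to it over the local API. Below is a minimal standard-library Python sketch of a call to the `/api/generate` endpoint. It assumes an Ollama server running on the default `http://localhost:11434` with `llama3` already pulled. The helper names (`build_generate_request`, `parse_generate_response`, `generate`) are hypothetical and are not part of Ollama. Setting `"stream": false` asks for one JSON reply instead of a stream of chunks.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; change it if yours differs.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST to /api/generate that asks for a single, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> str:
    """Pull the generated text out of a non-streamed /api/generate reply."""
    return json.loads(body)["response"]


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the completion text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_generate_response(resp.read())
```

For example, `generate("llama3", "Why is the sky blue?")` returns the completion as a plain string. If `stream` is left at its default of `true`, the server instead sends newline-delimited JSON chunks, and this sketch does not handle that case.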
