
Local Models

Run AI completely offline on your machine. No network latency, full privacy, no API costs.

Ollama (Recommended)

The easiest way to run local models. GRID auto-detects Ollama when it is running.

  1. Install Ollama from ollama.ai
  2. Run ollama serve in a terminal
  3. Pull a model: ollama pull qwen2.5-coder
  4. GRID will auto-detect Ollama and list the available models (see the sketch after this list)
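GRID's own detection code isn't shown here, but a minimal sketch of how a client can discover locally pulled models is to query Ollama's REST API on its default port (11434) at the /api/tags endpoint:

```typescript
// List models available from a local Ollama server.
// Assumes Ollama's default port (11434); this is an illustrative sketch,
// not GRID's actual detection logic.
interface OllamaTag {
  name: string;        // e.g. "qwen2.5-coder:7b"
  size: number;        // model size in bytes
  modified_at: string; // last-modified timestamp
}

async function listLocalModels(baseUrl = "http://localhost:11434"): Promise<OllamaTag[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama not reachable at ${baseUrl} (HTTP ${res.status})`);
  }
  const body = (await res.json()) as { models: OllamaTag[] };
  return body.models;
}

// Usage: prints the name of every pulled model, e.g. "qwen2.5-coder:7b".
listLocalModels()
  .then((models) => models.forEach((m) => console.log(m.name)))
  .catch((err) => console.error("No local Ollama detected:", err.message));
```

If the request fails, Ollama is either not installed or not running (step 2 above).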

Recommended Models

| Model | Size | Best For |
|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | Fast coding, autocomplete |
| codellama:13b | 7.4 GB | Balanced quality/speed |
| deepseek-coder-v2 | 8.9 GB | Complex reasoning |
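To sanity-check a pulled model outside of GRID, you can send a one-off request to Ollama's /api/generate endpoint. The model tag below is one of the recommended models above; stream is set to false so the reply arrives as a single JSON object:

```typescript
// Send a single non-streaming completion request to a local Ollama model.
// Assumes you have already run: ollama pull qwen2.5-coder:7b
async function complete(prompt: string, model = "qwen2.5-coder:7b"): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama error: HTTP ${res.status}`);
  const body = (await res.json()) as { response: string };
  return body.response;
}

complete("Write a TypeScript function that reverses a string.")
  .then(console.log)
  .catch(console.error);
```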

Other Options

LM Studio

GUI-based model manager with a built-in local server. Set the endpoint to localhost:1234.
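LM Studio's local server speaks the OpenAI chat-completions format, so any OpenAI-style request works against it. A minimal example (the model id is a placeholder; use whatever model you have loaded in LM Studio):

```typescript
// Chat completion against LM Studio's OpenAI-compatible local server.
// "local-model" is a placeholder id; LM Studio serves whichever model is loaded.
async function chatWithLMStudio(userMessage: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder; see LM Studio's server tab for the real id
      messages: [{ role: "user", content: userMessage }],
      temperature: 0.2,
    }),
  });
  if (!res.ok) throw new Error(`LM Studio error: HTTP ${res.status}`);
  const body = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return body.choices[0].message.content;
}

chatWithLMStudio("Explain what a closure is in one sentence.").then(console.log);
```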

vLLM

Production-grade serving for high-throughput workloads. Use the OpenAI-compatible endpoint.
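Because vLLM also exposes an OpenAI-compatible API, the same kind of client works; only the base URL and the model id change. The port below (8000) is a common default and the model name is an assumption, so match both to your deployment:

```typescript
// Chat completion against a vLLM OpenAI-compatible server.
// Base URL and model id are assumptions; use the values from your vLLM deployment.
const VLLM_BASE_URL = "http://localhost:8000/v1";

async function chatWithVLLM(userMessage: string, model: string): Promise<string> {
  const res = await fetch(`${VLLM_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // the model id vLLM was launched with
      messages: [{ role: "user", content: userMessage }],
    }),
  });
  if (!res.ok) throw new Error(`vLLM error: HTTP ${res.status}`);
  const body = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return body.choices[0].message.content;
}
```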