Local AI coding

Local LLM inference on macOS Apple Silicon (M-series) — no API keys, no rate limits, no cloud.

Overview

Two inference backends, one coding agent layer:

Tool	Format	Server	When to use
Ollama	GGUF	`localhost:11434/v1` (OpenAI-compatible)	Primary backend — always-on LaunchAgent, large models, multi-client
mlx-lm	MLX	`localhost:8080/v1` (OpenAI-compatible)	On-demand — faster on Apple Silicon Metal, but tool calling broken (PR #1027)
OpenCode	—	TUI client	Primary coding agent (full agentic loop with tools)
aider	—	CLI client	Quick edits, one-shot diffs, git integration

Ollama and mlx-lm use different model formats (GGUF vs MLX) and store files separately. They cannot share downloads.

Quick start

Pull a model and run OpenCode:

# Pull a model (GGUF, stored at ~/.ollama/models)
ollama pull qwen3-coder:30b

# Start OpenCode (auto-connects to Ollama at localhost:11434)
opencode

# Or use aider
aider

Ollama

Installed via Homebrew (brew "ollama" in packages/Brewfile). On macOS, managed as a LaunchAgent — starts at login, accessible at http://127.0.0.1:11434.

# Check running models
ollama list

# Pull a model
ollama pull llama3.3:70b

# Run a quick test
ollama run qwen3-coder:30b "hello"

# API endpoint (OpenAI-compatible)
curl http://localhost:11434/v1/models

Context windows

Ollama’s default context window (4096 tokens) is too small for agentic tool-use loops — the system prompt + tool schemas + conversation history fill the window immediately.

install/opencode.sh creates context-boosted model aliases automatically:

Alias	Base model	Context	Memory (weights + KV)
`qwen3-coder:30b-ctx256k`	`qwen3-coder:30b`	256K	~78 GB
`llama3.3:70b-ctx128k`	`llama3.3:70b`	128K	~83 GB
`gpt-oss:20b-ctx128k`	`gpt-oss:20b`	128K	~39 GB
`qwen2.5-coder:7b-ctx128k`	`qwen2.5-coder:7b`	128K	~20 GB

These fit comfortably on an M3 Max 128 GB (unified memory — no CPU/GPU split, Metal accesses all of it).

gpt-oss:120b is excluded — confirmed Ollama hang bug with large num_ctx for that model.

To recreate aliases after pulling new models:

bash ~/dotfiles/install/opencode.sh

Model storage

Ollama stores models at ~/.ollama/models (managed by the Ollama app — not redirected by dotfiles). On a shared NFS home, point it at scratch if needed:

OLLAMA_MODELS=/scratch/$USER/ollama/models ollama pull qwen3-coder:30b

Installed via pip (mlx-lm in packages/pip.txt, tagged # macos-only). Apple Silicon only — runs on Metal, skips CPU. Skipped automatically on Linux by install/python.sh. Not started automatically — launch on demand.

# Start the server on localhost:8080
mlx_lm.server --model mlx-community/Qwen3-30B-A3B-4bit --port 8080

# Models are stored at $HF_HOME (~/.local/$PLAT/.cache/huggingface)

Note: Tool calling in mlx_lm.server is currently broken upstream (draft fix in PR #1027). Until that merges, use Ollama for agentic workflows. mlx-lm is useful for fast one-shot generation.

HF_HOME is set by .zprofile to $_LOCAL_PLAT/.cache/huggingface — model weights go to scratch if scratch is configured, never polluting NFS home quotas.

OpenCode

Installed via Homebrew (brew "opencode" in packages/Brewfile). TUI coding agent that runs a full agentic loop with file read/write/edit tools. Config at ~/.config/opencode/opencode.json (deployed by chezmoi).

opencode          # launch in current directory
opencode --help   # options

The default model is qwen3-coder:30b-ctx256k (Ollama). Switch models inside the TUI with /model.

OpenCode does not auto-detect OLLAMA_HOST — the provider is configured explicitly in opencode.json with baseURL: "http://127.0.0.1:11434/v1".

To add a new model to the OpenCode model list, edit home/dot_config/opencode/opencode.json and run chezmoi apply. If the model needs a context-boosted alias, add it to install/opencode.sh and re-run it.

aider

Installed via pip (aider-chat in packages/pip.txt, tagged # python=3.12 because scipy has no wheels for Python 3.14+). Config at ~/.aider.conf.yml (deployed by chezmoi as a template):

macOS: defaults to ollama/qwen3-coder:30b-ctx256k (local inference)
Linux: empty config — falls through to ANTHROPIC_API_KEY or an explicit --model flag

aider                                    # use default model from ~/.aider.conf.yml
aider --model ollama/llama3.3:70b-ctx128k  # override model
aider --model anthropic/claude-opus-4   # use Anthropic API (needs ANTHROPIC_API_KEY)

aider has git integration built in — it commits changes automatically with descriptive messages.

run_onchange hooks

chezmoi re-runs the relevant install scripts automatically when tracked files change:

Trigger file	Script re-run
`packages/pip.txt`	`install/local-llm.sh` (verifies mlx-lm/aider binaries)
`home/dot_config/opencode/opencode.json`	`install/opencode.sh` (recreates context aliases)

This means chezmoi update after pulling dotfile changes will re-verify the local LLM setup and recreate any missing model aliases.

Keyboard shortcuts

cade's setup