# CodeCompanion + Ollama Setup Guide
This guide explains how to use Ollama with CodeCompanion across your network via Tailscale.
## Overview
Your CodeCompanion configuration now supports both Claude (via Anthropic API) and Ollama models. You can:
- Use Ollama locally on your main machine
- Access Ollama from other machines on your network via Tailscale
- Switch between Claude and Ollama models seamlessly
## Prerequisites

### On Your Ollama Server Machine
1. **Install Ollama** (if not already done)

   ```bash
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

2. **Start Ollama with network binding**

   By default, Ollama only listens on `localhost:11434`. To access it from other machines, you need to expose it to your network:

   ```bash
   # Option 1: Run Ollama with network binding (temporary)
   OLLAMA_HOST=0.0.0.0:11434 ollama serve

   # Option 2: Set it permanently in systemd (recommended)
   sudo systemctl edit ollama
   ```

   Add this to the systemd service file:

   ```ini
   [Service]
   Environment="OLLAMA_HOST=0.0.0.0:11434"
   ```

   Then restart:

   ```bash
   sudo systemctl restart ollama
   ```

3. **Pull a model** (if not already done)

   ```bash
   ollama pull mistral
   # Or try other models:
   # ollama pull neural-chat
   # ollama pull dolphin-mixtral
   # ollama pull llama2
   ```

4. **Find your Tailscale IP**

   ```bash
   tailscale ip -4
   # Output example: 100.123.45.67
   ```
## Configuration

### On Your Main Machine (with Ollama)
**Default behavior:** The config will use `http://localhost:11434` automatically.

To override, set the environment variable:

```bash
export OLLAMA_ENDPOINT="http://localhost:11434"
```
### On Other Machines (without Ollama)

Set the `OLLAMA_ENDPOINT` environment variable to point to your Ollama server's Tailscale IP:

```bash
export OLLAMA_ENDPOINT="http://100.123.45.67:11434"
```

Make it persistent by adding the same line to your shell config (`~/.zshrc`, `~/.bashrc`, etc.).
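If you share the same dotfiles across machines, a parameter-expansion fallback keeps the override optional. This is a convenience sketch, not something the plugin requires:

```shell
# Fall back to the local server when OLLAMA_ENDPOINT is not already set.
# Safe to put in ~/.zshrc / ~/.bashrc on every machine.
export OLLAMA_ENDPOINT="${OLLAMA_ENDPOINT:-http://localhost:11434}"
echo "Ollama endpoint: $OLLAMA_ENDPOINT"
```

Machines that should reach the remote server can still set `OLLAMA_ENDPOINT` earlier in their config; the `${VAR:-default}` expansion leaves an existing value untouched.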
## Usage

### Keymaps
- `<leader>cll` - Toggle chat with Ollama (normal and visual modes)
- `<leader>cc` - Toggle chat with Claude Haiku (default)
- `<leader>cs` - Toggle chat with Claude Sonnet
- `<leader>co` - Toggle chat with Claude Opus
- `<leader>ca` - Show CodeCompanion actions
- `<leader>cm` - Show current model
### Switching Models

You can also use the `:CodeCompanionSwitchModel` command:

```vim
:CodeCompanionSwitchModel haiku
:CodeCompanionSwitchModel sonnet
:CodeCompanionSwitchModel opus
```
To add Ollama to this command, you would need to extend the configuration.
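One possible shape for that extension is sketched below. The command body and the `set_model()` helper are assumptions about how your `codecompanion.lua` is organized, not CodeCompanion's documented API; adapt the names to match your config.

```lua
-- Hypothetical sketch: teach the existing switch command about "ollama".
-- set_model(adapter, model) is an assumed helper from your own config.
vim.api.nvim_create_user_command("CodeCompanionSwitchModel", function(opts)
  if opts.args == "ollama" then
    set_model("ollama", "mistral")     -- route to the Ollama adapter
  else
    set_model("anthropic", opts.args)  -- existing Claude aliases
  end
end, {
  nargs = 1,
  complete = function()
    return { "haiku", "sonnet", "opus", "ollama" }
  end,
})
```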
## Troubleshooting

### "Connection refused" error
Problem: You're getting connection errors when trying to use Ollama.
Solutions:
- Verify Ollama is running:

  ```bash
  curl http://localhost:11434/api/tags
  ```

- Check if it's bound to the network:

  ```bash
  sudo netstat -tlnp | grep 11434
  ```

- Verify Tailscale connectivity:

  ```bash
  ping 100.x.x.x   # use the Tailscale IP
  ```

- Check the firewall:

  ```bash
  sudo ufw status   # if using UFW
  ```
### "Model not found" error
Problem: The model you specified doesn't exist on the Ollama server.
Solution:
- List available models:

  ```bash
  curl http://localhost:11434/api/tags
  ```

- Pull the model:

  ```bash
  ollama pull mistral
  ```

- Update the default model in `lua/shelbybark/plugins/codecompanion.lua` if needed
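If you only want the model names out of the `/api/tags` JSON, a `sed` one-liner works without `jq`. The sample payload below is a made-up illustration; in practice pipe from `curl -s http://localhost:11434/api/tags`:

```shell
# Abridged, hypothetical /api/tags response for demonstration.
sample='{"models":[{"name":"mistral:latest"},{"name":"neural-chat:latest"}]}'

# One model name per line: split on commas, then pull out each "name" value.
echo "$sample" | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
# → mistral:latest
#   neural-chat:latest
```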
### Slow responses
Problem: Responses are very slow.
Causes & Solutions:
- **Network latency:** Tailscale adds minimal overhead, but check your network
- **Model size:** Larger models (7B+) are slower; try smaller models like `mistral` or `neural-chat`
- **Server resources:** Check CPU/RAM on the Ollama server with `top` or `htop`
### Tailscale not connecting
Problem: Can't reach the Ollama server via Tailscale IP.
Solutions:
- Verify Tailscale is running:

  ```bash
  tailscale status
  ```

- Check that both machines are on the same Tailscale network
- Verify the Tailscale IP is correct:

  ```bash
  tailscale ip -4
  ```

- Check firewall rules on the Ollama server
## Recommended Models for CodeCompanion
| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| mistral | 7B | Fast | Good | General coding |
| neural-chat | 7B | Fast | Good | Chat/conversation |
| dolphin-mixtral | 8x7B | Slower | Excellent | Complex tasks |
| llama2 | 7B/13B | Medium | Good | General purpose |
| orca-mini | 3B | Very Fast | Fair | Quick answers |
## Advanced Configuration

### Custom Model Selection
To change the default Ollama model, edit `lua/shelbybark/plugins/codecompanion.lua`:

```lua
schema = {
  model = {
    default = "neural-chat", -- Change this to your preferred model
  },
},
```
### Multiple Ollama Servers

If you have multiple Ollama servers, you can create multiple adapters:

```lua
ollama_main = function()
  return require("codecompanion.adapters").extend("ollama", {
    env = { url = "http://100.123.45.67:11434" },
    schema = { model = { default = "mistral" } },
  })
end,

ollama_backup = function()
  return require("codecompanion.adapters").extend("ollama", {
    env = { url = "http://100.123.45.68:11434" },
    schema = { model = { default = "neural-chat" } },
  })
end,
```
Then add keymaps for each.
## Performance Tips
- Use smaller models for faster responses (`mistral`, `neural-chat`)
- Run Ollama on a machine with good specs (8GB+ RAM, modern CPU)
- Keep Tailscale updated for best network performance
- Monitor network latency with `ping` to your Ollama server
- Consider running Ollama on a GPU, if available, for faster inference