
CodeCompanion + Ollama Setup Guide

This guide explains how to use Ollama with CodeCompanion across your network via Tailscale.

Overview

Your CodeCompanion configuration now supports both Claude (via the Anthropic API) and Ollama models. You can:

  • Use Ollama locally on your main machine
  • Access Ollama from other machines on your network via Tailscale
  • Switch between Claude and Ollama models seamlessly

Prerequisites

On Your Ollama Server Machine

  1. Install Ollama (if not already done)

    curl -fsSL https://ollama.ai/install.sh | sh
    
  2. Start Ollama with network binding

    By default, Ollama only listens on localhost:11434. To access it from other machines, you need to expose it to your network:

    # Option 1: Run Ollama with network binding (temporary)
    OLLAMA_HOST=0.0.0.0:11434 ollama serve
    
    # Option 2: Set it permanently in systemd (recommended)
    sudo systemctl edit ollama
    

    Add this to the override file that opens:

    [Service]
    Environment="OLLAMA_HOST=0.0.0.0:11434"
    

    Then restart:

    sudo systemctl restart ollama
    
  3. Pull a model (if not already done)

    ollama pull mistral
    # Or try other models:
    # ollama pull neural-chat
    # ollama pull dolphin-mixtral
    # ollama pull llama2
    
  4. Find your Tailscale IP

    tailscale ip -4
    # Output example: 100.123.45.67
    

Configuration

On Your Main Machine (with Ollama)

Default behavior: The config will use http://localhost:11434 automatically.

To override, set the environment variable:

export OLLAMA_ENDPOINT="http://localhost:11434"

On Other Machines (without Ollama)

Set the OLLAMA_ENDPOINT environment variable to point to your Ollama server's Tailscale IP:

export OLLAMA_ENDPOINT="http://100.123.45.67:11434"

Make it persistent by adding to your shell config (~/.zshrc, ~/.bashrc, etc.):

export OLLAMA_ENDPOINT="http://100.123.45.67:11434"
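
How the variable is picked up depends on the adapter definition in lua/shelbybark/plugins/codecompanion.lua. As a rough sketch (mirroring the adapter shape shown under Advanced Configuration below, so the exact field names are assumptions about your config), the Ollama adapter can fall back to localhost when the variable is unset:

ollama = function()
    return require("codecompanion.adapters").extend("ollama", {
        env = {
            -- Use the Tailscale endpoint when set, otherwise the local instance
            url = vim.env.OLLAMA_ENDPOINT or "http://localhost:11434",
        },
        schema = { model = { default = "mistral" } },
    })
end,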

Usage

Keymaps

  • <leader>cll - Toggle chat with Ollama (normal and visual modes)
  • <leader>cc - Toggle chat with Claude Haiku (default)
  • <leader>cs - Toggle chat with Claude Sonnet
  • <leader>co - Toggle chat with Claude Opus
  • <leader>ca - Show CodeCompanion actions
  • <leader>cm - Show current model
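
These mappings are defined in the CodeCompanion plugin config. As a minimal sketch of what the Ollama mapping might look like (the :CodeCompanionChat <adapter> invocation is an assumption about how your config wires it up; the Claude mappings would follow the same pattern):

-- Sketch only: the real mapping lives in lua/shelbybark/plugins/codecompanion.lua,
-- and whether it toggles or opens a fresh chat depends on that config.
vim.keymap.set({ "n", "v" }, "<leader>cll", "<cmd>CodeCompanionChat ollama<cr>",
    { desc = "Chat with Ollama" })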

Switching Models

You can also use the :CodeCompanionSwitchModel command:

:CodeCompanionSwitchModel haiku
:CodeCompanionSwitchModel sonnet
:CodeCompanionSwitchModel opus

To add Ollama to this command, you would need to extend the configuration.
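
A hedged sketch of what that extension could look like, assuming the command is backed by a simple name-to-adapter lookup table (the real implementation in lua/shelbybark/plugins/codecompanion.lua may be structured differently):

-- Assumed shape only: a name -> adapter table plus the user command that reads it.
local switch_targets = {
    haiku  = "anthropic",  -- existing Claude entries; actual values depend on your config
    sonnet = "anthropic",
    opus   = "anthropic",
    ollama = "ollama",     -- new entry so ":CodeCompanionSwitchModel ollama" resolves
}

vim.api.nvim_create_user_command("CodeCompanionSwitchModel", function(opts)
    local adapter = switch_targets[opts.args]
    if not adapter then
        vim.notify("Unknown model: " .. opts.args, vim.log.levels.WARN)
        return
    end
    -- Applying the adapter (e.g. updating the chat strategy) depends on the
    -- existing command body; a notification stands in for that step here.
    vim.notify("Switched to " .. opts.args .. " (" .. adapter .. ")")
end, { nargs = 1, complete = function() return vim.tbl_keys(switch_targets) end })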

Troubleshooting

"Connection refused" error

Problem: You're getting connection errors when trying to use Ollama.

Solutions:

  1. Verify Ollama is running: curl http://localhost:11434/api/tags
  2. Check if it's bound to the network: sudo netstat -tlnp | grep 11434
  3. Verify Tailscale connectivity: ping 100.x.x.x (use the Tailscale IP)
  4. Check firewall: sudo ufw status (if using UFW)
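
You can also probe the configured endpoint from inside Neovim. This is a throwaway helper, not part of CodeCompanion, and assumes Neovim 0.10+ (for vim.system) plus curl on the machine running Neovim:

-- Hypothetical helper: checks whether the configured Ollama endpoint answers.
local function check_ollama()
    local endpoint = vim.env.OLLAMA_ENDPOINT or "http://localhost:11434"
    vim.system({ "curl", "-sf", endpoint .. "/api/tags" }, { text = true }, function(res)
        vim.schedule(function()
            if res.code == 0 then
                vim.notify("Ollama reachable at " .. endpoint)
            else
                vim.notify("Ollama NOT reachable at " .. endpoint, vim.log.levels.ERROR)
            end
        end)
    end)
end

check_ollama()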

"Model not found" error

Problem: The model you specified doesn't exist on the Ollama server.

Solution:

  1. List available models: curl http://localhost:11434/api/tags
  2. Pull the model: ollama pull mistral
  3. Update the default model in lua/shelbybark/plugins/codecompanion.lua if needed

Slow responses

Problem: Responses are very slow.

Causes & Solutions:

  1. Network latency: Tailscale adds minimal overhead, but check your network
  2. Model size: Larger models (13B and up, or 8x7B mixtures like dolphin-mixtral) are slower. Try a 7B model such as mistral or neural-chat, or the 3B orca-mini, for faster replies
  3. Server resources: Check CPU/RAM on the Ollama server with top or htop

Tailscale not connecting

Problem: Can't reach the Ollama server via Tailscale IP.

Solutions:

  1. Verify Tailscale is running: tailscale status
  2. Check both machines are on the same Tailscale network
  3. Verify the Tailscale IP is correct: tailscale ip -4
  4. Check firewall rules on the Ollama server

Recommended Models

Model            Size    Speed      Quality    Best For
mistral          7B      Fast       Good       General coding
neural-chat      7B      Fast       Good       Chat/conversation
dolphin-mixtral  8x7B    Slower     Excellent  Complex tasks
llama2           7B/13B  Medium     Good       General purpose
orca-mini        3B      Very Fast  Fair       Quick answers

Advanced Configuration

Custom Model Selection

To change the default Ollama model, edit lua/shelbybark/plugins/codecompanion.lua:

schema = {
    model = {
        default = "neural-chat",  -- Change this to your preferred model
    },
},

Multiple Ollama Servers

If you have multiple Ollama servers, you can create multiple adapters:

ollama_main = function()
    return require("codecompanion.adapters").extend("ollama", {
        env = { url = "http://100.123.45.67:11434" },
        schema = { model = { default = "mistral" } },
    })
end,
ollama_backup = function()
    return require("codecompanion.adapters").extend("ollama", {
        env = { url = "http://100.123.45.68:11434" },
        schema = { model = { default = "neural-chat" } },
    })
end,

Then add keymaps for each.
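
For example (again assuming the :CodeCompanionChat <adapter> form; adjust to match how the existing keymaps invoke CodeCompanion, and note the <leader>clm / <leader>clb bindings are just placeholders):

vim.keymap.set({ "n", "v" }, "<leader>clm", "<cmd>CodeCompanionChat ollama_main<cr>",
    { desc = "Chat with main Ollama server" })
vim.keymap.set({ "n", "v" }, "<leader>clb", "<cmd>CodeCompanionChat ollama_backup<cr>",
    { desc = "Chat with backup Ollama server" })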

Performance Tips

  1. Use smaller models for faster responses (mistral, neural-chat)
  2. Run Ollama on a machine with good specs (8GB+ RAM, modern CPU)
  3. Keep Tailscale updated for best network performance
  4. Monitor network latency with ping to your Ollama server
  5. Consider running Ollama on a GPU, if available, for faster inference
