Ori supports Ollama for fully local, offline AI. No API key, no internet connection, no cost — everything runs on your hardware.

Setup

1. Install Ollama

```shell
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows — download from ollama.com
```
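After the installer finishes, a quick check confirms the CLI is on your PATH (a sketch; on macOS and Linux the installer normally also starts Ollama as a background service):

```shell
# Check whether the ollama CLI landed on the PATH.
status=$(command -v ollama >/dev/null 2>&1 && echo installed || echo missing)
echo "ollama: $status"
if [ "$status" = "installed" ]; then
  ollama --version
fi
```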
2. Pull a model

```shell
ollama pull llama3.1
```

Other good options: `mistral`, `codellama`, `phi3`, `gemma2`
3. Use in Ori

Ori auto-detects Ollama at localhost:11434. No configuration needed — just switch to an Ollama model in the model selector.
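To verify that auto-detection will succeed, you can probe Ollama's HTTP API yourself; `/api/tags` is the endpoint that lists locally installed models:

```shell
# Probe the default endpoint Ori looks for; prints a status line either way.
if curl -fsS --max-time 5 http://localhost:11434/api/tags >/dev/null 2>&1; then
  ollama_up=yes
  echo "Ollama is reachable on localhost:11434"
else
  ollama_up=no
  echo "Ollama is not running (start it with: ollama serve)"
fi
```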
| Model | Size | Best for |
| --- | --- | --- |
| llama3.1:8b | 4.7 GB | General use, good balance |
| llama3.1:70b | 40 GB | Best quality (needs 48 GB+ RAM) |
| codellama:13b | 7.4 GB | Code generation and analysis |
| mistral:7b | 4.1 GB | Fast, good for simple tasks |
| phi3:medium | 7.9 GB | Strong reasoning for its size |

Custom Ollama URL

If Ollama runs on a different machine or port, configure it in Settings or in ~/.ori/config.json:
```json
{
  "ollamaUrl": "http://192.168.1.100:11434"
}
```
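Two things usually need checking in a remote setup: by default Ollama binds only to loopback, so the remote machine must set `OLLAMA_HOST` (Ollama's own environment variable) before serving, and the Ori machine must be able to reach the port. A sketch, with 192.168.1.100 as a placeholder address:

```shell
# On the remote machine: bind Ollama to all interfaces, not just 127.0.0.1.
#   OLLAMA_HOST=0.0.0.0 ollama serve

# On the Ori machine: check that the endpoint answers.
remote="http://192.168.1.100:11434"   # placeholder; use your server's address
if curl -fsS --max-time 5 "$remote/api/tags" >/dev/null 2>&1; then
  reachable=yes
  echo "remote Ollama reachable"
else
  reachable=no
  echo "remote Ollama not reachable (check OLLAMA_HOST and firewall)"
fi
```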

Hardware requirements

| Setup | RAM | GPU VRAM | Experience |
| --- | --- | --- | --- |
| Minimum (7B model) | 8 GB | 6 GB | Usable, slower responses |
| Recommended (13B model) | 16 GB | 8 GB | Good quality and speed |
| Ideal (70B model) | 64 GB | 24 GB+ | Near-cloud quality |
Ollama models run on your CPU/GPU. Response speed depends on your hardware. For the fastest experience, use a cloud provider (Anthropic, OpenAI, Google) — for maximum privacy, use Ollama.
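A rough way to check your machine against the table above is total RAM; this sketch reads `/proc/meminfo` on Linux (macOS users can run `sysctl -n hw.memsize` instead):

```shell
# Report total RAM and compare it to the table's minimums.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || echo 0)
mem_gb=$((mem_kb / 1024 / 1024))
echo "Total RAM: ${mem_gb} GB"
if [ "$mem_gb" -ge 16 ]; then
  echo "meets the 13B recommendation"
elif [ "$mem_gb" -ge 8 ]; then
  echo "meets the 7B minimum"
else
  echo "below the 8 GB minimum"
fi
```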

Limitations

Local models are powerful but have some limitations compared to cloud models:
  • Smaller context windows — typically 4K-8K tokens vs 100K+ for Claude
  • Lower reasoning quality — for complex multi-step tasks, cloud models perform better
  • No vision — most local models can’t analyze screenshots (computer use works but visual understanding is limited)
  • Slower — response time depends on your hardware
Best of both worlds: use Ollama for private, everyday tasks and switch to Anthropic or OpenAI for complex reasoning. Ori lets you switch models per conversation.