OpenOri supports Ollama for fully local, offline AI. No API key, no internet connection, no cost — everything runs on your hardware.
Setup
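Setup boils down to installing the Ollama CLI, starting the server, and pulling a model. A minimal sketch — the install script URL and model tag come from Ollama's own docs, nothing here is OpenOri-specific:

```shell
# Typical first-time setup (Linux/macOS; Windows users install from ollama.com):
#   curl -fsSL https://ollama.com/install.sh | sh   # install the CLI and server
#   ollama pull llama3.1:8b                         # download a model
#   ollama run llama3.1:8b "Hello"                  # quick smoke test
# The check below only looks for an existing install, so it has no side effects.
if command -v ollama >/dev/null 2>&1; then
  echo "ollama is installed"
else
  echo "ollama is not installed yet"
fi
```

Once the Ollama server (or desktop app) is running, it listens on `http://localhost:11434` by default.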
Recommended models
| Model | Size | Best for |
|---|---|---|
| llama3.1:8b | 4.7 GB | General use, good balance |
| llama3.1:70b | 40 GB | Best quality (needs 48 GB+ RAM) |
| codellama:13b | 7.4 GB | Code generation and analysis |
| mistral:7b | 4.1 GB | Fast, good for simple tasks |
| phi3:medium | 7.9 GB | Strong reasoning for its size |
Custom Ollama URL
If Ollama runs on a different machine or port, configure the URL in Settings or in ~/.ori/config.json.
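For example, to point OpenOri at an Ollama server on another machine, ~/.ori/config.json might look like the sketch below. The `provider` and `ollamaUrl` key names are assumptions, not confirmed against OpenOri's schema — check the settings reference for the exact keys; 11434 is Ollama's default port.

```json
{
  "provider": "ollama",
  "ollamaUrl": "http://192.168.1.50:11434"
}
```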
Hardware requirements
| Setup | RAM | GPU VRAM | Experience |
|---|---|---|---|
| Minimum (7B model) | 8 GB | 6 GB | Usable, slower responses |
| Recommended (13B model) | 16 GB | 8 GB | Good quality and speed |
| Ideal (70B model) | 64 GB | 24 GB+ | Near-cloud quality |
Ollama models run on your CPU/GPU. Response speed depends on your hardware. For the fastest experience, use a cloud provider (Anthropic, OpenAI, Google) — for maximum privacy, use Ollama.
Limitations
Local models are powerful but have some limitations compared to cloud models:
- Smaller context windows — typically 4K-8K tokens vs 100K+ for Claude
- Lower reasoning quality — for complex multi-step tasks, cloud models perform better
- No vision — most local models can’t analyze screenshots (computer use works but visual understanding is limited)
- Slower — response time depends on your hardware