Start with your VRAM, not the biggest model name
The fastest way to have a good first experience with local AI is to pick a model that actually fits your hardware.
- 4GB class GPUs: start with
phi4-mini - 6-8GB GPUs: start with
qwen3:8bormistral - 10-16GB GPUs: try
phi4,gemma3:12b, ordeepseek-r1 - 20GB+ GPUs: step up to
qwen3:30borgemma3:27b
If you pick a model that is too large, generation becomes slow or may fail to load entirely.
Install Ollama
Download Ollama from ollama.com and complete the installer for Windows, macOS, or Linux.
Good first commands
Open a terminal and run one of these:
ollama run phi4-mini
ollama run qwen3:8b
ollama run mistral
Ollama will download the selected model and open a local chat session.
When to use the matcher
Use LLM Matcher when you want a faster answer to questions like:
- Which models fit my exact GPU?
- What should I run on 8GB versus 12GB?
- When should I switch to a cloud GPU?
Start on the homepage, choose your GPU, and the site will rank practical options for your VRAM.