LLM Model Database
45 models · updated 2026
Smallest DeepSeek R1 distill — strong reasoning in an ultra-tiny package.
Google's smallest Gemma 3 model, ideal for resource-constrained devices.
Meta's smallest Llama 3.2, great for edge devices and CPU-only setups.
Ultra-lightweight Qwen3 for fast on-device chat on any hardware.
Compact Qwen3 model for quick chat and light coding tasks with minimal VRAM.
Google Gemma 2 2B — ultra-fast lightweight model for quick responses.
Llama 3.2 3B delivers solid instruction-following on budget hardware.
Capable Gemma 3 4B model with vision support for versatile local use.
Microsoft Phi 3.5 Mini — punches above its weight in reasoning and STEM.
Small Phi model that runs well on 4GB-class GPUs and laptops.
Balanced Qwen3 model for everyday chat and coding on entry-level GPUs.
Meta Code Llama 7B — specialized code generation and completion model.
DeepSeek R1 7B distill delivers chain-of-thought reasoning on consumer GPUs.
Meta Llama 3.1 8B — the go-to all-rounder for 8GB+ GPUs.
Fast and lightweight Mistral model for laptops and lower-VRAM desktops.
Qwen 2.5 7B — multilingual powerhouse for coding and chat on 8GB GPUs.
Best-in-class coding model for 8GB GPUs — excels at HumanEval benchmarks.
Reasoning-focused distilled DeepSeek model with strong coding and math performance.
Google Gemma 2 9B — highly capable conversation and analysis model.
Fast general-purpose Qwen3 model for everyday chat and coding on 6-8GB GPUs.
Mistral NeMo 12B with 128K context — great for long-document analysis.
DeepSeek R1 14B distill — strong reasoning performance on 12GB GPUs.
Google's multimodal Gemma 3 mid-size model, practical on 10GB+ GPUs.
Qwen 2.5 14B delivers GPT-3.5-level quality on 12GB consumer GPUs.
Near-GPT-4 code quality locally — top pick for developers on 12GB+ GPUs.
DeepSeek Coder V2 MoE — advanced code intelligence rivaling GPT-4 Code Interpreter.
Microsoft's 14B Phi model with strong coding quality for mid-range GPUs.
Qwen3 14B — best multilingual model for 12GB VRAM setups.
Mistral Small 3.1 — high-quality instruction model for 16GB+ setups.
Google's multimodal Gemma 3 large model with 128K context and image input.
Google's Gemma 4 26B multimodal MoE model. The Ollama q4 tag is 18GB, the bf16 variant is 52GB, and it supports text + image input with a 256K context window.
DeepSeek R1 32B distill — near-frontier reasoning on a single RTX 3090/4090.
Sparse Qwen3 MoE model with 256K context. Only 3B parameters are active per token, so it runs much faster than a dense 30B on 24GB-class GPUs.
Meta Code Llama 34B — top-tier code completion for 24GB GPUs.
Qwen 2.5 32B — strongest multilingual model fitting on a single 24GB GPU.
State-of-the-art local code model — competitive with GPT-4 on code benchmarks.
Qwen3 32B — top-tier multilingual reasoning for 24GB workstation GPUs.
Cohere Command R — optimized for RAG, long-context retrieval and research tasks.
DeepSeek R1 70B — strongest local reasoning model, requires server-grade GPU.
DeepSeek's 2026 preview frontier model. In Ollama it is cloud-only as of May 2026, with a 284B MoE architecture, 13B active parameters per token, and a 1M-token context window.
Llama 3.1 70B delivers frontier-quality reasoning and coding at full local control.
Meta Llama 3.3 70B — near-GPT-4 quality, best on dual 3090s or cloud GPU.
Qwen 2.5 72B — multilingual excellence at the 70B scale for cloud deployments.
Meta's multimodal long-context model for GPU servers and high-end multi-GPU setups.
Meta's larger multimodal Llama 4 model, best suited to cloud or datacenter GPUs.