LLM Model Database

45 models · updated 2026

DeepSeek R1 1.5B 1GB+

Smallest DeepSeek R1 distill — strong reasoning in an ultra-tiny package.

reasoningcoding
Gemma 3 1B 1GB+

Google's smallest Gemma 3 model, ideal for resource-constrained devices.

chat
Llama 3.2 1B 1GB+

Meta's smallest Llama 3.2, great for edge devices and CPU-only setups.

chat
Qwen3 0.6B 1GB+

Ultra-lightweight Qwen3 for fast on-device chat on any hardware.

chatmultilingual
Qwen3 1.7B 1GB+

Compact Qwen3 model for quick chat and light coding tasks with minimal VRAM.

chatcodingmultilingual
Gemma 2 2B 2GB+

Google Gemma 2 2B — ultra-fast lightweight model for quick responses.

chat
Llama 3.2 3B 2GB+

Llama 3.2 3B delivers solid instruction-following on budget hardware.

chatcoding
Gemma 3 4B 3GB+

Capable Gemma 3 4B model with vision support for versatile local use.

chatcoding
Phi 3.5 Mini 3GB+

Microsoft Phi 3.5 Mini — punches above its weight in reasoning and STEM.

reasoningcodingchat
Phi-4 Mini 3GB+

Small Phi model that runs well on 4GB-class GPUs and laptops.

chatcoding
Qwen3 4B 3GB+

Balanced Qwen3 model for everyday chat and coding on entry-level GPUs.

chatcodingmultilingual
Code Llama 7B 5GB+

Meta Code Llama 7B — specialized code generation and completion model.

coding
DeepSeek R1 7B 5GB+

DeepSeek R1 7B distill delivers chain-of-thought reasoning on consumer GPUs.

reasoningcodingresearch
Llama 3.1 8B 5GB+

Meta Llama 3.1 8B — the go-to all-rounder for 8GB+ GPUs.

chatcodingreasoning
Mistral 7B 5GB+

Fast and lightweight Mistral model for laptops and lower-VRAM desktops.

chatcreativecoding
Qwen 2.5 7B 5GB+

Qwen 2.5 7B — multilingual powerhouse for coding and chat on 8GB GPUs.

chatcodingmultilingual
Qwen 2.5 Coder 7B 5GB+

Best-in-class coding model for 8GB GPUs — excels at HumanEval benchmarks.

coding
DeepSeek R1 6GB+

Reasoning-focused distilled DeepSeek model with strong coding and math performance.

codingresearchreasoning
Gemma 2 9B 6GB+

Google Gemma 2 9B — highly capable conversation and analysis model.

chatresearch
Qwen3 8B 6GB+

Fast general-purpose Qwen3 model for everyday chat and coding on 6-8GB GPUs.

chatcodingmultilingual
Mistral NeMo 12B 8GB+

Mistral NeMo 12B with 128K context — great for long-document analysis.

chatcoding
DeepSeek R1 14B 9GB+

DeepSeek R1 14B distill — strong reasoning performance on 12GB GPUs.

reasoningcodingresearch
Gemma 3 12B 9GB+

Google's multimodal Gemma 3 mid-size model, practical on 10GB+ GPUs.

chatcreativeresearch
Qwen 2.5 14B 9GB+

Qwen 2.5 14B delivers GPT-3.5-level quality on 12GB consumer GPUs.

chatcodingmultilingual
Qwen 2.5 Coder 14B 9GB+

Near-GPT-4 code quality locally — top pick for developers on 12GB+ GPUs.

codingreasoning
DeepSeek Coder V2 10GB+

DeepSeek Coder V2 MoE — advanced code intelligence rivaling GPT-4 Code Interpreter.

codingreasoning
Phi-4 10GB+

Microsoft's 14B Phi model with strong coding quality for mid-range GPUs.

codingresearch
Qwen3 14B 10GB+

Qwen3 14B — best multilingual model for 12GB VRAM setups.

chatcodingmultilingualreasoning
Mistral Small 3.1 14GB+

Mistral Small 3.1 — high-quality instruction model for 16GB+ setups.

chatcodingreasoning
Gemma 3 27B 18GB+

Google's multimodal Gemma 3 large model with 128K context and image input.

chatresearchcreative
Gemma 4 26B 18GB+

Google's Gemma 4 26B multimodal MoE model. The Ollama q4 tag is 18GB, the bf16 variant is 52GB, and it supports text + image input with a 256K context window.

chatcodingmultimodalresearch
DeepSeek R1 32B 19GB+

DeepSeek R1 32B distill — near-frontier reasoning on a single RTX 3090/4090.

reasoningcodingresearch
Qwen3 30B 19GB+

Sparse Qwen3 MoE model with 256K context. Only 3B parameters are active per token, so it runs much faster than a dense 30B on 24GB-class GPUs.

chatcodingmultilingualreasoning
Code Llama 34B 20GB+

Meta Code Llama 34B — top-tier code completion for 24GB GPUs.

coding
Qwen 2.5 32B 20GB+

Qwen 2.5 32B — strongest multilingual model fitting on a single 24GB GPU.

chatcodingmultilingual
Qwen 2.5 Coder 32B 20GB+

State-of-the-art local code model — competitive with GPT-4 on code benchmarks.

codingreasoning
Qwen3 32B 20GB+

Qwen3 32B — top-tier multilingual reasoning for 24GB workstation GPUs.

chatcodingmultilingualreasoning
Command R 35B 22GB+

Cohere Command R — optimized for RAG, long-context retrieval and research tasks.

researchchatreasoning
DeepSeek R1 70B 40GB+

DeepSeek R1 70B — strongest local reasoning model, requires server-grade GPU.

reasoningresearchcoding
DeepSeek V4 Flash 40GB+

DeepSeek's 2026 preview frontier model. In Ollama it is cloud-only as of May 2026, with a 284B MoE architecture, 13B active parameters per token, and a 1M-token context window.

codingresearchreasoningagentic
Llama 3.1 70B 40GB+

Llama 3.1 70B delivers frontier-quality reasoning and coding at full local control.

chatcodingresearchreasoning
Llama 3.3 70B 40GB+

Meta Llama 3.3 70B — near-GPT-4 quality, best on dual 3090s or cloud GPU.

chatcodingresearchreasoning
Qwen 2.5 72B 43GB+

Qwen 2.5 72B — multilingual excellence at the 70B scale for cloud deployments.

chatcodingmultilingualresearch
Llama 4 Scout 72GB+

Meta's multimodal long-context model for GPU servers and high-end multi-GPU setups.

chatresearch
Llama 4 Maverick 256GB+

Meta's larger multimodal Llama 4 model, best suited to cloud or datacenter GPUs.

chatresearchcreative