LLM Model Database

45 models · updated 2026

DeepSeek R1 1.5B 1GB+

Smallest DeepSeek R1 distill — strong reasoning in an ultra-tiny package.

reasoningcoding

Gemma 3 1B 1GB+

Google's smallest Gemma 3 model, ideal for resource-constrained devices.

Llama 3.2 1B 1GB+

Meta's smallest Llama 3.2, great for edge devices and CPU-only setups.

Qwen3 0.6B 1GB+

Ultra-lightweight Qwen3 for fast on-device chat on any hardware.

chatmultilingual

Qwen3 1.7B 1GB+

Compact Qwen3 model for quick chat and light coding tasks with minimal VRAM.

chatcodingmultilingual

Gemma 2 2B 2GB+

Google Gemma 2 2B — ultra-fast lightweight model for quick responses.

Llama 3.2 3B 2GB+

Llama 3.2 3B delivers solid instruction-following on budget hardware.

Gemma 3 4B 3GB+

Capable Gemma 3 4B model with vision support for versatile local use.

Phi 3.5 Mini 3GB+

Microsoft Phi 3.5 Mini — punches above its weight in reasoning and STEM.

reasoningcodingchat

Phi-4 Mini 3GB+

Small Phi model that runs well on 4GB-class GPUs and laptops.

Qwen3 4B 3GB+

Balanced Qwen3 model for everyday chat and coding on entry-level GPUs.

chatcodingmultilingual

Code Llama 7B 5GB+

Meta Code Llama 7B — specialized code generation and completion model.

DeepSeek R1 7B 5GB+

DeepSeek R1 7B distill delivers chain-of-thought reasoning on consumer GPUs.

reasoningcodingresearch

Llama 3.1 8B 5GB+

Meta Llama 3.1 8B — the go-to all-rounder for 8GB+ GPUs.

chatcodingreasoning

Mistral 7B 5GB+

Fast and lightweight Mistral model for laptops and lower-VRAM desktops.

chatcreativecoding

Qwen 2.5 7B 5GB+

Qwen 2.5 7B — multilingual powerhouse for coding and chat on 8GB GPUs.

chatcodingmultilingual

Qwen 2.5 Coder 7B 5GB+

Best-in-class coding model for 8GB GPUs — excels at HumanEval benchmarks.

DeepSeek R1 6GB+

Reasoning-focused distilled DeepSeek model with strong coding and math performance.

codingresearchreasoning

Gemma 2 9B 6GB+

Google Gemma 2 9B — highly capable conversation and analysis model.

Qwen3 8B 6GB+

Fast general-purpose Qwen3 model for everyday chat and coding on 6-8GB GPUs.

chatcodingmultilingual

Mistral NeMo 12B 8GB+

Mistral NeMo 12B with 128K context — great for long-document analysis.

DeepSeek R1 14B 9GB+

DeepSeek R1 14B distill — strong reasoning performance on 12GB GPUs.

reasoningcodingresearch

Gemma 3 12B 9GB+

Google's multimodal Gemma 3 mid-size model, practical on 10GB+ GPUs.

chatcreativeresearch

Qwen 2.5 14B 9GB+

Qwen 2.5 14B delivers GPT-3.5-level quality on 12GB consumer GPUs.

chatcodingmultilingual

Qwen 2.5 Coder 14B 9GB+

Near-GPT-4 code quality locally — top pick for developers on 12GB+ GPUs.

codingreasoning

DeepSeek Coder V2 10GB+

DeepSeek Coder V2 MoE — advanced code intelligence rivaling GPT-4 Code Interpreter.

codingreasoning

Phi-4 10GB+

Microsoft's 14B Phi model with strong coding quality for mid-range GPUs.

Qwen3 14B 10GB+

Qwen3 14B — best multilingual model for 12GB VRAM setups.

chatcodingmultilingualreasoning

Mistral Small 3.1 14GB+

Mistral Small 3.1 — high-quality instruction model for 16GB+ setups.

chatcodingreasoning

Gemma 3 27B 18GB+

Google's multimodal Gemma 3 large model with 128K context and image input.

chatresearchcreative

Gemma 4 26B 18GB+

Google's Gemma 4 26B multimodal MoE model. The Ollama q4 tag is 18GB, the bf16 variant is 52GB, and it supports text + image input with a 256K context window.

chatcodingmultimodalresearch

DeepSeek R1 32B 19GB+

DeepSeek R1 32B distill — near-frontier reasoning on a single RTX 3090/4090.

reasoningcodingresearch

Qwen3 30B 19GB+

Sparse Qwen3 MoE model with 256K context. Only 3B parameters are active per token, so it runs much faster than a dense 30B on 24GB-class GPUs.

chatcodingmultilingualreasoning

Code Llama 34B 20GB+

Meta Code Llama 34B — top-tier code completion for 24GB GPUs.

Qwen 2.5 32B 20GB+

Qwen 2.5 32B — strongest multilingual model fitting on a single 24GB GPU.

chatcodingmultilingual

Qwen 2.5 Coder 32B 20GB+

State-of-the-art local code model — competitive with GPT-4 on code benchmarks.

codingreasoning

Qwen3 32B 20GB+

Qwen3 32B — top-tier multilingual reasoning for 24GB workstation GPUs.

chatcodingmultilingualreasoning

Command R 35B 22GB+

Cohere Command R — optimized for RAG, long-context retrieval and research tasks.

researchchatreasoning

DeepSeek R1 70B 40GB+

DeepSeek R1 70B — strongest local reasoning model, requires server-grade GPU.

reasoningresearchcoding

DeepSeek V4 Flash 40GB+

DeepSeek's 2026 preview frontier model. In Ollama it is cloud-only as of May 2026, with a 284B MoE architecture, 13B active parameters per token, and a 1M-token context window.

codingresearchreasoningagentic

Llama 3.1 70B 40GB+

Llama 3.1 70B delivers frontier-quality reasoning and coding at full local control.

chatcodingresearchreasoning

Llama 3.3 70B 40GB+

Meta Llama 3.3 70B — near-GPT-4 quality, best on dual 3090s or cloud GPU.

chatcodingresearchreasoning

Qwen 2.5 72B 43GB+

Qwen 2.5 72B — multilingual excellence at the 70B scale for cloud deployments.

chatcodingmultilingualresearch

Llama 4 Scout 72GB+

Meta's multimodal long-context model for GPU servers and high-end multi-GPU setups.

Llama 4 Maverick 256GB+

Meta's larger multimodal Llama 4 model, best suited to cloud or datacenter GPUs.

chatresearchcreative