Key Takeaways
- Nine layers, 87 projects, one map. Runtimes, desktop apps, web UIs, coding assistants, RAG systems, agent frameworks, voice/multimodal, mobile clients, and specialized productivity plugins – almost every popular project in 2026 fits in exactly one layer.
- Pick a runtime first. Ollama is the right default for ~95% of readers; llama.cpp is the foundational engine underneath most other tools; vLLM is the production-serving pick for multi-user setups.
- Most layers above the runtime are optional. A desktop app OR a web UI is enough for chat. Add a coding harness only when you want IDE integration; add a RAG system only when you want to chat with your own documents; add an agent framework only when one-shot calls stop being enough.
- Licence matters for commercial use. MIT and Apache 2.0 dominate the ecosystem. AGPL appears on a handful of UIs (text-generation-webui, KoboldCpp, Jan, SillyTavern) – fine for personal use, worth a closer look before commercial deployment. The "License" column below names every one explicitly.
- Multi-tool stacks are normal. Ollama + Open WebUI + AnythingLLM + Continue.dev is a single-machine setup that covers chat, RAG, and coding without compromise. The "Common Real-World Stacks" table below names the recipes that actually work in 2026.
1. Local LLM Runtimes & Inference Engines
A runtime is the engine that loads model weights into memory and turns prompts into tokens. It is the first decision in a local-LLM stack and the one that constrains everything above it – every desktop app, web UI, and coding harness ultimately calls a runtime. Ollama dominates user-facing share in 2026 because it ships an OpenAI-compatible API and a one-command install; llama.cpp is the C++ engine underneath most of the others; vLLM is the right pick when you need to serve concurrent users on a real GPU.
| Tool | Link | Description | License |
|---|---|---|---|
| Ollama | ollama.com | Easiest overall – one-command install, OpenAI-compatible API, huge model library | MIT |
| llama.cpp | github.com/ggml-org/llama.cpp | Foundational C++ engine behind most other tools, runs anywhere including Apple Silicon | MIT |
| vLLM | github.com/vllm-project/vllm | High-throughput serving for multi-user GPU deployments | Apache 2.0 |
| LocalAI | localai.io | Drop-in OpenAI API replacement supporting multiple backends | MIT |
| TensorRT-LLM | github.com/NVIDIA/TensorRT-LLM | NVIDIA-optimized inference for enterprise GPU rigs | Apache 2.0 |
| MLC LLM | mlc.ai/mlc-llm | Mobile and edge device deployment runtime | Apache 2.0 |
| SGLang | github.com/sgl-project/sglang | Structured inference serving for agent pipelines | Apache 2.0 |
| ExLlamaV2 | github.com/turboderp-org/exllamav2 | Fast quantized inference optimized for RTX GPUs | MIT |
| KoboldCpp | github.com/LostRuins/koboldcpp | Lightweight llama.cpp wrapper with built-in UI | AGPL 3.0 |
| Llamafile | github.com/Mozilla-Ocho/llamafile | Single-file portable LLM execution by Mozilla | Apache 2.0 |
| MLX-LM | github.com/ml-explore/mlx-examples | Apple Silicon-native runtime by Apple research | MIT |
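Nearly everything above the runtime layer talks to it over the same protocol, which is why the runtime choice is swappable later. A minimal sketch, assuming Ollama is serving on its default port with a model already pulled (the model name here is just an example):

```python
# Minimal chat call against a local Ollama instance through its
# OpenAI-compatible endpoint; swapping base_url is enough to target
# vLLM, LocalAI, or LM Studio's server mode instead.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # client requires a key; Ollama ignores it
)

resp = client.chat.completions.create(
    model="llama3.1",  # example model name -- use whatever you have pulled
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
)
print(resp.choices[0].message.content)
```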
Deeper guide: llama.cpp vs Ollama vs vLLM
2. Desktop GUI Apps
Desktop apps wrap a runtime in a chat interface and a model browser. They are where most non-technical users start because there is no terminal step β download, click, chat. LM Studio, Jan, and GPT4All hold the bulk of the user base in 2026; AnythingLLM doubles as a desktop app and a RAG layer; Open Interpreter is the outlier that lets a local model drive your computer.
| Tool | Link | Description | License |
|---|---|---|---|
| LM Studio | lmstudio.ai | Most polished GUI, built-in HuggingFace model browser, server mode | Free (closed) |
| Jan | jan.ai | Privacy-focused offline ChatGPT clone, fully open-source | AGPL 3.0 |
| GPT4All | nomic.ai/gpt4all | Beginner-friendly with strong CPU-only support | MIT |
| AnythingLLM | anythingllm.com | RAG and document chat with built-in vector store | MIT |
| Msty | msty.app | Clean consumer UX, multi-provider support | Free (closed) |
| Cherry Studio | cherry-ai.com | Multi-provider desktop AI with extensive customization | Apache 2.0 |
| Faraday | faraday.dev | Character chat and roleplay desktop client | Free (closed) |
| Enchanted | enchantedlabs.ai | Native macOS/iOS minimal Ollama client | MIT |
| h2oGPT | github.com/h2oai/h2ogpt | Enterprise-feature-heavy desktop and server | Apache 2.0 |
| Open Interpreter | github.com/OpenInterpreter/open-interpreter | Lets local LLM control your computer and execute code | AGPL 3.0 |
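Open Interpreter is worth one concrete illustration, because "lets a local model drive your computer" is easy to misread. A hedged sketch based on its documented Python API – the model id and endpoint are assumptions for a stock Ollama install:

```python
# Sketch of pointing Open Interpreter at a local model; attribute names
# follow its documented Python API, and the model/endpoint values are
# assumptions for a default Ollama setup.
from interpreter import interpreter

interpreter.offline = True                           # never fall back to a hosted model
interpreter.llm.model = "ollama/llama3.1"            # LiteLLM-style provider/model id
interpreter.llm.api_base = "http://localhost:11434"  # local Ollama endpoint

# The model proposes shell/Python code; by default you confirm each step
# before it executes on your machine.
interpreter.chat("List the five largest files in my home directory.")
```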
Deeper guide: LM Studio vs Jan vs GPT4All
3. Web UIs & Browser Frontends
Web UIs are self-hosted ChatGPT clones – same conversational surface, but you point them at a runtime running on your own machine or LAN. They are the natural choice when you want multi-device access (laptop, phone, tablet hitting one server) or team usage. Open WebUI dominates the self-hosted segment in 2026, with LibreChat as the team-features alternative and SillyTavern as the dedicated roleplay UI.
| Tool | Link | Description | License |
|---|---|---|---|
| Open WebUI | openwebui.com | Most popular self-hosted ChatGPT-like UI with built-in RAG | BSD 3-Clause |
| LibreChat | librechat.ai | Multi-model ChatGPT alternative with team features | MIT |
| text-generation-webui | github.com/oobabooga/text-generation-webui | Power-user UI with extensive plugin ecosystem | AGPL 3.0 |
| SillyTavern | github.com/SillyTavern/SillyTavern | Roleplay and character chat with lorebooks | AGPL 3.0 |
| LobeChat | lobehub.com | Modern polished UI with plugin marketplace | MIT |
| Big-AGI | github.com/enricoros/big-AGI | Advanced multi-provider frontend with personas | MIT |
| NextChat | github.com/ChatGPTNextWeb/NextChat | Lightweight web chat, simple deployment | MIT |
| Page Assist | github.com/n4ze3m/page-assist | Browser sidebar AI for Chrome and Firefox | MIT |
| Chatbox | chatboxai.app | Cross-platform desktop and web client | GPL 3.0 |
Deeper guide: SillyTavern vs Agnai vs RisuAI
4. Coding Assistants & IDE Integrations
Coding assistants connect a local LLM to your editor or terminal via OpenAI-compatible APIs. The choice is mostly about which workflow primitive you want: autocomplete-in-editor (Continue.dev), autonomous agent edits (Cline, OpenHands), or git-native diff edits at the terminal (Aider). All three patterns work against any runtime that speaks the OpenAI Chat Completions protocol – Ollama is the most common backend in 2026.
| Tool | Link | Description | License |
|---|---|---|---|
| Continue.dev | continue.dev | VS Code and JetBrains autocomplete and chat with local models | Apache 2.0 |
| Aider | aider.chat | Terminal pair programmer with multi-file edit support | Apache 2.0 |
| Cline | cline.bot | Autonomous coding agent for VS Code | Apache 2.0 |
| Tabby | tabby.tabbyml.com | Self-hosted GitHub Copilot alternative | Apache 2.0 |
| CodeGPT | codegpt.co | IDE integrations across multiple editors | MIT |
| OpenHands | github.com/All-Hands-AI/OpenHands | AI software engineer agent (formerly OpenDevin) | MIT |
| Cursor (local mode) | cursor.com | AI-first code editor with local model support | Free (closed) |
| Twinny | github.com/twinnydotdev/twinny | Free Copilot alternative for VS Code | MIT |
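To make the terminal pattern concrete: aider also exposes a small scripting interface, so the same git-native edit loop can be driven from Python. A sketch following its documented scripting API – the model id and target file are placeholders:

```python
# Scripted aider session against a local model, per aider's documented
# scripting interface; the model id and file name are placeholder choices.
# For Ollama backends, set OLLAMA_API_BASE=http://127.0.0.1:11434 first.
from aider.coders import Coder
from aider.models import Model

model = Model("ollama/qwen2.5-coder")  # any LiteLLM-style local model id
coder = Coder.create(main_model=model, fnames=["app.py"])

# aider edits app.py in place and records the change as a git commit
coder.run("add a --verbose flag to the argument parser")
```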
Deeper guide: Continue.dev vs Cline vs Aider
5. RAG & Document Chat Systems
RAG (Retrieval-Augmented Generation) systems combine a local LLM with an embedding model and a vector store so the model can answer from your own documents. The split is between turnkey apps (AnythingLLM, PrivateGPT, Quivr, Khoj) that "just work" and framework libraries (LlamaIndex, Haystack, txtai) that you build on. RAGFlow has gained share in 2026 specifically for documents that need citation-grade retrieval.
| Tool | Link | Description | License |
|---|---|---|---|
| AnythingLLM | anythingllm.com | Easiest all-in-one personal RAG with workspaces | MIT |
| PrivateGPT | github.com/zylon-ai/private-gpt | Fully offline enterprise-leaning RAG | Apache 2.0 |
| Quivr | github.com/QuivrHQ/quivr | Self-hosted personal knowledge assistant | Apache 2.0 |
| Khoj | khoj.dev | Personal AI second brain, syncs with Obsidian and Notion | AGPL 3.0 |
| Dify | dify.ai | AI workflow builder with RAG and agent support | Modified Apache 2.0 |
| Flowise | flowiseai.com | Visual LangChain workflow builder | Apache 2.0 |
| Langflow | langflow.org | Visual AI orchestration with RAG components | MIT |
| LlamaIndex | llamaindex.ai | RAG framework / Python library – foundation for custom builds | MIT |
| Haystack | haystack.deepset.ai | Search and RAG framework by deepset | Apache 2.0 |
| RAGFlow | ragflow.io | Deep document understanding for RAG with citation extraction | Apache 2.0 |
| txtai | github.com/neuml/txtai | Embedded vector + LLM database in one library | Apache 2.0 |
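Underneath every tool in this table the loop is the same: embed the documents once, embed the question, retrieve by similarity, stuff the winners into the prompt. A deliberately bare sketch against Ollama's REST API – model names are assumptions, and a real system would swap the in-memory list for a proper vector store:

```python
# Bare-bones RAG over a handful of snippets, using only Ollama's REST API:
# /api/embeddings for retrieval, /v1/chat/completions for the answer.
import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

docs = [
    "Invoices are due within 30 days of receipt.",
    "Support tickets are answered within one business day.",
    "Refunds require a receipt and are processed in 5-7 days.",
]
index = [(d, embed(d)) for d in docs]  # tiny in-memory "vector store"

question = "How fast are refunds processed?"
q = embed(question)
best = max(index, key=lambda pair: cosine(q, pair[1]))[0]  # top-1 retrieval

r = requests.post(f"{OLLAMA}/v1/chat/completions", json={
    "model": "llama3.1",
    "messages": [
        {"role": "system", "content": f"Answer using only this context: {best}"},
        {"role": "user", "content": question},
    ],
})
print(r.json()["choices"][0]["message"]["content"])
```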
Deeper guide: AnythingLLM vs PrivateGPT vs Open WebUI
6. Agent Frameworks & Orchestration
Agent frameworks turn one-shot LLM calls into multi-step workflows – plan, act, observe, repeat. LangChain remains the general-purpose default; CrewAI and AutoGen specialise in role-based multi-agent setups; LangGraph is the right pick when state management matters across long-running flows. All eight frameworks below run cleanly against a local Ollama backend.
| Tool | Link | Description | License |
|---|---|---|---|
| LangChain | langchain.com | General-purpose LLM application framework | MIT |
| LlamaIndex | llamaindex.ai | RAG-focused agent and data framework | MIT |
| CrewAI | crewai.com | Multi-agent role-based workflows | MIT |
| AutoGen | github.com/microsoft/autogen | Microsoft multi-agent orchestration framework | MIT (code) / CC-BY-4.0 (docs) |
| Semantic Kernel | learn.microsoft.com/semantic-kernel | Microsoft enterprise orchestration SDK in C#/Python/Java | MIT |
| LangGraph | langchain-ai.github.io/langgraph | Stateful graph-based agent workflows | MIT |
| Letta (formerly MemGPT) | letta.com | Long-term memory agents | Apache 2.0 |
| Pydantic AI | ai.pydantic.dev | Type-safe agent framework built on Pydantic | MIT |
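All eight frameworks ultimately wrap the plan-act-observe loop the intro describes. A minimal hand-rolled version using OpenAI-style tool calling against Ollama – the single tool, model choice, and iteration cap are assumptions, and this is exactly the boilerplate the frameworks exist to hide:

```python
# Hand-rolled plan-act-observe loop via OpenAI-style tool calling against
# a local Ollama backend; requires a tool-capable model (llama3.1 is an
# example). Agent frameworks wrap this same loop in higher abstractions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def word_count(text: str) -> str:
    return str(len(text.split()))

tools = [{
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

messages = [{"role": "user",
             "content": "How many words are in 'the quick brown fox'?"}]

for _ in range(5):  # iteration cap so a confused model cannot loop forever
    msg = client.chat.completions.create(
        model="llama3.1", messages=messages, tools=tools,
    ).choices[0].message
    if not msg.tool_calls:       # no tool requested: the plan is complete
        print(msg.content)
        break
    messages.append(msg)         # keep the model's tool request in history
    for call in msg.tool_calls:  # act: execute each requested tool
        args = json.loads(call.function.arguments)
        messages.append({        # observe: return the result to the model
            "role": "tool",
            "tool_call_id": call.id,
            "content": word_count(**args),
        })
```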
Deeper guide: Local AI Agents With MCP
7. Voice, Speech & Multimodal
Voice and multimodal stacks extend a local LLM beyond text – speech in (STT), speech out (TTS), and vision. Whisper.cpp and faster-whisper own the local STT layer; Piper and Coqui share the TTS layer, with XTTS v2 dominating voice cloning; LLaVA and Ollama vision models cover the vision side. A fully-offline voice assistant is buildable from this layer plus a small chat model.
| Tool | Link | Description | License |
|---|---|---|---|
| Whisper.cpp | github.com/ggerganov/whisper.cpp | Local speech recognition, runs on CPU or GPU | MIT |
| faster-whisper | github.com/SYSTRAN/faster-whisper | Fast Whisper transcription via CTranslate2 | MIT |
| Piper TTS | github.com/rhasspy/piper | Lightweight local text-to-speech | MIT |
| Coqui TTS | coqui.ai | Open-source voice synthesis with multiple model options | MPL 2.0 |
| XTTS v2 | docs.coqui.ai/en/latest/models/xtts.html | Voice cloning with multilingual support | CPML (non-commercial) |
| Bark | github.com/suno-ai/bark | Generative voice with non-speech sounds | MIT |
| StyleTTS 2 | github.com/yl4579/StyleTTS2 | High-quality natural-sounding TTS | MIT |
| LLaVA | llava-vl.github.io | Local vision + language model | Apache 2.0 |
| Ollama vision models | ollama.com | Local vision via Ollama (Llama 3.2 Vision, LLaVA, etc.) | Various |
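As a proof of the fully-offline claim, the whole round trip is three calls: faster-whisper for speech-to-text, any small chat model for the reply, Piper for text-to-speech. A sketch with placeholder file and model names – the Piper voice is a separately downloaded .onnx asset:

```python
# Offline voice round trip under stated assumptions: faster-whisper for
# STT, Ollama for the reply, Piper's CLI for TTS. Audio file, model
# sizes, and the Piper voice file are all placeholder choices.
import subprocess
import requests
from faster_whisper import WhisperModel

stt = WhisperModel("base.en", device="cpu", compute_type="int8")
segments, _ = stt.transcribe("question.wav")  # speech -> text
heard = " ".join(s.text for s in segments).strip()

r = requests.post("http://localhost:11434/v1/chat/completions", json={
    "model": "llama3.2:3b",  # small chat model, assumption
    "messages": [{"role": "user", "content": heard}],
})
reply = r.json()["choices"][0]["message"]["content"]

# text -> speech via the Piper CLI, which reads the text from stdin
subprocess.run(["piper", "--model", "en_US-lessac-medium.onnx",
                "--output_file", "reply.wav"], input=reply.encode())
```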
Deeper guide: Build a Local Voice Assistant on Your Phone
8. Mobile & Edge Clients
Mobile clients run a quantised model directly on the phone using Apple Neural Engine, Qualcomm NPU, or pure CPU inference. The MLC LLM project is the foundational layer; consumer apps (PocketPal AI, Private LLM, LLM Farm, Layla) wrap it with a chat UI. Flagship phones in 2026 run 2-4B models at usable speeds (8-15 tokens/sec); 7B is on the edge of feasibility for top-tier hardware.
| Tool | Link | Description | License |
|---|---|---|---|
| MLC Chat | mlc.ai/mlc-llm | Cross-platform mobile LLM runtime | Apache 2.0 |
| PocketPal AI | github.com/a-ghorbani/pocketpal-ai | Free iOS and Android local LLM client | MIT |
| Private LLM | privatellm.app | Polished iOS and macOS local LLM app | Paid (closed) |
| LLM Farm | github.com/guinmoon/LLMFarm | iOS local LLM with model browser | MIT |
| Layla | layla-network.ai | Android-first local LLM app | Free (closed) |
| Maid | github.com/Mobile-Artificial-Intelligence/maid | Open-source Flutter mobile LLM app | MIT |
| Enchanted | enchantedlabs.ai | Native iOS/macOS Ollama client | MIT |
| Chapper | prevolut.uk | Native Ollama and LM Studio mobile client | Free |
| RikkaHub | github.com/rikkahub/rikkahub | Open-source Android local AI | MIT |
| AnythingLLM Mobile | anythingllm.com | Remote access to your local AnythingLLM workspace | MIT |
Deeper guide: Best Local LLM Apps for iPhone in 2026
9. Specialized & Productivity Tools
Specialized tools embed local LLMs into apps you already use – note-taking platforms (Obsidian, Logseq, Joplin), autonomous task agents (AutoGPT, BabyAGI, MetaGPT), and roleplay frontends (Agnai, RisuAI). These are not generic chat surfaces; they are workflow-specific integrations that assume you already have a host app and a runtime.
| Tool | Link | Description | License |
|---|---|---|---|
| Smart Connections | github.com/brianpetro/obsidian-smart-connections | Obsidian semantic search and chat plugin | GPL 3.0 |
| Copilot for Obsidian | github.com/logancyang/obsidian-copilot | Obsidian local LLM chat plugin | AGPL 3.0 |
| Text Generator | github.com/nhaouari/obsidian-textgenerator-plugin | Obsidian content generation plugin | MIT |
| logseq-copilot | github.com/logancyang/logseq-copilot | Logseq plugin for local and cloud LLM chat, same author as Obsidian Copilot | AGPL 3.0 |
| BMO Chatbot | github.com/longy2k/obsidian-bmo-chatbot | Obsidian chatbot with local LLM | MIT |
| Joplin AI | joplinapp.org | Joplin notes with local AI integrations | MIT |
| AutoGPT (local) | github.com/Significant-Gravitas/AutoGPT | Autonomous task agent with Ollama support | MIT |
| BabyAGI | github.com/yoheinakajima/babyagi | Lightweight autonomous agent | MIT |
| MetaGPT | github.com/geekan/MetaGPT | Multi-agent software company simulation | MIT |
| Agnai | agnai.chat | Roleplay frontend with character cards | MIT |
| RisuAI | github.com/kwaroran/RisuAI | Mobile-friendly roleplay frontend | GPL 3.0 |
Deeper guide: Local LLM With Obsidian in 2026
Common Real-World Stacks
If you do not want to work through nine categories, pick the closest stack below and copy it. Each row pairs a real goal with a tested combination and the hardware floor it actually runs on.
| Goal | Stack | Hardware floor |
|---|---|---|
| Just chat casually | LM Studio standalone | 16 GB RAM, no GPU |
| Best balance for power users | Ollama + Open WebUI | 16 GB RAM, optional GPU |
| Document chat | Ollama + AnythingLLM | 16 GB RAM, optional GPU |
| Coding | Ollama + Continue.dev | 16 GB RAM + GPU recommended |
| Roleplay / creative | KoboldCpp + SillyTavern | 16 GB RAM, GPU recommended |
| Privacy-first business | Ollama + Open WebUI + PrivateGPT | 32 GB RAM + 12 GB VRAM |
| Mobile / on-the-go | MLC Chat or PocketPal AI | iPhone 13+ / Pixel 7+ |
| Apple Silicon | Ollama (Metal backend) or LM Studio | M2/M3/M4/M5 with 16+ GB unified |
| Multi-user team | vLLM + Open WebUI | 32+ GB RAM + multi-GPU |
How This Directory Stays Current
This directory is reviewed every six months (next refresh: November 2026). Inclusion criteria: project is actively maintained (commits in the last 90 days), has a verifiable open-source licence or a clear commercial-use statement, and either holds meaningful user share in 2026 or fills a layer that would otherwise be empty. Projects that go inactive for more than two release cycles are removed; new entrants that pass the criteria are added at the next review. To suggest a project for inclusion, open an issue or PR against the PromptQuorum repository – include the project URL, licence, and a one-sentence description in the format above.
Sources
- ggml-org/llama.cpp GitHub – primary source for runtime architecture and supported models.
- Ollama Library – official model catalogue and runtime documentation.
- LM Studio Documentation – feature reference for the dominant desktop GUI.
- Open WebUI Documentation – feature reference for the dominant self-hosted web UI.
- Hugging Face Hub – primary location for downloading model weights consumed by every runtime listed above.
- awesome-local-llm GitHub list – community-maintained inventory used as a sanity check for project inclusion.
FAQ
What is the difference between a local LLM runtime and a desktop app?
A runtime (Ollama, llama.cpp, vLLM) is the engine that loads model weights and serves an API – typically OpenAI-compatible. A desktop app (LM Studio, Jan, GPT4All) is a chat UI that calls a runtime under the hood. Some apps bundle their own runtime (LM Studio embeds llama.cpp); others require you to install a runtime separately (Open WebUI calls Ollama). The runtime decides what is possible; the app decides what is convenient.
Can I use multiple tools from this list at the same time?
Yes – most stacks combine 2-4 tools. A common setup: Ollama as the runtime, Open WebUI for chat, AnythingLLM for document chat, and Continue.dev for coding – all four run against the same Ollama instance on a single machine. The "Common Real-World Stacks" table above lists the recipes that work without conflict.
Which tools work fully offline with no telemetry?
Ollama, llama.cpp, vLLM, Jan, GPT4All, Open WebUI, AnythingLLM, PrivateGPT, Continue.dev, Aider, KoboldCpp, Llamafile, MLX-LM, and most of the AGPL/MIT-licensed apps in this directory work fully offline once the model is downloaded. LM Studio and several closed-source tools have optional analytics that can be disabled in settings β verify by running a packet capture once after install. Browser-based UIs (Open WebUI, LibreChat) are local-only when configured to use a local backend.
Are any of these commercial-licensed (not free for commercial use)?
A handful: LM Studio, Msty, Faraday, Layla, and Cursor are closed-source – generally free to use but not redistributable, and commercial terms vary. Private LLM is paid, and XTTS v2's CPML licence excludes commercial use. AGPL-licensed tools (Jan, KoboldCpp, text-generation-webui, SillyTavern, Khoj, Open Interpreter, Copilot for Obsidian) are free for any use including commercial, but AGPL requires you to disclose source if you modify them and offer them to users over a network. Apache 2.0 and MIT projects (the majority) are usable in any context, commercial included, with no obligations beyond the licence text.
Which tools support Apple Silicon (M-series chips) natively?
Ollama, llama.cpp, MLX-LM, LM Studio, Jan, Enchanted, GPT4All, MLC Chat, AnythingLLM, and most Electron/Tauri apps run natively on Apple Silicon and use the Metal backend. MLX-LM is Apple-specific and the fastest for large models on M-series. vLLM, TensorRT-LLM, and ExLlamaV2 are NVIDIA-focused and either do not run or run poorly on Apple Silicon – for Apple users, Ollama with the Metal backend is the default.
Do all these tools support GGUF model format?
GGUF is the native format for llama.cpp and any tool that wraps it (Ollama, LM Studio, Jan, GPT4All, KoboldCpp, Llamafile). vLLM and TensorRT-LLM use their own optimised formats (typically AWQ or FP16) for higher throughput. ExLlamaV2 uses EXL2 quantisation. MLX-LM uses MLX-converted weights. Most listed tools accept GGUF; a few (vLLM, TensorRT-LLM, ExLlamaV2, MLX-LM) require a one-time conversion step from the original Hugging Face weights.
Which tools are best for users with no coding experience?
GPT4All has the simplest install (one click, runs on 8 GB RAM). LM Studio is the most feature-rich without requiring a terminal. Jan is the most privacy-conscious of the no-code options. For document chat without command-line work, AnythingLLM is the easiest. All four are listed in the Desktop GUI Apps category above.
Can I run these tools on a server and access them remotely?
Most server-capable tools (Ollama, vLLM, LocalAI, Open WebUI, LibreChat, PrivateGPT, AnythingLLM) expose an HTTP API and bind to a network interface configurable in settings. Standard pattern: run Ollama on a home server or VPS, run a UI on your laptop or phone pointing at the server's IP. Treat the API like any web service β bind to localhost behind a reverse proxy, or to a private network with proper authentication. Open WebUI ships with multi-user support out of the box.
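A minimal sketch of the client side of that pattern – the IP is a placeholder for your server, which must be told to listen beyond localhost (for Ollama, the documented setting is the OLLAMA_HOST=0.0.0.0 environment variable):

```python
# Any OpenAI-compatible client can target a runtime on another machine;
# 192.168.1.50 is a placeholder LAN address for your server.
from openai import OpenAI

client = OpenAI(base_url="http://192.168.1.50:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Reply with one word: online."}],
)
print(resp.choices[0].message.content)
```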
Which tools support multi-user / team setups?
Open WebUI, LibreChat, h2oGPT, AnythingLLM (with admin features enabled), and Dify are designed for multi-user use, with role-based access and per-user conversation history. vLLM is the right serving layer underneath when concurrent inference matters β it batches requests across users for throughput unattainable on Ollama at concurrency above ~3.
How often does this directory get updated?
Every six months – the next scheduled refresh is November 2026. Mid-cycle changes (a project goes inactive, a new tool gains meaningful share, a licence changes) get patched into the existing entry. Entirely new categories or layers wait for a refresh to keep the structure stable. The "Sources" section above lists the community indexes used to spot-check what the ecosystem is actually doing between refreshes.