
The Complete Local LLM Software Directory: 70+ Tools to Run AI on Your Own Hardware (2026)

20 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

The local-LLM ecosystem in 2026 splits cleanly into nine layers. Runtimes (Ollama, llama.cpp, vLLM) move tokens through the model; desktop apps (LM Studio, Jan, GPT4All) wrap a runtime in a chat UI; web UIs (Open WebUI, LibreChat) do the same in a browser; coding assistants (Continue.dev, Cline, Aider) plug a local model into your editor; RAG systems (AnythingLLM, PrivateGPT) point it at your documents; agent frameworks (LangChain, CrewAI, LangGraph) chain calls into multi-step workflows; voice and multimodal stacks (Whisper.cpp, Piper, LLaVA) extend it beyond text; mobile clients (MLC Chat, PocketPal AI) put it on a phone; and specialized productivity plugins (Obsidian, Logseq, AutoGPT) embed it in tools you already use. Pick a runtime first (Ollama for almost everyone), then add one or two layers above. The directory below lists every project worth knowing in each layer along with its licence, so you can plan a stack that is open-source end-to-end if that matters to you.

The local-LLM ecosystem in 2026 is large enough that picking the wrong tool first costs hours, not minutes. This directory catalogues 87 actively maintained projects across nine layers (runtimes, desktop apps, web UIs, coding assistants, RAG systems, agent frameworks, voice and multimodal, mobile clients, and specialized productivity plugins) with the description, licence, and primary URL for each. Use it as the "what exists" map before you commit to a stack; every category ends with a link to the deeper PromptQuorum comparison guide for that layer.

Key Takeaways

  • Nine layers, 87 projects, one map. Runtimes, desktop apps, web UIs, coding assistants, RAG systems, agent frameworks, voice/multimodal, mobile clients, and specialized productivity plugins: almost every popular project in 2026 fits in exactly one layer.
  • Pick a runtime first. Ollama is the right default for ~95% of readers; llama.cpp is the foundational engine underneath most other tools; vLLM is the production-serving pick for multi-user setups.
  • Most layers above the runtime are optional. A desktop app OR a web UI is enough for chat. Add a coding harness only when you want IDE integration; add a RAG system only when you want to chat with your own documents; add an agent framework only when one-shot calls stop being enough.
  • Licence matters for commercial use. MIT and Apache 2.0 dominate the ecosystem. AGPL appears on a handful of UIs (text-generation-webui, KoboldCpp, Jan, SillyTavern): fine for personal use, a more deliberate choice for commercial deployments. The "License" column below names every one explicitly.
  • Multi-tool stacks are normal. Ollama + Open WebUI + AnythingLLM + Continue.dev is a single-machine setup that covers chat, RAG, and coding without compromise. The "Common Real-World Stacks" table below names the recipes that actually work in 2026.

1. Local LLM Runtimes & Inference Engines

A runtime is the engine that loads model weights into memory and turns prompts into tokens. It is the first decision in a local-LLM stack and the one that constrains everything above it: every desktop app, web UI, and coding harness ultimately calls a runtime. Ollama dominates user-facing share in 2026 because it ships an OpenAI-compatible API and a one-command install; llama.cpp is the C++ engine underneath most of the others; vLLM is the right pick when you need to serve concurrent users on a real GPU. A minimal client sketch follows the table.

Tool | Link | Description | License
Ollama | ollama.com | Easiest overall: one-command install, OpenAI-compatible API, huge model library | MIT
llama.cpp | github.com/ggml-org/llama.cpp | Foundational C++ engine behind most other tools, runs anywhere including Apple Silicon | MIT
vLLM | github.com/vllm-project/vllm | High-throughput serving for multi-user GPU deployments | Apache 2.0
LocalAI | localai.io | Drop-in OpenAI API replacement supporting multiple backends | MIT
TensorRT-LLM | github.com/NVIDIA/TensorRT-LLM | NVIDIA-optimized inference for enterprise GPU rigs | Apache 2.0
MLC LLM | mlc.ai/mlc-llm | Mobile and edge device deployment runtime | Apache 2.0
SGLang | github.com/sgl-project/sglang | Structured inference serving for agent pipelines | Apache 2.0
ExLlamaV2 | github.com/turboderp-org/exllamav2 | Fast quantized inference optimized for RTX GPUs | MIT
KoboldCpp | github.com/LostRuins/koboldcpp | Lightweight llama.cpp wrapper with built-in UI | AGPL 3.0
Llamafile | github.com/Mozilla-Ocho/llamafile | Single-file portable LLM execution by Mozilla | Apache 2.0
MLX-LM | github.com/ml-explore/mlx-examples | Apple Silicon-native runtime by Apple research | MIT

Deeper guide: llama.cpp vs Ollama vs vLLM
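
Because nearly every runtime above speaks the OpenAI Chat Completions protocol, one client script works against any of them. A minimal sketch, assuming Ollama is running on its default port (11434) and a model such as llama3.2 has already been pulled; pointing base_url at a vLLM or LocalAI server works the same way:

```python
# Talk to a local runtime through its OpenAI-compatible endpoint.
# Assumes: pip install openai, Ollama running locally, "llama3.2" pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
```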

2. Desktop GUI Apps

Desktop apps wrap a runtime in a chat interface and a model browser. They are where most non-technical users start because there is no terminal step: download, click, chat. LM Studio, Jan, and GPT4All hold the bulk of the user base in 2026; AnythingLLM doubles as a desktop app and a RAG layer; Open Interpreter is the outlier that lets a local model drive your computer. A server-mode sketch follows the table.

Tool | Link | Description | License
LM Studio | lmstudio.ai | Most polished GUI, built-in HuggingFace model browser, server mode | Free (closed)
Jan | jan.ai | Privacy-focused offline ChatGPT clone, fully open-source | AGPL 3.0
GPT4All | nomic.ai/gpt4all | Beginner-friendly with strong CPU-only support | MIT
AnythingLLM | anythingllm.com | RAG and document chat with built-in vector store | MIT
Msty | msty.app | Clean consumer UX, multi-provider support | Free (closed)
Cherry Studio | cherry-ai.com | Multi-provider desktop AI with extensive customization | Apache 2.0
Faraday | faraday.dev | Character chat and roleplay desktop client | Free (closed)
Enchanted | enchantedlabs.ai | Native macOS/iOS minimal Ollama client | MIT
h2oGPT | github.com/h2oai/h2ogpt | Enterprise-feature-heavy desktop and server | Apache 2.0
Open Interpreter | github.com/OpenInterpreter/open-interpreter | Lets local LLM control your computer and execute code | AGPL 3.0

Deeper guide: LM Studio vs Jan vs GPT4All
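
The same client pattern applies when a desktop app doubles as a server. A sketch assuming LM Studio's local server is enabled on its default port (1234) with one model loaded; the model identifier is discovered at runtime rather than hard-coded:

```python
# Use LM Studio's server mode as an OpenAI-compatible backend.
# Assumes the local server is started in LM Studio (default port 1234).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Discover the identifier of whichever model is currently loaded.
model_id = client.models.list().data[0].id

reply = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "One-sentence summary of GGUF?"}],
)
print(reply.choices[0].message.content)
```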

3. Web UIs & Browser Frontends

Web UIs are self-hosted ChatGPT clones: same conversational surface, but you point them at a runtime running on your own machine or LAN. They are the natural choice when you want multi-device access (laptop, phone, tablet hitting one server) or team usage. Open WebUI dominates the self-hosted segment in 2026, with LibreChat as the team-features alternative and SillyTavern as the dedicated roleplay UI.

Tool | Link | Description | License
Open WebUI | openwebui.com | Most popular self-hosted ChatGPT-like UI with built-in RAG | BSD 3-Clause
LibreChat | librechat.ai | Multi-model ChatGPT alternative with team features | MIT
text-generation-webui | github.com/oobabooga/text-generation-webui | Power-user UI with extensive plugin ecosystem | AGPL 3.0
SillyTavern | github.com/SillyTavern/SillyTavern | Roleplay and character chat with lorebooks | AGPL 3.0
LobeChat | lobehub.com | Modern polished UI with plugin marketplace | MIT
Big-AGI | github.com/enricoros/big-AGI | Advanced multi-provider frontend with personas | MIT
NextChat | github.com/ChatGPTNextWeb/NextChat | Lightweight web chat, simple deployment | MIT
Page Assist | github.com/n4ze3m/page-assist | Browser sidebar AI for Chrome and Firefox | MIT
Chatbox | chatboxai.app | Cross-platform desktop and web client | GPLv3

Deeper guide: SillyTavern vs Agnai vs RisuAI

4. Coding Assistants & IDE Integrations

Coding assistants connect a local LLM to your editor or terminal via OpenAI-compatible APIs. The choice is mostly about the workflow primitive: autocomplete-in-editor (Continue.dev), autonomous agent edits (Cline, OpenHands), or git-native diff edits at the terminal (Aider). All three patterns work against any runtime that speaks the OpenAI Chat Completions protocol; Ollama is the most common backend in 2026.

Tool | Link | Description | License
Continue.dev | continue.dev | VS Code and JetBrains autocomplete and chat with local models | Apache 2.0
Aider | aider.chat | Terminal pair programmer with multi-file edit support | Apache 2.0
Cline | cline.bot | Autonomous coding agent for VS Code | Apache 2.0
Tabby | tabby.tabbyml.com | Self-hosted GitHub Copilot alternative | Apache 2.0
CodeGPT | codegpt.co | IDE integrations across multiple editors | MIT
OpenHands | github.com/All-Hands-AI/OpenHands | AI software engineer agent (formerly OpenDevin) | MIT
Cursor (local mode) | cursor.com | AI-first code editor with local model support | Free (closed)
Twinny | github.com/twinnydotdev/twinny | Free Copilot alternative for VS Code | MIT

Deeper guide: Continue.dev vs Cline vs Aider

5. RAG & Document Chat Systems

RAG (Retrieval-Augmented Generation) systems combine a local LLM with an embedding model and a vector store so the model can answer from your own documents. The split is between turn-key apps (AnythingLLM, PrivateGPT, Quivr, Khoj) that "just work" and framework libraries (LlamaIndex, Haystack, txtai) that you build on. RAGFlow has gained share in 2026 specifically for documents that need citation-grade retrieval. A bare-bones sketch of the underlying loop follows the table.

Tool | Link | Description | License
AnythingLLM | anythingllm.com | Easiest all-in-one personal RAG with workspaces | MIT
PrivateGPT | github.com/zylon-ai/private-gpt | Fully offline enterprise-leaning RAG | Apache 2.0
Quivr | github.com/QuivrHQ/quivr | Self-hosted personal knowledge assistant | Apache 2.0
Khoj | khoj.dev | Personal AI second brain, syncs with Obsidian and Notion | AGPL 3.0
Dify | dify.ai | AI workflow builder with RAG and agent support | Modified Apache 2.0
Flowise | flowiseai.com | Visual LangChain workflow builder | Apache 2.0
Langflow | langflow.org | Visual AI orchestration with RAG components | MIT
LlamaIndex | llamaindex.ai | RAG framework / Python library, the foundation for custom builds | MIT
Haystack | haystack.deepset.ai | Search and RAG framework by deepset | Apache 2.0
RAGFlow | ragflow.io | Deep document understanding for RAG with citation extraction | Apache 2.0
txtai | github.com/neuml/txtai | Embedded vector + LLM database in one library | Apache 2.0

Deeper guide: AnythingLLM vs PrivateGPT vs Open WebUI
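
The loop every turn-key app wraps is small enough to show directly: embed the documents once, embed the question, retrieve by cosine similarity, and stuff the best match into the prompt. A bare-bones sketch against an Ollama backend; the model names (nomic-embed-text, llama3.2) are illustrative, and it assumes a recent Ollama build that exposes the OpenAI-compatible embeddings endpoint:

```python
# Bare-bones RAG: embed, retrieve, generate. No vector database;
# a numpy array stands in for the store. Assumes Ollama on localhost:11434
# with "nomic-embed-text" and "llama3.2" already pulled.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

docs = [
    "Invoices are due within 30 days of receipt.",
    "The VPN config lives in /etc/wireguard/wg0.conf.",
    "Quarterly reviews happen in March, June, September, and December.",
]

def embed(texts):
    resp = client.embeddings.create(model="nomic-embed-text", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
question = "When are invoices due?"
q_vec = embed([question])[0]

# Cosine similarity against every document; keep the best match.
scores = doc_vecs @ q_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
)
context = docs[int(np.argmax(scores))]

answer = client.chat.completions.create(
    model="llama3.2",
    messages=[{
        "role": "user",
        "content": f"Answer from this context only.\nContext: {context}\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```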

6. Agent Frameworks & Orchestration

Agent frameworks turn one-shot LLM calls into multi-step workflows: plan, act, observe, repeat. LangChain remains the general-purpose default; CrewAI and AutoGen specialise in role-based multi-agent setups; LangGraph is the right pick when state management matters across long-running flows. All eight frameworks below run cleanly against a local Ollama backend, and the framework-free sketch after the table shows the loop they all wrap.

Tool | Link | Description | License
LangChain | langchain.com | General-purpose LLM application framework | MIT
LlamaIndex | llamaindex.ai | RAG-focused agent and data framework | MIT
CrewAI | crewai.com | Multi-agent role-based workflows | MIT
AutoGen | github.com/microsoft/autogen | Microsoft multi-agent orchestration framework | CC-BY-4.0 / MIT
Semantic Kernel | learn.microsoft.com/semantic-kernel | Microsoft enterprise orchestration SDK in C#/Python/Java | MIT
LangGraph | langchain-ai.github.io/langgraph | Stateful graph-based agent workflows | MIT
Letta (formerly MemGPT) | letta.com | Long-term memory agents | Apache 2.0
Pydantic AI | ai.pydantic.dev | Type-safe agent framework built on Pydantic | MIT

Deeper guide: Local AI Agents With MCP
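
Stripped of any framework, the loop is: ask the model, execute whatever tool calls it returns, feed the observations back, and repeat until it answers in plain text. A framework-free sketch against an Ollama backend, assuming a tool-capable model such as llama3.2; the single calculator tool is illustrative:

```python
# Plan -> act -> observe, one tool, no framework.
# Assumes an OpenAI-compatible backend with tool support (e.g. Ollama).
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate an arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expr": {"type": "string"}},
            "required": ["expr"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 1234 * 5678, minus 42?"}]

for _ in range(5):  # hard cap so the loop always terminates
    msg = client.chat.completions.create(
        model="llama3.2", messages=messages, tools=tools
    ).choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # plain-text answer: the agent is done
        break
    for call in msg.tool_calls:  # act, then feed the observation back
        expr = json.loads(call.function.arguments)["expr"]
        result = str(eval(expr, {"__builtins__": {}}))  # demo only, not safe
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": result}
        )
```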

7. Voice, Speech & Multimodal

Voice and multimodal stacks extend a local LLM beyond text: speech in (STT), speech out (TTS), and vision. Whisper.cpp and faster-whisper own the local STT layer; Piper and Coqui share the TTS layer, with XTTS v2 dominating voice cloning; LLaVA and Ollama vision models cover the vision side. A fully-offline voice assistant is buildable from this layer plus a small chat model; the sketch after the table shows the round trip.

Tool | Link | Description | License
Whisper.cpp | github.com/ggerganov/whisper.cpp | Local speech recognition, runs on CPU or GPU | MIT
faster-whisper | github.com/SYSTRAN/faster-whisper | Fast Whisper transcription via CTranslate2 | MIT
Piper TTS | github.com/rhasspy/piper | Lightweight local text-to-speech | MIT
Coqui TTS | coqui.ai | Open-source voice synthesis with multiple model options | MPL 2.0
XTTS v2 | docs.coqui.ai/en/latest/models/xtts.html | Voice cloning with multilingual support | CPML
Bark | github.com/suno-ai/bark | Generative voice with non-speech sounds | MIT
StyleTTS 2 | github.com/yl4579/StyleTTS2 | High-quality natural-sounding TTS | MIT
LLaVA | llava-vl.github.io | Local vision + language model | Apache 2.0
Ollama vision models | ollama.com | Local vision via Ollama (Llama 3.2 Vision, LLaVA, etc.) | Various

Deeper guide: Build a Local Voice Assistant on Your Phone
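
A sketch of the full offline round trip (WAV in, transcript, reply, WAV out), assuming faster-whisper and the openai client are installed, Ollama is running locally, and the piper binary plus a voice file (en_US-lessac-medium.onnx here, illustrative) were downloaded separately:

```python
# Offline voice round trip: transcribe -> generate -> speak.
# Assumes: pip install faster-whisper openai, Ollama running locally,
# and the piper CLI with a downloaded voice model on the path.
import subprocess
from faster_whisper import WhisperModel
from openai import OpenAI

# 1. Speech to text (the Whisper model downloads once, then runs offline).
stt = WhisperModel("base", device="cpu", compute_type="int8")
segments, _info = stt.transcribe("question.wav")
transcript = " ".join(seg.text.strip() for seg in segments)

# 2. A local chat model answers the transcribed question.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
reply = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": transcript}],
).choices[0].message.content

# 3. Text to speech: piper reads text on stdin and writes a WAV file.
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "reply.wav"],
    input=reply.encode("utf-8"),
    check=True,
)
```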

8. Mobile & Edge Clients

Mobile clients run a quantised model directly on the phone using Apple Neural Engine, Qualcomm NPU, or pure CPU inference. The MLC LLM project is the foundational layer; consumer apps (PocketPal AI, Private LLM, LLM Farm, Layla) wrap it with a chat UI. Flagship phones in 2026 run 2-4B models at usable speeds (8-15 tokens/sec); 7B is on the edge of feasibility for top-tier hardware.

Tool | Link | Description | License
MLC Chat | mlc.ai/mlc-llm | Cross-platform mobile LLM runtime | Apache 2.0
PocketPal AI | github.com/a-ghorbani/pocketpal-ai | Free iOS and Android local LLM client | MIT
Private LLM | privatellm.app | Polished iOS and macOS local LLM app | Paid (closed)
LLM Farm | github.com/guinmoon/LLMFarm | iOS local LLM with model browser | MIT
Layla | layla-network.ai | Android-first local LLM app | Free (closed)
Maid | github.com/Mobile-Artificial-Intelligence/maid | Open-source Flutter mobile LLM app | MIT
Enchanted | enchantedlabs.ai | Native iOS/macOS Ollama client | MIT
Chapper | prevolut.uk | Native Ollama and LM Studio mobile client | Free
RikkaHub | github.com/rikkahub/rikkahub | Open-source Android local AI | MIT
AnythingLLM Mobile | anythingllm.com | Remote access to your local AnythingLLM workspace | MIT

Deeper guide: Best Local LLM Apps for iPhone in 2026

9. Specialized & Productivity Tools

Specialized tools embed local LLMs into apps you already use: note-taking platforms (Obsidian, Logseq, Joplin), autonomous task agents (AutoGPT, BabyAGI, MetaGPT), and roleplay frontends (Agnai, RisuAI). These are not generic chat surfaces; they are workflow-specific integrations that assume you already have a host app and a runtime.

Tool | Link | Description | License
Smart Connections | github.com/brianpetro/obsidian-smart-connections | Obsidian semantic search and chat plugin | GPL 3.0
Copilot for Obsidian | github.com/logancyang/obsidian-copilot | Obsidian local LLM chat plugin | AGPL 3.0
Text Generator | github.com/nhaouari/obsidian-textgenerator-plugin | Obsidian content generation plugin | MIT
logseq-copilot | github.com/logancyang/logseq-copilot | Logseq plugin for local and cloud LLM chat, same author as Obsidian Copilot | AGPL 3.0
BMO Chatbot | github.com/longy2k/obsidian-bmo-chatbot | Obsidian chatbot with local LLM | MIT
Joplin AI | joplinapp.org | Joplin notes with local AI integrations | MIT
AutoGPT (local) | github.com/Significant-Gravitas/AutoGPT | Autonomous task agent with Ollama support | MIT
BabyAGI | github.com/yoheinakajima/babyagi | Lightweight autonomous agent | MIT
MetaGPT | github.com/geekan/MetaGPT | Multi-agent software company simulation | MIT
Agnai | agnai.chat | Roleplay frontend with character cards | MIT
RisuAI | github.com/kwaroran/RisuAI | Mobile-friendly roleplay frontend | GPL 3.0

Deeper guide: Local LLM With Obsidian in 2026

Common Real-World Stacks

For readers who do not want to read nine categories, pick the closest stack and copy it. Each row pairs a real goal with a tested combination and the hardware floor it actually runs on.

Goal | Stack | Hardware floor
Just chat casually | LM Studio standalone | 16 GB RAM, no GPU
Best balance for power users | Ollama + Open WebUI | 16 GB RAM, optional GPU
Document chat | Ollama + AnythingLLM | 16 GB RAM, optional GPU
Coding | Ollama + Continue.dev | 16 GB RAM, GPU recommended
Roleplay / creative | KoboldCpp + SillyTavern | 16 GB RAM, GPU recommended
Privacy-first business | Ollama + Open WebUI + PrivateGPT | 32 GB RAM + 12 GB VRAM
Mobile / on-the-go | MLC Chat or PocketPal AI | iPhone 13+ / Pixel 7+
Apple Silicon | Ollama (Metal backend) or LM Studio | M2/M3/M4/M5 with 16+ GB unified memory
Multi-user team | vLLM + Open WebUI | 32+ GB RAM + multi-GPU

How This Directory Stays Current

This directory is reviewed every six months (next refresh: November 2026). Inclusion criteria: project is actively maintained (commits in the last 90 days), has a verifiable open-source licence or a clear commercial-use statement, and either holds meaningful user share in 2026 or fills a layer that would otherwise be empty. Projects that go inactive for more than two release cycles are removed; new entrants that pass the criteria are added at the next review. To suggest a project for inclusion, open an issue or PR against the PromptQuorum repository; include the project URL, licence, and a one-sentence description in the format above.

Sources

FAQ

What is the difference between a local LLM runtime and a desktop app?

A runtime (Ollama, llama.cpp, vLLM) is the engine that loads model weights and serves an API, typically OpenAI-compatible. A desktop app (LM Studio, Jan, GPT4All) is a chat UI that calls a runtime under the hood. Some apps bundle their own runtime (LM Studio embeds llama.cpp), others require you to install a runtime separately (Open WebUI calls Ollama). The runtime decides what is possible; the app decides what is convenient.

Can I use multiple tools from this list at the same time?

Yes: most stacks combine 2-4 tools. A common setup: Ollama as the runtime, Open WebUI for chat, AnythingLLM for document chat, and Continue.dev for coding, with all four running against the same Ollama instance on a single machine. The "Common Real-World Stacks" table above lists the recipes that work without conflict.

Which tools work fully offline with no telemetry?

Ollama, llama.cpp, vLLM, Jan, GPT4All, Open WebUI, AnythingLLM, PrivateGPT, Continue.dev, Aider, KoboldCpp, Llamafile, MLX-LM, and most of the AGPL/MIT-licensed apps in this directory work fully offline once the model is downloaded. LM Studio and several closed-source tools have optional analytics that can be disabled in settings; verify by running a packet capture once after install. Browser-based UIs (Open WebUI, LibreChat) are local-only when configured to use a local backend.

Are any of these commercial-licensed (not free for commercial use)?

A handful: LM Studio, Msty, Faraday, Layla, and Cursor are closed-source, generally free to use but not redistributable, and commercial terms vary. Private LLM is paid. AGPL-licensed tools (Jan, KoboldCpp, text-generation-webui, SillyTavern, Khoj, Open Interpreter, Copilot for Obsidian) are free for any use including commercial, but AGPL requires source disclosure if you modify them and offer them as a network service. Apache 2.0 and MIT projects (the majority) are usable in any context, including commercial, with no obligations beyond preserving the licence and copyright notices.

Which tools support Apple Silicon (M-series chips) natively?

Ollama, llama.cpp, MLX-LM, LM Studio, Jan, Enchanted, GPT4All, MLC Chat, AnythingLLM, and most Electron/Tauri apps run natively on Apple Silicon and use the Metal backend. MLX-LM is Apple-specific and the fastest for large models on M-series. vLLM, TensorRT-LLM, and ExLlamaV2 are NVIDIA-focused and either do not run or run poorly on Apple Silicon; for Apple users, Ollama with the Metal backend is the default.

Do all these tools support GGUF model format?

GGUF is the native format for llama.cpp and any tool that wraps it (Ollama, LM Studio, Jan, GPT4All, KoboldCpp, Llamafile). vLLM and TensorRT-LLM prefer Hugging Face weights or quantised formats such as AWQ and FP16 for higher throughput. ExLlamaV2 uses EXL2 quantisation. MLX-LM uses MLX-converted weights. Most listed tools accept GGUF; a few (vLLM, TensorRT-LLM, ExLlamaV2, MLX-LM) require a one-time conversion step from the original Hugging Face weights.

Which tools are best for users with no coding experience?

GPT4All has the simplest install (one click, runs on 8 GB RAM). LM Studio is the most feature-rich without requiring a terminal. Jan is the most privacy-conscious of the no-code options. For document chat without command-line work, AnythingLLM is the easiest. All four are listed in the Desktop GUI Apps category above.

Can I run these tools on a server and access them remotely?

Most server-capable tools (Ollama, vLLM, LocalAI, Open WebUI, LibreChat, PrivateGPT, AnythingLLM) expose an HTTP API and bind to a network interface configurable in settings. The standard pattern: run Ollama on a home server or VPS, then run a UI on your laptop or phone pointing at the server's IP. Treat the API like any web service: bind to localhost behind a reverse proxy, or to a private network with proper authentication. Open WebUI ships with multi-user support out of the box.

Which tools support multi-user / team setups?

Open WebUI, LibreChat, h2oGPT, AnythingLLM (with admin features enabled), and Dify are designed for multi-user use, with role-based access and per-user conversation history. vLLM is the right serving layer underneath when concurrent inference matters; it batches requests across users for throughput unattainable on Ollama at concurrency above ~3.

How often does this directory get updated?

Every six months. The next scheduled refresh is November 2026. Mid-cycle changes (a project goes inactive, a new tool gains meaningful share, a licence changes) get patched into the existing entry. Entirely new categories or layers wait for a refresh to keep the structure stable. The "Sources" section above lists the community indexes used to spot-check what the ecosystem is actually doing between refreshes.
