Skip to main content
PromptQuorumPromptQuorum
Home/Power Local LLM/The Complete Local LLM Software Directory: 70+ Tools to Run AI on Your Own Hardware (2026)
Overview & Reference

The Complete Local LLM Software Directory: 70+ Tools to Run AI on Your Own Hardware (2026)

·20 min read·By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool · PromptQuorum

The local-LLM ecosystem in 2026 splits cleanly into nine layers. Runtimes (Ollama, llama.cpp, vLLM) move tokens through the model; desktop apps (LM Studio, Jan, GPT4All) wrap a runtime in a chat UI; web UIs (Open WebUI, LibreChat) do the same in a browser; coding assistants (Continue.dev, Cline, Aider) plug a local model into your editor; RAG systems (AnythingLLM, PrivateGPT) point it at your documents; agent frameworks (LangChain, CrewAI, LangGraph) chain calls into multi-step workflows; voice and multimodal stacks (Whisper.cpp, Piper, LLaVA) extend it beyond text; mobile clients (MLC Chat, PocketPal AI) put it on a phone; and specialized productivity plugins (Obsidian, Logseq, AutoGPT) embed it in tools you already use. Pick a runtime first (Ollama for almost everyone), then add one or two layers above. The directory below lists every project worth knowing in each layer along with its licence, so you can plan a stack that is open-source end-to-end if that matters to you.**

The local-LLM ecosystem in 2026 is large enough that picking the wrong tool first costs hours, not minutes. This directory catalogues 87 actively-maintained projects across nine layers — runtimes, desktop apps, web UIs, coding assistants, RAG systems, agent frameworks, voice and multimodal, mobile clients, and specialized productivity plugins — with the description, licence, and primary URL for each. Use it as the "what exists" map before you commit to a stack; every category ends with a link to the deeper PromptQuorum comparison guide for that layer.

This page contains links to third-party products for reference. PromptQuorum is not enrolled in any affiliate program — these are plain links that earn no commission. Clicking links and your next steps are entirely your own responsibility. These links do not represent any endorsement or verification by PromptQuorum.

Slide Deck: The Complete Local LLM Software Directory: 70+ Tools to Run AI on Your Own Hardware (2026)

The slide deck below covers: a 9-layer local LLM stack overview (runtimes through specialized plugins); 6-tool comparison tables for runtimes (Ollama/llama.cpp/vLLM/LocalAI/ExLlamaV2/MLX-LM), desktop apps, web UIs, coding assistants, RAG systems, and agent frameworks; a 9-row real-world stacks table (goal, stack, hardware floor); 5-step stack selection guide; and FAQ. Download the PDF as a local LLM software directory reference card.

Browse the slides below or download as PDF for offline reference. Download Reference Card (PDF)

Key Takeaways

  • Nine layers, 87 projects, one map. Runtimes, desktop apps, web UIs, coding assistants, RAG systems, agent frameworks, voice/multimodal, mobile clients, and specialized productivity plugins — almost every popular project in 2026 fits in exactly one layer.
  • Pick a runtime first. Ollama is the right default for ~95% of readers; llama.cpp is the foundational engine underneath most other tools; vLLM is the production-serving pick for multi-user setups.
  • Most layers above the runtime are optional. A desktop app OR a web UI is enough for chat. Add a coding harness only when you want IDE integration; add a RAG system only when you want to chat with your own documents; add an agent framework only when one-shot calls stop being enough.
  • Licence matters for commercial use. MIT and Apache 2.0 dominate the ecosystem. AGPL appears on a handful of UIs (text-generation-webui, KoboldCpp, Jan, SillyTavern) — fine for personal use, more deliberate for commercial deployments. The "License" column below names every one explicitly.
  • Multi-tool stacks are normal. Ollama + Open WebUI + AnythingLLM + Continue.dev is a single-machine setup that covers chat, RAG, and coding without compromise. The "Common Real-World Stacks" table below names the recipes that actually work in 2026.
The 9 layers of a local LLM stack: 87 actively-maintained projects spanning runtimes (Ollama, llama.cpp, vLLM), desktop apps (LM Studio, Jan, GPT4All), web UIs, coding assistants, RAG systems, agent frameworks, voice & multimodal, mobile clients, and specialized productivity tools.
The 9 layers of a local LLM stack: 87 actively-maintained projects spanning runtimes (Ollama, llama.cpp, vLLM), desktop apps (LM Studio, Jan, GPT4All), web UIs, coding assistants, RAG systems, agent frameworks, voice & multimodal, mobile clients, and specialized productivity tools.

1. Local LLM Runtimes & Inference Engines

A runtime is the engine that loads model weights into memory and turns prompts into tokens. It is the first decision in a local-LLM stack and the one that constrains everything above it — every desktop app, web UI, and coding harness ultimately calls a runtime. Ollama dominates user-facing share in 2026 because it ships an OpenAI-compatible API and a one-command install; llama.cpp is the C++ engine underneath most of the others; vLLM is the right pick when you need to serve concurrent users on a real GPU.

ToolLinkDescriptionLicense
Ollamaollama.comEasiest overall — one-command install, OpenAI-compatible API, huge model libraryMIT
llama.cppgithub.com/ggml-org/llama.cppFoundational C++ engine behind most other tools, runs anywhere including Apple SiliconMIT
vLLMgithub.com/vllm-project/vllmHigh-throughput serving for multi-user GPU deploymentsApache 2.0
LocalAIlocalai.ioDrop-in OpenAI API replacement supporting multiple backendsMIT
TensorRT-LLMgithub.com/NVIDIA/TensorRT-LLMNVIDIA-optimized inference for enterprise GPU rigsApache 2.0
MLC LLMmlc.ai/mlc-llmMobile and edge device deployment runtimeApache 2.0
SGLanggithub.com/sgl-project/sglangStructured inference serving for agent pipelinesApache 2.0
ExLlamaV2github.com/turboderp-org/exllamav2Fast quantized inference optimized for RTX GPUsMIT
KoboldCppgithub.com/LostRuins/koboldcppLightweight llama.cpp wrapper with built-in UIAGPL 3.0
Llamafilegithub.com/Mozilla-Ocho/llamafileSingle-file portable LLM execution by MozillaApache 2.0
MLX-LMgithub.com/ml-explore/mlx-examplesApple Silicon-native runtime by Apple researchMIT

Deeper guide: llama.cpp vs Ollama vs vLLM

Ollama vs llama.cpp vs vLLM: Ollama is MIT-licensed with one-command install and OpenAI-compatible API; llama.cpp is the foundational MIT-licensed C++ engine; vLLM is the Apache 2.0 multi-user serving choice for GPU deployments.
Ollama vs llama.cpp vs vLLM: Ollama is MIT-licensed with one-command install and OpenAI-compatible API; llama.cpp is the foundational MIT-licensed C++ engine; vLLM is the Apache 2.0 multi-user serving choice for GPU deployments.
Check RunPod pricing and sign upproduct link · disclosedCheck Vast.ai pricing and sign upproduct link · disclosedCheck Lambda Labs pricing and sign upproduct link · disclosed

2. Desktop GUI Apps

Desktop apps wrap a runtime in a chat interface and a model browser. They are where most non-technical users start because there is no terminal step — download, click, chat. LM Studio, Jan, and GPT4All hold the bulk of the user base in 2026; AnythingLLM doubles as a desktop app and a RAG layer; Open Interpreter is the outlier that lets a local model drive your computer.

ToolLinkDescriptionLicense
LM Studiolmstudio.aiMost polished GUI, built-in HuggingFace model browser, server modeFree (closed)
Janjan.aiPrivacy-focused offline ChatGPT clone, fully open-sourceAGPL 3.0
GPT4Allnomic.ai/gpt4allBeginner-friendly with strong CPU-only supportMIT
AnythingLLManythingllm.comRAG and document chat with built-in vector storeMIT
Mstymsty.appClean consumer UX, multi-provider supportFree (closed)
Cherry Studiocherry-ai.comMulti-provider desktop AI with extensive customizationAGPL 3.0
Backyard AIbackyard.aiCharacter chat and roleplay desktop clientFree (closed)
Enchantedgithub.com/AugustDev/enchantedNative macOS/iOS minimal Ollama clientApache 2.0
h2oGPTgithub.com/h2oai/h2ogptEnterprise-feature-heavy desktop and serverApache 2.0
Open Interpretergithub.com/OpenInterpreter/open-interpreterLets local LLM control your computer and execute codeAGPL 3.0

Deeper guide: LM Studio vs Jan vs GPT4All

Check Msty pricingproduct link · disclosedCheck AnythingLLM Cloud pricingproduct link · disclosed

3. Web UIs & Browser Frontends

Web UIs are self-hosted ChatGPT clones — same conversational surface, but you point them at a runtime running on your own machine or LAN. They are the natural choice when you want multi-device access (laptop, phone, tablet hitting one server) or team usage. Open WebUI dominates the self-hosted segment in 2026, with LibreChat as the team-features alternative and SillyTavern as the dedicated roleplay UI.

ToolLinkDescriptionLicense
Open WebUIopenwebui.comMost popular self-hosted ChatGPT-like UI with built-in RAGBSD 3-Clause
LibreChatlibrechat.aiMulti-model ChatGPT alternative with team featuresMIT
text-generation-webuigithub.com/oobabooga/text-generation-webuiPower-user UI with extensive plugin ecosystemAGPL 3.0
SillyTaverngithub.com/SillyTavern/SillyTavernRoleplay and character chat with lorebooksAGPL 3.0
LobeChatlobehub.comModern polished UI with plugin marketplaceMIT
Big-AGIgithub.com/enricoros/big-AGIAdvanced multi-provider frontend with personasMIT
NextChatgithub.com/ChatGPTNextWeb/NextChatLightweight web chat, simple deploymentMIT
Page Assistgithub.com/n4ze3m/page-assistBrowser sidebar AI for Chrome and FirefoxMIT
Chatboxchatboxai.appCross-platform desktop and web clientGPLv3

Deeper guide: SillyTavern vs Agnai vs RisuAI

4. Coding Assistants & IDE Integrations

Coding assistants connect a local LLM to your editor or terminal via OpenAI-compatible APIs. The choice is mostly about workflow primitive: autocomplete-in-editor (Continue.dev), autonomous agent edits (Cline, OpenHands), or git-native diff edits at the terminal (Aider). All three patterns work against any runtime that speaks the OpenAI Chat Completions protocol — Ollama is the most common backend in 2026.

ToolLinkDescriptionLicense
Continue.devcontinue.devVS Code and JetBrains autocomplete and chat with local modelsApache 2.0
Aideraider.chatTerminal pair programmer with multi-file edit supportApache 2.0
Clinecline.botAutonomous coding agent for VS CodeApache 2.0
Tabbytabby.tabbyml.comSelf-hosted GitHub Copilot alternativeApache 2.0
CodeGPTcodegpt.coIDE integrations across multiple editorsMIT
OpenHandsgithub.com/All-Hands-AI/OpenHandsAI software engineer agent (formerly OpenDevin)MIT
Cursor (local mode)cursor.comAI-first code editor with local model supportFree (closed)
Twinnygithub.com/twinnydotdev/twinnyFree Copilot alternative for VS CodeMIT

Deeper guide: Continue.dev vs Cline vs Aider

3 local LLM coding patterns: Continue.dev for inline autocomplete in VS Code and JetBrains, Cline for autonomous agent file edits, and Aider for git-native terminal diffs — all connect to Ollama via the OpenAI-compatible API.
3 local LLM coding patterns: Continue.dev for inline autocomplete in VS Code and JetBrains, Cline for autonomous agent file edits, and Aider for git-native terminal diffs — all connect to Ollama via the OpenAI-compatible API.
Check Cursor pricingproduct link · disclosed

5. RAG & Document Chat Systems

RAG (Retrieval-Augmented Generation) systems combine a local LLM with an embedding model and a vector store so the model can answer from your own documents.** The split is between turn-key apps (AnythingLLM, PrivateGPT, Quivr, Khoj) that "just work" and framework libraries (LlamaIndex, Haystack, txtai) that you build on. RAGFlow has gained share in 2026 specifically for documents that need citation-grade retrieval.

ToolLinkDescriptionLicense
AnythingLLManythingllm.comEasiest all-in-one personal RAG with workspacesMIT
PrivateGPTgithub.com/zylon-ai/private-gptFully offline enterprise-leaning RAGApache 2.0
Quivrgithub.com/QuivrHQ/quivrSelf-hosted personal knowledge assistantApache 2.0
Khojkhoj.devPersonal AI second brain, syncs with Obsidian and NotionAGPL 3.0
Difydify.aiAI workflow builder with RAG and agent supportModified Apache 2.0
Flowiseflowiseai.comVisual LangChain workflow builderApache 2.0
Langflowlangflow.orgVisual AI orchestration with RAG componentsMIT
LlamaIndexllamaindex.aiRAG framework / Python library — foundation for custom buildsMIT
Haystackhaystack.deepset.aiSearch and RAG framework by deepsetApache 2.0
RAGFlowragflow.ioDeep document understanding for RAG with citation extractionApache 2.0
txtaigithub.com/neuml/txtaiEmbedded vector + LLM database in one libraryApache 2.0

Deeper guide: AnythingLLM vs PrivateGPT vs Open WebUI

Local RAG split: turn-key apps (AnythingLLM, PrivateGPT, Quivr, RAGFlow, Khoj) for no-code document chat vs framework libraries (LlamaIndex, Haystack, Dify, Flowise, txtai) for building custom pipelines.
Local RAG split: turn-key apps (AnythingLLM, PrivateGPT, Quivr, RAGFlow, Khoj) for no-code document chat vs framework libraries (LlamaIndex, Haystack, Dify, Flowise, txtai) for building custom pipelines.

6. Agent Frameworks & Orchestration

Agent frameworks turn one-shot LLM calls into multi-step workflows — plan, act, observe, repeat. LangChain remains the general-purpose default; CrewAI and AutoGen specialise in role-based multi-agent setups; LangGraph is the right pick when state management matters across long-running flows. All eight frameworks below run cleanly against a local Ollama backend.

ToolLinkDescriptionLicense
LangChainlangchain.comGeneral-purpose LLM application frameworkMIT
LlamaIndexllamaindex.aiRAG-focused agent and data frameworkMIT
CrewAIcrewai.comMulti-agent role-based workflowsMIT
AutoGengithub.com/microsoft/autogenMicrosoft multi-agent orchestration frameworkCC-BY-4.0 / MIT
Semantic Kernellearn.microsoft.com/semantic-kernelMicrosoft enterprise orchestration SDK in C#/Python/JavaMIT
LangGraphlangchain-ai.github.io/langgraphStateful graph-based agent workflowsMIT
Letta (formerly MemGPT)letta.comLong-term memory agentsApache 2.0
Pydantic AIai.pydantic.devType-safe agent framework built on PydanticMIT

Deeper guide: Local AI Agents With MCP

7. Voice, Speech & Multimodal

Voice and multimodal stacks extend a local LLM beyond text — speech in (STT), speech out (TTS), and vision. Whisper.cpp and faster-whisper own the local STT layer; Piper and Coqui share the TTS layer with XTTS v2 dominating voice cloning; LLaVA and Ollama vision models cover the vision side. A fully-offline voice assistant is buildable from this layer plus a small chat model.

ToolLinkDescriptionLicense
Whisper.cppgithub.com/ggerganov/whisper.cppLocal speech recognition, runs on CPU or GPUMIT
faster-whispergithub.com/SYSTRAN/faster-whisperFast Whisper transcription via CTranslate2MIT
Piper TTSgithub.com/rhasspy/piperLightweight local text-to-speechMIT
Coqui TTSgithub.com/idiap/coqui-ai-TTSOpen-source voice synthesis with multiple model optionsMPL 2.0
XTTS v2huggingface.co/coqui/XTTS-v2Voice cloning with multilingual supportCPML
Barkgithub.com/suno-ai/barkGenerative voice with non-speech soundsMIT
StyleTTS 2github.com/yl4579/StyleTTS2High-quality natural-sounding TTSMIT
LLaVAllava-vl.github.ioLocal vision + language modelApache 2.0
Ollama vision modelsollama.comLocal vision via Ollama (Llama 3.2 Vision, Llava, etc.)Various

Deeper guide: Build a Local Voice Assistant on Your Phone

8. Mobile & Edge Clients

Mobile clients run a quantised model directly on the phone using Apple Neural Engine, Qualcomm NPU, or pure CPU inference. The MLC LLM project is the foundational layer; consumer apps (PocketPal AI, Private LLM, LLM Farm, Layla) wrap it with a chat UI. Flagship phones in 2026 run 2-4B models at usable speeds (8-15 tokens/sec); 7B is on the edge of feasibility for top-tier hardware.

ToolLinkDescriptionLicense
MLC Chatmlc.ai/mlc-llmCross-platform mobile LLM runtimeApache 2.0
PocketPal AIgithub.com/a-ghorbani/pocketpal-aiFree iOS and Android local LLM clientMIT
Private LLMprivatellm.appPolished iOS and macOS local LLM appPaid (closed)
LLM Farmgithub.com/guinmoon/LLMFarmiOS local LLM with model browserMIT
Laylalayla-network.aiAndroid-first local LLM appFree (closed)
Maidgithub.com/Mobile-Artificial-Intelligence/maidOpen-source Flutter mobile LLM appMIT
Enchantedgithub.com/AugustDev/enchantedNative iOS/macOS Ollama clientApache 2.0
Chapperprevolut.ukNative Ollama and LM Studio mobile clientFree
RikkaHubgithub.com/rikkahub/rikkahubOpen-source Android local AIMIT
AnythingLLM Mobileanythingllm.comRemote access to your local AnythingLLM workspaceMIT

Deeper guide: Best Local LLM Apps for iPhone in 2026

9. Specialized & Productivity Tools

Specialized tools embed local LLMs into apps you already use — note-taking platforms (Obsidian, Logseq, Joplin), autonomous task agents (AutoGPT, BabyAGI, MetaGPT), and roleplay frontends (Agnai, RisuAI). These are not generic chat surfaces; they are workflow-specific integrations that assume you already have a host app and a runtime.

ToolLinkDescriptionLicense
Smart Connectionsgithub.com/brianpetro/obsidian-smart-connectionsObsidian semantic search and chat pluginGPL 3.0
Copilot for Obsidiangithub.com/logancyang/obsidian-copilotObsidian local LLM chat pluginAGPL 3.0
Text Generatorgithub.com/nhaouari/obsidian-textgenerator-pluginObsidian content generation pluginMIT
logseq-copilotgithub.com/logancyang/logseq-copilotLogseq plugin for local and cloud LLM chat, same author as Obsidian CopilotAGPL 3.0
BMO Chatbotgithub.com/longy2k/obsidian-bmo-chatbotObsidian chatbot with local LLMMIT
Joplin AIjoplinapp.orgJoplin notes with local AI integrationsMIT
AutoGPT (local)github.com/Significant-Gravitas/AutoGPTAutonomous task agent with Ollama supportMIT
BabyAGIgithub.com/yoheinakajima/babyagiLightweight autonomous agentMIT
MetaGPTgithub.com/geekan/MetaGPTMulti-agent software company simulationMIT
Agnaiagnai.chatRoleplay frontend with character cardsMIT
RisuAIgithub.com/kwaroran/RisuAIMobile-friendly roleplay frontendGPL 3.0

Deeper guide: Local LLM With Obsidian in 2026

Common Real-World Stacks

For readers who do not want to read nine categories, pick the closest stack and copy it. Each row pairs a real goal with a tested combination and the hardware floor it actually runs on.

GoalStackHardware floor
Just chat casuallyLM Studio standalone16 GB RAM, no GPU
Best balance for power usersOllama + Open WebUI16 GB RAM, optional GPU
Document chatOllama + AnythingLLM16 GB RAM, optional GPU
CodingOllama + Continue.dev16 GB RAM + GPU recommended
Roleplay / creativeKoboldCpp + SillyTavern16 GB RAM, GPU recommended
Privacy-first businessOllama + Open WebUI + PrivateGPT32 GB RAM + 12 GB VRAM
Mobile / on-the-goMLC Chat or PocketPal AIiPhone 13+ / Pixel 7+
Apple SiliconOllama (MLX backend) or LM StudioM2/M3/M4/M5 with 16+ GB unified
Multi-user teamvLLM + Open WebUI32+ GB RAM + multi-GPU
9 common real-world local LLM stacks by goal: from LM Studio standalone (16 GB RAM, no GPU) to vLLM + Open WebUI for multi-user teams (32 GB RAM + multi-GPU), with Ollama + Open WebUI as the best-balance default at 16 GB RAM.
9 common real-world local LLM stacks by goal: from LM Studio standalone (16 GB RAM, no GPU) to vLLM + Open WebUI for multi-user teams (32 GB RAM + multi-GPU), with Ollama + Open WebUI as the best-balance default at 16 GB RAM.

How This Directory Stays Current

This directory is reviewed every six months — last reviewed June 2026, next refresh November 2026. The June 2026 pass reverified every link and corrected several project names and licences: Faraday is now Backyard AI, the maintained Coqui TTS fork moved to Idiap, and Cherry Studio is AGPL 3.0. Inclusion criteria: project is actively maintained (commits in the last 90 days), has a verifiable open-source licence or a clear commercial-use statement, and either holds meaningful user share in 2026 or fills a layer that would otherwise be empty. Projects that go inactive for more than two release cycles are removed; new entrants that pass the criteria are added at the next review. To suggest a project for inclusion, open an issue or PR against the PromptQuorum repository — include the project URL, licence, and a one-sentence description in the format above.

Sources

Frequently Asked Questions

What is the difference between a local LLM runtime and a desktop app?

A runtime (Ollama, llama.cpp, vLLM) is the engine that loads model weights and serves an API — typically OpenAI-compatible. A desktop app (LM Studio, Jan, GPT4All) is a chat UI that calls a runtime under the hood. Some apps bundle their own runtime (LM Studio embeds llama.cpp), others require you to install a runtime separately (Open WebUI calls Ollama). The runtime decides what is possible; the app decides what is convenient.

Can I use multiple tools from this list at the same time?

Yes — most stacks combine 2-4 tools. A common setup: Ollama as the runtime, Open WebUI for chat, AnythingLLM for document chat, and Continue.dev for coding — all four run against the same Ollama instance on a single machine. The "Common Real-World Stacks" table above lists the recipes that work without conflict.

Which tools work fully offline with no telemetry?

Ollama, llama.cpp, vLLM, Jan, GPT4All, Open WebUI, AnythingLLM, PrivateGPT, Continue.dev, Aider, KoboldCpp, Llamafile, MLX-LM, and most of the AGPL/MIT-licensed apps in this directory work fully offline once the model is downloaded. LM Studio and several closed-source tools have optional analytics that can be disabled in settings — verify by running a packet capture once after install. Browser-based UIs (Open WebUI, LibreChat) are local-only when configured to use a local backend.

Are any of these commercial-licensed (not free for commercial use)?

A handful: LM Studio, Msty, Backyard AI, Layla, and Cursor are closed-source — generally free to use but not redistributable, and commercial terms vary. Private LLM is paid. AGPL-licensed tools (Jan, KoboldCpp, text-generation-webui, SillyTavern, Khoj, Open Interpreter, Copilot for Obsidian) are free for any use including commercial, but the AGPL terms require source disclosure if you modify and host them publicly. Apache 2.0 and MIT projects (the majority) are usable in any context including commercial without attribution constraints beyond the licence text.

Which tools support Apple Silicon (M-series chips) natively?

Ollama, llama.cpp, MLX-LM, LM Studio, Jan, Enchanted, GPT4All, MLC Chat, AnythingLLM, and most Electron/Tauri apps run natively on Apple Silicon and use the Metal backend. MLX-LM is Apple-specific and the fastest for large models on M-series. vLLM, TensorRT-LLM, and ExLlamaV2 are NVIDIA-focused and either do not run or run poorly on Apple Silicon — for Apple users, Ollama with the Metal backend is the default.

Do all these tools support GGUF model format?

GGUF is the native format for llama.cpp and any tool that wraps it (Ollama, LM Studio, Jan, GPT4All, KoboldCpp, Llamafile). vLLM and TensorRT-LLM use their own optimised formats (typically AWQ or FP16) for higher throughput. ExLlamaV2 uses EXL2 quantisation. MLX-LM uses MLX-converted weights. Most listed tools accept GGUF; a few (vLLM, TensorRT-LLM, ExLlamaV2, MLX-LM) require a one-time conversion step from the original Hugging Face weights.

Which tools are best for users with no coding experience?

GPT4All has the simplest install (one click, runs on 8 GB RAM). LM Studio is the most feature-rich without requiring a terminal. Jan is the most privacy-conscious of the no-code options. For document chat without command-line work, AnythingLLM is the easiest. All four are listed in the Desktop GUI Apps category above.

Can I run these tools on a server and access them remotely?

Most server-capable tools (Ollama, vLLM, LocalAI, Open WebUI, LibreChat, PrivateGPT, AnythingLLM) expose an HTTP API and bind to a network interface configurable in settings. Standard pattern: run Ollama on a home server or VPS, run a UI on your laptop or phone pointing at the server's IP. Treat the API like any web service — bind to localhost behind a reverse proxy, or to a private network with proper authentication. Open WebUI ships with multi-user support out of the box.

Which tools support multi-user / team setups?

Open WebUI, LibreChat, h2oGPT, AnythingLLM (with admin features enabled), and Dify are designed for multi-user use, with role-based access and per-user conversation history. vLLM is the right serving layer underneath when concurrent inference matters — it batches requests across users for throughput unattainable on Ollama at concurrency above ~3.

How often does this directory get updated?

Every six months — last reviewed June 2026, the next scheduled refresh is November 2026. Mid-cycle changes (a project goes inactive, a new tool gains meaningful share, a licence changes) get patched into the existing entry. Entirely new categories or layers wait for a refresh to keep the structure stable. The "Sources" section above lists the community indexes used to spot-check what the ecosystem is actually doing between refreshes.

← Back to Power Local LLM