Install LM Studio: GUI Setup for macOS, Windows & Linux

7 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

LM Studio is a desktop application that lets you browse, download, and run local LLMs through a graphical interface -- no terminal commands required. It runs on macOS, Windows, and Linux, and includes a built-in chat UI and an OpenAI-compatible local server. As of April 2026, LM Studio supports any GGUF-quantized model from Hugging Face.

Key Takeaways

  • Download LM Studio from lmstudio.ai -- available for macOS (Apple Silicon + Intel), Windows, and Linux (AppImage).
  • Minimum: 8 GB RAM. Recommended: 16 GB RAM for 7B models. Apple Silicon Macs use GPU acceleration by default.
  • The built-in model browser searches Hugging Face directly -- download GGUF models without leaving the app.
  • LM Studio includes a built-in chat UI and a local OpenAI-compatible server on port 1234.
  • Best for: beginners who prefer a GUI, users who want to compare multiple models side-by-side, and anyone who wants a complete package without terminal commands.

What Is LM Studio?

LM Studio is a desktop application for running local LLMs. It provides a graphical model browser, a built-in chat interface, and a local API server -- all in one app. Under the hood, it uses llama.cpp for inference, the same engine that powers Ollama.

The key difference from Ollama is that LM Studio is entirely GUI-driven. You browse and download models through the app interface, start chats with a click, and manage model settings with sliders rather than configuration files.

LM Studio is free for personal use. It is developed by LM Studio, Inc. and was released in 2023. As of 2026, it supports NVIDIA CUDA, AMD ROCm, and Apple Metal acceleration.

What Are the System Requirements for LM Studio?

| Spec | Minimum | Recommended |
|---|---|---|
| Operating System | macOS 13.6, Windows 10, Ubuntu 22.04 | macOS 14+, Windows 11, Ubuntu 24.04 |
| RAM | 8 GB | 16 GB or more |
| Storage | 500 MB for app + model space | 50 GB+ free for multiple models |
| GPU (optional) | NVIDIA GTX 10-series or newer | NVIDIA RTX 40/50-series, AMD RX 7000+, or Apple M-series |

How Do You Download and Install LM Studio?

  1. Go to lmstudio.ai and click the download button for your OS.
  2. macOS: Open the .dmg file and drag LM Studio to Applications. On first launch, approve the security prompt in System Settings → Privacy & Security.
  3. Windows: Run the LM-Studio-Setup.exe installer. LM Studio installs to %LOCALAPPDATA%\LM-Studio.
  4. Linux: Download the .AppImage file. Make it executable with `chmod +x LM-Studio-*.AppImage` and run it. No system installation required.
  5. On first launch, LM Studio shows a welcome screen and prompts you to download a model.

How Do You Find and Download a Model in LM Studio?

Use the Search tab (magnifying glass icon in the left sidebar) to find models:

  1. Click the Search tab in the left sidebar.
  2. Type a model name -- for example "llama 3.1" or "phi-3 mini".
  3. LM Studio shows matching GGUF models from Hugging Face with file sizes and quantization options.
  4. Select a quantization level. For 8 GB RAM: choose Q4_K_M (~4.5 GB for a 7B model). For 16 GB RAM: Q5_K_M or Q6_K offer better quality.
  5. Click the download arrow. Progress shows in the Downloads tab.
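
The quantization guidance in step 4 can be written as a small lookup. This is a minimal sketch, not something LM Studio provides: the function name `suggest_quantization` is hypothetical, the thresholds come from the RAM figures above, and treating every machine between 8 GB and 16 GB as the 8 GB case is an assumption.

```python
def suggest_quantization(ram_gb: float) -> list[str]:
    """Return candidate GGUF quantization levels for a 7B model.

    Thresholds mirror the guidance in this article; they are rules of
    thumb, not limits enforced by LM Studio.
    """
    if ram_gb < 8:
        # Below the stated 8 GB minimum: no recommendation.
        return []
    if ram_gb < 16:
        # 8 GB class machines: Q4_K_M (~4.5 GB file for a 7B model).
        return ["Q4_K_M"]
    # 16 GB or more: higher-quality quantizations become practical.
    return ["Q5_K_M", "Q6_K"]


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(ram, "GB ->", suggest_quantization(ram))
```

Returning a list rather than a single value keeps both 16 GB options visible, since the article does not rank Q5_K_M against Q6_K.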

How Do You Start Chatting with a Model in LM Studio?

  1. Click the Chat tab (speech bubble icon) in the left sidebar.
  2. At the top of the chat window, click the model selector dropdown and choose your downloaded model.
  3. LM Studio loads the model into memory -- this takes 5-30 seconds depending on model size and hardware.
  4. Type your message in the input field at the bottom and press Enter or click Send.
  5. The model's response streams token by token. Generation speed appears in the status bar at the bottom of the window.

How Do You Adjust Model Settings in LM Studio?

The right panel in the Chat tab exposes key inference parameters:

  • Temperature (default 0.8): controls response randomness. Lower values (0.1-0.4) produce more focused, predictable output. Higher values (0.8-1.2) produce more varied, creative output.
  • Context Length (default 4096 tokens): the maximum conversation history the model can process. Longer context uses more RAM. Most 7B models support 4096-8192 tokens.
  • GPU Layers (macOS/Linux/Windows with GPU): how many model layers to offload to the GPU. Set to max for fastest inference if your GPU has enough VRAM.
  • System Prompt: a persistent instruction prepended to every conversation. Use this to set the model's role or behavior.
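
When you reach the model through the local server instead of the chat UI, the system prompt and temperature travel in the request body. The sketch below builds such a body. `build_chat_request` is a hypothetical helper, `"local-model"` is a placeholder name, and the code assumes the server honors the standard OpenAI `messages` and `temperature` fields. Context length and GPU layers are not standard OpenAI request fields, so they are left out.

```python
import json


def build_chat_request(user_message, system_prompt=None, temperature=0.8,
                       model="local-model"):
    """Build an OpenAI-style chat completions request body as a dict."""
    messages = []
    if system_prompt:
        # The system message goes first so it frames the whole conversation.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "temperature": temperature}


if __name__ == "__main__":
    body = build_chat_request(
        "Summarize what GGUF is in one sentence.",
        system_prompt="You are a concise technical assistant.",
        temperature=0.2,  # low value for focused, predictable output
    )
    print(json.dumps(body, indent=2))
```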

How Do You Enable the LM Studio Local Server?

LM Studio includes a local server that mimics the OpenAI API. Any application that works with OpenAI can use your local model through this server:

  1. Click the Local Server tab (the "<->" icon) in the left sidebar.
  2. Select a model in the model dropdown at the top.
  3. Click "Start Server". The server starts on http://localhost:1234.
  4. Your application should set `base_url = "http://localhost:1234/v1"` and any string as the API key (the server accepts any value).
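
After the steps above, a quick reachability check can save debugging time. The following is a sketch using only the Python standard library. It assumes the server exposes the OpenAI-style `GET /v1/models` listing endpoint and returns the usual `{"data": [{"id": ...}]}` shape; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url="http://localhost:1234/v1"):
    """Return the URL of the OpenAI-style model listing endpoint."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload):
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url="http://localhost:1234/v1", timeout=3.0):
    """Return model ids from the server, or None if it cannot be queried."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


if __name__ == "__main__":
    ids = fetch_model_ids()
    print("Server not reachable" if ids is None else f"Models: {ids}")
```

A `None` result most likely means the server has not been started or is listening on a different port.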

Connect to LM Studio via Python

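```python
# Requires the OpenAI Python SDK: pip install openai
from openai import OpenAI

# Point the client at LM Studio's local server instead of the OpenAI cloud API.
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed",  # the server accepts any string
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses whichever model is loaded
    messages=[{"role": "user", "content": "What is a local LLM?"}],
)
print(response.choices[0].message.content)
```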
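
The example above waits for the full reply. To print tokens as they arrive, as the chat UI does, the same call can be made with `stream=True`. This is a sketch under assumptions: it relies on the OpenAI SDK's streaming chunk shape (`chunk.choices[0].delta.content`) and assumes the local server emits chunks in that shape; `collect_stream` and `stream_reply` are hypothetical helper names.

```python
def collect_stream(chunks, on_token=print):
    """Join streamed delta text into one string, calling on_token per piece."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # role-only and final chunks have no text
            on_token(piece)
            parts.append(piece)
    return "".join(parts)


def stream_reply(prompt, base_url="http://localhost:1234/v1"):
    """Send one prompt to the local server and print the answer as it streams."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(base_url=base_url, api_key="not-needed")
    stream = client.chat.completions.create(
        model="local-model",  # placeholder name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream, on_token=lambda t: print(t, end="", flush=True))
```

With the server running, `stream_reply("What is a local LLM?")` prints the reply incrementally and returns the full text. Keeping `collect_stream` separate from the network call makes the joining logic easy to test without a server.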

LM Studio vs Ollama: Which Should You Use?

| Factor | LM Studio | Ollama |
|---|---|---|
| Interface | Graphical desktop app | Terminal + API |
| Model source | Hugging Face (any GGUF model) | Ollama library (curated, ~200 models) |
| API port | localhost:1234 | localhost:11434 |
| Model management | GUI browser with file size info | CLI commands (ollama pull, list, rm) |
| Automation | Limited (GUI-focused) | Strong (scripting, Docker, CI) |
| Best for | Beginners, GUI users, model exploration | Developers, automation, server deployments |
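
If client code needs to work with either tool, switching can come down to changing the base URL. The sketch below uses the ports from the table. The `/v1` path for Ollama assumes its OpenAI-compatible endpoint is in use, and `BACKENDS` and `base_url_for` are hypothetical names.

```python
# Default local endpoints; ports are taken from the comparison table above.
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",  # assumes Ollama's OpenAI-compatible path
}


def base_url_for(backend):
    """Return the OpenAI-style base URL for a named local backend."""
    try:
        return BACKENDS[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None


if __name__ == "__main__":
    for name in BACKENDS:
        print(name, "->", base_url_for(name))
```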

Troubleshooting Common LM Studio Issues

LM Studio says "Not enough memory to load model"

The model requires more RAM than is available. Close other applications to free memory, or select a smaller quantization (Q3_K_S instead of Q4_K_M). As a rule: multiply the model file size by 1.2 to estimate the RAM needed. A 4.5 GB file needs ~5.4 GB free RAM.
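
The rule of thumb above is easy to encode. This is a minimal sketch: the 1.2 factor is the article's estimate, not a measured value, and the function names are hypothetical.

```python
def estimate_ram_gb(file_size_gb, overhead=1.2):
    """Estimate RAM needed to load a model: file size times an overhead factor."""
    return round(file_size_gb * overhead, 1)


def fits_in_ram(file_size_gb, free_ram_gb, overhead=1.2):
    """Return True if the estimated requirement fits within the free RAM."""
    return estimate_ram_gb(file_size_gb, overhead) <= free_ram_gb


if __name__ == "__main__":
    print(estimate_ram_gb(4.5))               # the 4.5 GB example from the text
    print(fits_in_ram(4.5, free_ram_gb=5.0))  # 5.0 GB free is below the estimate
```

Longer context lengths add to this figure, so treat the result as a lower bound.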

The model is generating very slowly (under 5 tokens/sec)

The model is running entirely on CPU. Check GPU Layers in the right panel -- if it shows 0, your GPU is not being used. On macOS, LM Studio enables Metal (GPU) automatically for Apple Silicon. On Windows/Linux with NVIDIA, ensure your driver is up to date and increase GPU Layers to the maximum value shown.
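
To check generation speed from a script rather than the status bar, divide the number of generated tokens by the elapsed time. The sketch below uses the 5 tokens/sec mark from the heading above as its threshold; the function names are hypothetical.

```python
SLOW_THRESHOLD = 5.0  # tokens/sec; the "very slow" mark used in this section


def tokens_per_second(token_count, elapsed_seconds):
    """Return generation speed in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def is_slow(token_count, elapsed_seconds, threshold=SLOW_THRESHOLD):
    """Return True if generation speed falls below the threshold."""
    return tokens_per_second(token_count, elapsed_seconds) < threshold


if __name__ == "__main__":
    # Example numbers only: 120 tokens generated in 30 seconds.
    print(tokens_per_second(120, 30), is_slow(120, 30))
```

One way to get real inputs is to time a request with `time.perf_counter()` and read `response.usage.completion_tokens` from the reply, assuming the local server fills in the `usage` field.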

I cannot find a specific model in the LM Studio search

LM Studio searches Hugging Face for GGUF files. If a model is not appearing, try searching by the Hugging Face repository name directly (e.g., "bartowski/Llama-3.1-8B-Instruct-GGUF"). Some newer models may not be indexed yet.

The local server returns "model not found" errors

A model must be loaded in the Local Server tab before the server can respond. Open the Local Server tab, select a model from the dropdown, and click Start Server. The model name in API requests can be any string -- LM Studio uses whichever model is currently loaded.

Next Steps After Installing LM Studio

With LM Studio running, try Run Your First Local LLM to understand what response quality and speed to expect. For model recommendations matched to your hardware, see Best Beginner Local LLM Models. If you want to troubleshoot setup issues, see Troubleshooting Local LLM Setup.

Common Mistakes When Installing LM Studio

  • Selecting a model that needs more system RAM than your machine has available.
  • Using a pre-quantized model that is still too large for your GPU VRAM.
  • Expecting instant responses from large models on CPU-only systems -- response times of 10-30 seconds are typical.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
