PromptQuorum

How to Convert Ollama Models to MLX Format

Quick Answer

You cannot convert Ollama models to MLX directly. Instead, get the original SafeTensors weights from Hugging Face and convert them with mlx_lm.convert. For most popular models (Llama 3, Qwen, Mistral), pre-converted MLX versions already exist on Hugging Face under the mlx-community organization.

  • You cannot convert Ollama models directly: the model format is different
  • Pre-converted MLX models exist at huggingface.co/mlx-community for most popular models
  • To convert yourself: download from Hugging Face, then run mlx_lm.convert

Updated: 2026-05

Key Takeaways

  • Ollama stores models in its own internal format at ~/.ollama/models; you cannot import these directly into MLX
  • The mlx-community organization on Hugging Face has pre-converted MLX versions of Llama 3, Qwen, Mistral, Phi, Gemma, and many others; check there before converting
  • If a pre-converted version does not exist, download the original SafeTensors weights from Hugging Face and run mlx_lm.convert; the -q flag applies quantization during conversion

Step 1: Check for a Pre-Converted MLX Model

Before converting anything, visit huggingface.co/mlx-community. The community maintains hundreds of models already converted and quantized for MLX. Search by model name: if the model exists there, running it takes one command and no conversion.
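The same search can be scripted. The sketch below is a minimal illustration, assuming the public Hugging Face Hub endpoint https://huggingface.co/api/models accepts author, search, and limit query parameters and returns a JSON list of objects with an id field. The helper names build_search_url and find_mlx_models are hypothetical, not part of mlx-lm.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Hub search endpoint; check the Hub API docs if it changes.
HUB_API = "https://huggingface.co/api/models"


def build_search_url(model_name: str, limit: int = 20) -> str:
    """Build a Hub search URL restricted to the mlx-community organization."""
    query = urllib.parse.urlencode(
        {"author": "mlx-community", "search": model_name, "limit": limit}
    )
    return f"{HUB_API}?{query}"


def find_mlx_models(model_name: str, limit: int = 20) -> list[str]:
    """Return matching repo IDs. Requires network access."""
    url = build_search_url(model_name, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        results = json.load(resp)
    # Each result is assumed to be a dict carrying the repo ID under "id".
    return [entry["id"] for entry in results]
```

Calling find_mlx_models("Llama-3") should list repo IDs that can be passed to --model. An empty list suggests that no pre-converted version exists under that name.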

If a pre-converted version exists, run the model directly with mlx-lm:

pip install mlx-lm
mlx_lm.generate --model mlx-community/Meta-Llama-3-8B-4bit --prompt "Hello"

Step 2: Convert a Model Yourself (if not pre-converted)

If the model you want is not in mlx-community, point the converter at the original SafeTensors weights in the model author's Hugging Face repo (not mlx-community). The --hf-path option accepts a Hub repo ID, which is downloaded automatically, or a local directory. The -q flag applies 4-bit quantization during conversion:

pip install mlx-lm
mlx_lm.convert --hf-path original-org/model-name --mlx-path ./converted-model -q

Conversion typically takes 2–10 minutes depending on model size. The output is a directory of .safetensors shards plus an MLX-compatible tokenizer config.
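When converting several models, it can help to wrap the command in a small script. The sketch below is one possible approach, not part of mlx-lm itself: build_convert_command and convert_model are hypothetical helper names, and the --q-bits option is assumed to be available in your installed version (confirm with mlx_lm.convert --help).

```python
import subprocess


def build_convert_command(hf_path, mlx_path, quantize=True, q_bits=4):
    """Assemble the mlx_lm.convert argument list without running it."""
    cmd = ["mlx_lm.convert", "--hf-path", hf_path, "--mlx-path", mlx_path]
    if quantize:
        # -q enables quantization; --q-bits (assumed flag) sets the bit width.
        cmd += ["-q", "--q-bits", str(q_bits)]
    return cmd


def convert_model(hf_path, mlx_path, **options):
    """Run the conversion; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_convert_command(hf_path, mlx_path, **options), check=True)
```

For example, convert_model("original-org/model-name", "./converted-model") runs the same command as shown above. Keeping command construction separate from execution makes it easy to print and inspect the command before starting a long conversion.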

Quick Answers About MLX Model Conversion

Can I export a model from Ollama and import it into MLX?
No. Ollama stores models in its own internal format in ~/.ollama/models, which mlx-lm cannot read directly. Use the original SafeTensors weights from Hugging Face as the conversion source.
Does mlx-lm support GGUF files as conversion input?
As of early 2026, mlx_lm.convert primarily targets SafeTensors (the standard Hugging Face format). If you only have a GGUF file, use a GGUF-to-SafeTensors conversion tool first, or look for the original SafeTensors weights on the model's Hugging Face page.
Which models have pre-converted MLX versions?
The mlx-community organization covers most major models: Llama 3, Qwen 2.5, Mistral, Phi-3/4, Gemma 2, and many fine-tunes. Both 4-bit and 8-bit quantized versions are usually available. Visit huggingface.co/mlx-community and search by model family name.
What quantization should I use when converting to MLX?
For most 7B–14B models on 16 GB unified memory, use 4-bit quantization (the default with the -q flag). For a 7B model, this produces a ~4 GB model that runs well on M1/M2/M3/M4 chips. Use 8-bit (add --q-bits 8) only if you have 32 GB or more and need higher output quality.
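The ~4 GB figure can be sanity-checked with a rough calculation. The sketch below is an estimate under stated assumptions, not a measurement: it assumes group-wise quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group, and it ignores any layers left unquantized as well as runtime memory such as the KV cache. The function name estimate_quantized_size_gb is hypothetical.

```python
def estimate_quantized_size_gb(num_params, bits=4, group_size=64, overhead_bits=32):
    """Approximate weight size in GB (1e9 bytes) after group-wise quantization.

    overhead_bits is the assumed per-group metadata: one 16-bit scale
    plus one 16-bit bias shared by every `group_size` weights.
    """
    bits_per_weight = bits + overhead_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Compare 4-bit and 8-bit estimates for a 7B-parameter model.
for bits in (4, 8):
    size = estimate_quantized_size_gb(7e9, bits=bits)
    print(f"7B at {bits}-bit: about {size:.1f} GB")
```

Under these assumptions a 7B model comes out at roughly 3.9 GB at 4-bit and about 7.4 GB at 8-bit, which is why 8-bit leaves much less headroom on a 16 GB machine.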