You cannot directly convert Ollama models to MLX. Instead, download the original GGUF or SafeTensors weights from Hugging Face, then convert with mlx-lm convert. For most popular models (Llama 3, Qwen, Mistral), pre-converted MLX versions already exist on Hugging Face under the mlx-community organization.
βΈYou cannot convert Ollama models directly β the model format is different
βΈPre-converted MLX models exist at huggingface.co/mlx-community for most popular models
βΈTo convert yourself: download from Hugging Face, then run mlx_lm.convert
Updated: 2026-05
Tool ComparisonsIntermediate
Key Takeaways
βOllama stores models in its own internal format at ~/.ollama/models β you cannot import these directly into MLX
βThe mlx-community organization on Hugging Face has pre-converted MLX versions of Llama 3, Qwen, Mistral, Phi, Gemma, and many others β check there first before converting
βIf a pre-converted version does not exist, download the original SafeTensors weights from Hugging Face and run mlx_lm.convert β quantization is applied during conversion
Step 1: Check for a Pre-Converted MLX Model
Before converting anything, visit huggingface.co/mlx-community. The community maintains hundreds of models already converted and quantized for MLX. Search by model name β if it exists there, installing it takes one command and no conversion.
If a pre-converted version exists, run the model directly with mlx-lm:
Step 2: Convert a Model Yourself (if not pre-converted)
If the model you want is not in mlx-community, download the original SafeTensors weights from the model author's Hugging Face repo (not from mlx-community), then run the converter. The -q flag applies 4-bit quantization during conversion:
Conversion takes 2β10 minutes depending on model size. The output is a directory of .safetensors shards plus an mlx-compatible tokenizer config.
Can I export a model from Ollama and import it into MLX?βΎ
No. Ollama stores models in its own internal format in ~/.ollama/models. This format is not directly readable by mlx-lm. You need the original SafeTensors or GGUF weights from Hugging Face to use as the conversion source.
Does mlx-lm support GGUF files as conversion input?βΎ
As of early 2026, mlx-lm.convert primarily targets SafeTensors (the standard Hugging Face format). If you only have a GGUF file, use a GGUF-to-SafeTensors conversion tool first, or look for the original SafeTensors weights on the model's Hugging Face page.
Which models have pre-converted MLX versions?βΎ
The mlx-community organization covers most major models: Llama 3, Qwen 2.5, Mistral, Phi-3/4, Gemma 2, and many fine-tunes. Both 4-bit and 8-bit quantized versions are usually available. Visit huggingface.co/mlx-community and search by model family name.
What quantization should I use when converting to MLX?βΎ
For most 7Bβ14B models on 16 GB unified memory, use 4-bit quantization (the default with the -q flag). For a 7B model, this produces a ~4 GB model that runs well on M1/M2/M3/M4 chips. Use 8-bit only if you have 32 GB or more and need higher output quality.