Key Takeaways
- ALLaM 7B is the best publicly self-hostable Arabic model: built by NCAI/SDAIA (now under HUMAIN), released under Apache 2.0, with GGUF weights that run directly in Ollama and llama.cpp.
- The benchmark gap is real: ALLaM-7B scores 72–74% on AraLingBench, while Qwen variants score 40–62%, a 12–32 percentage-point gap on Arabic linguistic tasks.
- AceGPT (KAUST + CUHKSZ + SRIBD) is a 7B/13B Apache 2.0 alternative, but its last GitHub update was December 2023, so treat it as unmaintained.
- Cultural fidelity ≠ grammatical fluency. Globally trained models can be grammatically correct yet culturally wrong; fine-tuning a multilingual model on Arabic often *improves* MSA quality while *decreasing* dialect accuracy, a documented paradox.
- VRAM quick reference (Q4_K_M): 7B ≈ 6–8 GB, 13B ≈ 10–14 GB, 34B ≈ 20–24 GB, 70B ≈ 40–48 GB.
- ALLaM 34B is proprietary: it powers HUMAIN Chat but has no public weights, so only the 7B is self-hostable today.
- National momentum: Saudi Arabia declared 2026 the Year of Artificial Intelligence, accelerating Arabic model development.
In One Sentence
ALLaM 7B (Apache 2.0, Ollama-ready) is the leading publicly self-hostable Arabic model, scoring 72–74% on AraLingBench versus 40–62% for Qwen variants.
In Plain Terms
If you need an Arabic AI you can run on your own computer, ALLaM 7B from Saudi Arabia is the best free option right now. Big global models like Qwen understand Arabic grammar but often miss the culture and dialect.
Why Arabic Cultural Fidelity Matters for Local AI
A model can produce grammatically correct Arabic and still be culturally wrong, and for customer-facing or government work in Saudi Arabia, cultural correctness is what matters.
The benchmark evidence is consistent. On AraLingBench, which tests Arabic morphological and syntactic reasoning, Qwen-family models score 40–62% while Arabic-specialized models like ALLaM-7B score 72–74%. That 12–32 percentage-point gap concentrates in exactly the areas (morphology, syntax, register) where Arabic differs most from the European languages global models are optimized for.
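The quoted 12–32 point gap matches a like-for-like pairing of the two score ranges (low end against low end, high end against high end). That pairing is an assumption about how the figure was derived, not something the source states. A minimal sketch of the arithmetic, with the alternative "any score against any score" spread shown for comparison:

```python
# Score ranges (percent) as quoted in the text above.
ALLAM = (72, 74)
QWEN = (40, 62)


def paired_gap(a, b):
    """Gap when the low ends are compared with each other, and likewise the high ends."""
    low_gap = a[0] - b[0]
    high_gap = a[1] - b[1]
    return min(low_gap, high_gap), max(low_gap, high_gap)


def extreme_gap(a, b):
    """Smallest and largest gap between any score in range a and any score in range b."""
    return a[0] - b[1], a[1] - b[0]


print(paired_gap(ALLAM, QWEN))   # (12, 32)
print(extreme_gap(ALLAM, QWEN))  # (10, 34)
```

Either way the Arabic-specialized model leads by a double-digit margin; the exact width depends on which Qwen variant is being compared.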
Fine-tuning is not a free fix. Research on the Arabic LLM landscape (arXiv 2506.01340, 2025) documents a paradox: fine-tuning a multilingual model on Arabic data often improves Modern Standard Arabic (MSA) quality while *decreasing* dialect accuracy. You cannot simply bolt Arabic competence onto a global model and expect dialectal fidelity.
Dialect handling is where global models break down most visibly. For smaller open-weight models, strict ISO-code dialect accuracy can fall as low as 0.016–0.078, meaning the model produces fluent Arabic in the *wrong* dialect. The AraDiCE benchmark (COLING 2025) finds Arabic-specific models outperform multilingual ones on dialect, though significant challenges in dialect identification and generation persist across all models.
Cultural and religious context is a documented weak point. The same survey notes that Western-centric or multilingual training data "introduces cultural biases that can misalign models with the values and expectations of Arabic-speaking communities", which affects how a model frames Islamic topics, formal address, and social conventions.
Grammatical gender agreement is a known, persistent challenge: Arabic applies gender agreement to verbs, adjectives, and pronouns in ways that differ structurally from European languages, and globally trained models routinely get this subtly wrong.
The business implication for Saudi deployments: if your use case is customer-facing Arabic content, formal correspondence, or anything touching cultural or religious context, an Arabic-specialized model is worth the trade-off, and the MSA-versus-Gulf-dialect distinction should be an explicit part of your model selection.
Saudi and Arabic Local Models: ALLaM, AceGPT, and Multilingual Alternatives
ALLaM 7B is the recommended starting point for self-hosted Arabic AI; the table below summarizes the realistic options.
ALLaM was built by the National Center for AI (NCAI) at SDAIA in partnership with IBM, and is now commercialized through HUMAIN, a Public Investment Fund-owned AI company launched in May 2025. The family spans 7B, 13B, 34B, and 70B variants, but only the 7B Instruct is publicly available (Apache 2.0, with nine GGUF quantizations on Hugging Face). The 34B that powers HUMAIN Chat is proprietary with no public weights.
AceGPT is a joint project of KAUST, the Chinese University of Hong Kong Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD); it is not a KAUST-only model. It offers 7B and 13B variants (base and chat) built on LLaMA-2, under Apache 2.0. At its 2023 launch it outperformed Jais on Arabic tasks, but its last GitHub update was December 2023, so treat it as unmaintained.
Qwen2.5 is the strongest multilingual alternative for broad language coverage, but as the benchmarks show, it trails Arabic-specialized models on cultural and dialectal tasks despite its larger ecosystem.
Jais (13B/70B) is included for completeness, but note it is UAE-origin (Core42/G42, Abu Dhabi), not Saudi. It remains competitive on Arabic dialect tasks and is Apache 2.0.
| Model | Parameters | VRAM (Q4_K_M) | License | Ollama | Arabic Score |
|---|---|---|---|---|---|
| ALLaM 7B | 7B | 6–8 GB | Apache 2.0 | Yes (GGUF) | 72–74% (AraLingBench) |
| ALLaM 34B | 34B | 20–24 GB | Proprietary | No (no public weights) | Not publicly benchmarked |
| AceGPT 7B | 7B | 6–8 GB | Apache 2.0 | Community port | Strong at launch (2023) |
| AceGPT 13B | 13B | 10–14 GB | Apache 2.0 | Community port | Strong at launch (2023) |
| Qwen2.5 7B | 7B | 6–8 GB | Apache 2.0 | Yes | 40–62% (AraLingBench) |
| Qwen2.5 72B | 72B | 40–48 GB | Apache 2.0 | Yes | Higher, but cultural gaps remain |
| Jais 13B (UAE) | 13B | 10–14 GB | Apache 2.0 | Limited | Competitive on dialect |
Running ALLaM 7B Locally with Ollama
ALLaM 7B ships as GGUF quantizations on Hugging Face, so you can run it in Ollama with a one-line Modelfile. Follow these steps.
1. Download the GGUF from Hugging Face. Visit humain-ai/ALLaM-7B-Instruct-preview on Hugging Face, browse the quantizations, and download ALLaM-7B-Instruct-Q4_K_M.gguf (recommended, ~4.5 GB), the best quality-to-size balance for an 8 GB GPU.
2. Install Ollama. Download Ollama from ollama.com for your OS. You need roughly 8 GB of VRAM on an NVIDIA GPU, or 16 GB of unified memory on Apple Silicon, to run a 7B model comfortably.
3. Create a Modelfile. Create a plain text file named Modelfile containing a single line: FROM ./ALLaM-7B-Instruct-Q4_K_M.gguf. This tells Ollama where to find the weights.
4. Register the model with Ollama. Run: ollama create allam-7b -f Modelfile. Ollama imports the GGUF and makes it available as a named model you can call repeatedly.
5. Run inference in Arabic. Run: ollama run allam-7b "اشرح مفهوم الذكاء الاصطناعي المحلي" (Explain the concept of local AI). The model responds in Modern Standard Arabic.
6. Verify and steer the Arabic output. If the model replies in English, add a system prompt such as "أجب دائماً باللغة العربية الفصحى" (Always respond in Modern Standard Arabic) to lock the register and language.

Notes and alternatives:
- llama.cpp directly: llama-cli -m ALLaM-7B-Instruct-Q4_K_M.gguf --chat-template chatml -p "أكمل الجملة التالية:" (Complete the following sentence:) for maximum control over context length and sampling.
- AceGPT via community port: ollama run salmatrafi/acegpt pulls the community-maintained AceGPT port if you want to compare.
- Minimum hardware: an 8 GB VRAM GPU (RTX 3070/4060 or better) or Apple Silicon with 16 GB unified memory. Size larger models with the VRAM Calculator.
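The Modelfile from step 3 and the system prompt from step 6 can be combined so the Arabic instruction is baked into the registered model. The sketch below only generates the Modelfile text; it assumes the GGUF sits in the current directory, and the helper names (`build_modelfile`, `write_modelfile`) are hypothetical. `FROM` and `SYSTEM` are standard Ollama Modelfile instructions.

```python
from pathlib import Path

# System prompt from step 6: "Always respond in Modern Standard Arabic".
MSA_SYSTEM_PROMPT = "أجب دائماً باللغة العربية الفصحى"


def build_modelfile(gguf_path: str, system_prompt: str = "") -> str:
    """Return Modelfile text: a FROM line, plus an optional SYSTEM line."""
    lines = [f"FROM {gguf_path}"]
    if system_prompt:
        # Triple quotes let the prompt contain spaces and punctuation safely.
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


def write_modelfile(text: str, path: str = "Modelfile") -> None:
    """Write the Modelfile as UTF-8 so the Arabic prompt survives intact."""
    Path(path).write_text(text, encoding="utf-8")


modelfile_text = build_modelfile("./ALLaM-7B-Instruct-Q4_K_M.gguf", MSA_SYSTEM_PROMPT)
print(modelfile_text)
```

After calling `write_modelfile(modelfile_text)`, rerun ollama create allam-7b -f Modelfile so the registered model picks up the system prompt.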
How to Self-Evaluate Arabic Model Quality
Benchmarks are a starting point, but you should test any Arabic model against your own domain before deploying. Use these checks.
- MSA vs. dialect consistency: send the same prompt in Modern Standard Arabic and in Gulf dialect, and check whether the model holds register and meaning across both.
- Cultural context test: ask about Saudi cultural practices, Islamic finance principles, or formal addressing conventions, and check whether the framing is appropriate, not just grammatically valid.
- Gender agreement test: ask the model to describe a female doctor and a male engineer, and verify correct Arabic grammatical gender agreement on verbs, adjectives, and pronouns.
- Formality calibration: request a formal letter and then a casual message. A good model adjusts register; a weak one uses the same tone for both.
- Benchmark proxies: use AraLingBench (morphological and syntactic reasoning) and AraDiCE (cultural awareness and dialect) as published reference points when comparing models.
- Red flags: Latin-script responses to Arabic prompts, the wrong dialect register, or culturally inappropriate framing of religious topics all signal a poor fit.
- Practical rule: for any customer-facing Arabic use case, test with at least 20 domain-specific prompts before you deploy, because benchmark scores do not capture your specific content.
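Part of this checklist can be automated. The sketch below runs a set of prompts through any `generate` callable and flags one of the red flags listed above: replies that are mostly Latin script. The function names, the 0.5 threshold, and the sample prompts are illustrative assumptions, and the check covers only the main Arabic Unicode block (U+0600 to U+06FF). Dialect, gender agreement, and cultural framing still need a human reviewer.

```python
from typing import Callable, Dict, List


def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the main Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # an empty or symbol-only reply counts as non-Arabic
    arabic = [ch for ch in letters if "\u0600" <= ch <= "\u06FF"]
    return len(arabic) / len(letters)


def is_latin_red_flag(reply: str, threshold: float = 0.5) -> bool:
    """True when less than `threshold` of the letters are Arabic (assumed cutoff)."""
    return arabic_ratio(reply) < threshold


def run_checks(prompts: Dict[str, str], generate: Callable[[str], str]) -> List[dict]:
    """Send every prompt to `generate` and record the reply plus the script flag."""
    results = []
    for name, prompt in prompts.items():
        reply = generate(prompt)
        results.append({
            "check": name,
            "prompt": prompt,
            "reply": reply,  # keep the text for manual review
            "latin_red_flag": is_latin_red_flag(reply),
        })
    return results


# Illustrative MSA prompts; replace them with 20+ prompts from your own domain.
SAMPLE_PROMPTS = {
    "gender_agreement": "صف طبيبة تعمل في مستشفى ومهندسًا يعمل في مصنع.",
    "formal_letter": "اكتب خطابًا رسميًا لطلب إجازة.",
    "casual_message": "اكتب رسالة قصيرة وودية إلى صديق.",
}


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; always answers in English."""
    return "Sorry, I can only answer in English."


for row in run_checks(SAMPLE_PROMPTS, fake_generate):
    print(row["check"], "-> red flag" if row["latin_red_flag"] else "-> ok")
```

Swap `fake_generate` for a function that calls your local model, then read the stored replies by hand for register and cultural framing.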
Frequently Asked Questions: Arabic Local LLMs
What is ALLaM and who created it?
ALLaM is a family of Arabic language models built by the National Center for AI (NCAI) at SDAIA in partnership with IBM, and now commercialized through HUMAIN, a Public Investment Fund-owned AI company. The 7B Instruct version is publicly available under Apache 2.0; larger 13B, 34B, and 70B variants exist, but only the 7B has open weights.
Can I run ALLaM locally?
Yes. The ALLaM 7B Instruct model has GGUF quantizations on Hugging Face that run directly in Ollama and llama.cpp on a GPU with about 8 GB of VRAM or Apple Silicon with 16 GB unified memory. The 34B that powers HUMAIN Chat is proprietary and cannot be self-hosted.
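Once the model is registered, it can also be queried from code through Ollama's local HTTP API. The sketch below is a minimal example under stated assumptions: Ollama is running on its default port 11434, and the model was registered under the name allam-7b as in the steps above. The helper names are hypothetical; `/api/generate` and the `model`, `prompt`, `system`, and `stream` fields are part of Ollama's API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt, model="allam-7b", system=None):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        payload["system"] = system  # overrides the Modelfile system prompt
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `ask("اشرح مفهوم الذكاء الاصطناعي المحلي")` returns the model's answer as a string.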
What is AceGPT and is it still maintained?
AceGPT is an Arabic model jointly developed by KAUST, CUHKSZ, and SRIBD, offering 7B and 13B variants under Apache 2.0. It outperformed Jais at its 2023 launch, but its last GitHub update was December 2023, so it appears unmaintained: usable, but not actively improved.
How does ALLaM compare to Qwen on Arabic?
On AraLingBench, ALLaM-7B scores 72–74% versus 40–62% for Qwen variants, a 12–32 percentage-point gap on Arabic linguistic tasks. Qwen has a larger ecosystem and broader multilingual coverage, but ALLaM is stronger on Arabic-specific morphology, syntax, and cultural tasks.
Why do multilingual models struggle with Arabic?
They are typically grammatically fluent but culturally and dialectally weak. Strict dialect accuracy can fall to 0.016–0.078 for smaller models, and fine-tuning a multilingual model on Arabic often improves MSA quality while decreasing dialect accuracy, a documented paradox. Western-centric training data also introduces cultural biases in how models handle Islamic and social context.
What VRAM do I need for a 7B Arabic model?
About 6–8 GB of VRAM at Q4_K_M quantization, with 8 GB or more recommended for comfortable performance. A 13B model needs 10–14 GB, a 34B around 20–24 GB, and a 70B around 40–48 GB.
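These ranges can be turned into a quick fit check for a given GPU. The sketch below encodes only the figures quoted above; the three labels and the rule behind them (at or above the upper bound is "comfortable", within the range is "tight") are assumptions for illustration, and real usage also depends on context length.

```python
# Q4_K_M VRAM ranges in GB, keyed by parameter count in billions (figures from the text).
VRAM_Q4_K_M_GB = {7: (6, 8), 13: (10, 14), 34: (20, 24), 70: (40, 48)}


def vram_fit(params_b: int, gpu_vram_gb: float) -> str:
    """Classify how well a model size fits into the available VRAM."""
    low, high = VRAM_Q4_K_M_GB[params_b]
    if gpu_vram_gb >= high:
        return "comfortable"
    if gpu_vram_gb >= low:
        return "tight"
    return "insufficient"


for size in VRAM_Q4_K_M_GB:
    print(f"{size}B on a 24 GB GPU: {vram_fit(size, 24)}")
```

On an 8 GB card, this classifies a 7B model as comfortable and a 13B model as insufficient, which matches the minimum-hardware guidance for ALLaM 7B.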
Is Jais a Saudi model?
No. Jais is UAE-origin, developed by Core42/G42 in Abu Dhabi, not by a Saudi institution. It is included here because it is a capable, Apache 2.0 Arabic model competitive on dialect tasks, but it is not part of the Saudi (ALLaM/AceGPT) lineage.
Should I use ALLaM 34B or 7B?
For local deployment, use the 7B, because the 34B is proprietary and not self-hostable. Start with ALLaM 7B on your own hardware, and if you need the 34B's capability, access it through the HUMAIN Chat product rather than expecting downloadable weights.
How do I test if a model handles Saudi Arabic correctly?
Run MSA-versus-dialect consistency prompts, ask about Saudi cultural practices and Islamic finance, and test grammatical gender agreement (e.g., describing a female doctor and a male engineer). Watch for Latin-script replies, wrong dialect register, or culturally inappropriate framing, and validate with at least 20 domain-specific prompts before deploying.
What is HUMAIN?
HUMAIN is a Saudi AI company fully owned by the Public Investment Fund, launched in May 2025. It commercializes ALLaM and operates HUMAIN Chat. It is a separate entity from SDAIA but inherited the ALLaM models from SDAIA's National Center for AI; Aramco later took a minority stake.
Sources
- Hugging Face: humain-ai/ALLaM-7B-Instruct-preview (model card, GGUF quantizations), huggingface.co
- AraLingBench: Arabic linguistic benchmark (arXiv 2511.14295), arxiv.org
- Landscape of Arabic LLMs: survey (arXiv 2506.01340), arxiv.org
- AraDiCE: Arabic dialect and cultural evaluation, COLING 2025 (arXiv 2409.11404), arxiv.org
- HUMAIN Chat launch on ALLaM 34B: Middle East AI News, middleeastainews.com
- Saudi Cabinet: 2026 declared the Year of Artificial Intelligence, spa.gov.sa