ALLaM, AceGPT & the Best Saudi Arabic Local LLMs (2026)

11 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

For Arabic-language local AI in Saudi Arabia, ALLaM 7B (HUMAIN/NCAI, Apache 2.0) is the leading publicly available model, scoring 72–74% on AraLingBench versus 40–62% for Qwen variants, and it runs locally via Ollama using its GGUF weights. AceGPT 7B/13B (KAUST + CUHKSZ) is an alternative but appears to have been unmaintained since December 2023.

Saudi Arabia's AI-first ambitions, including HUMAIN's ALLaM models and the official designation of 2026 as the Year of Artificial Intelligence, are producing a new generation of Arabic-native local LLMs. But choosing a model for Arabic workloads is not just a parameter-count question: multilingual models from global vendors score significantly lower than Arabic-specialized models on cultural and dialectal tasks, despite being grammatically fluent. This guide covers ALLaM (HUMAIN/NCAI), AceGPT (KAUST + CUHKSZ), and the top multilingual alternatives, with verified benchmark data, VRAM requirements, and a step-by-step guide to running ALLaM locally with Ollama.

Key Takeaways

  • ALLaM 7B is the best publicly self-hostable Arabic model: built by NCAI/SDAIA (now under HUMAIN), released under Apache 2.0, with GGUF weights that run directly in Ollama and llama.cpp.
  • The benchmark gap is real: ALLaM-7B scores 72–74% on AraLingBench, while Qwen variants score 40–62%, a 12–32 percentage-point gap on Arabic linguistic tasks.
  • AceGPT (KAUST + CUHKSZ + SRIBD) is a 7B/13B Apache 2.0 alternative, but its last GitHub update was December 2023, so treat it as unmaintained.
  • Cultural fidelity ≠ grammatical fluency. Globally trained models can be grammatically correct yet culturally wrong; fine-tuning a multilingual model on Arabic often *improves* MSA quality while *decreasing* dialect accuracy, a documented paradox.
  • VRAM quick reference (Q4_K_M): 7B ≈ 6–8 GB, 13B ≈ 10–14 GB, 34B ≈ 20–24 GB, 70B ≈ 40–48 GB.
  • ALLaM 34B is proprietary: it powers HUMAIN Chat but has no public weights, so only the 7B is self-hostable today.
  • National momentum: Saudi Arabia declared 2026 the Year of Artificial Intelligence, accelerating Arabic model development.

📝 In One Sentence

ALLaM 7B (Apache 2.0, Ollama-ready) is the leading publicly self-hostable Arabic model, scoring 72–74% on AraLingBench versus 40–62% for Qwen variants.

💬 In Plain Terms

If you need an Arabic AI you can run on your own computer, ALLaM 7B from Saudi Arabia is the best free option right now. Big global models like Qwen understand Arabic grammar but often miss the culture and dialect.

Why Arabic Cultural Fidelity Matters for Local AI

A model can produce grammatically correct Arabic and still be culturally wrong, and for customer-facing or government work in Saudi Arabia, cultural correctness is what matters.

The benchmark evidence is consistent. On AraLingBench, which tests Arabic morphological and syntactic reasoning, Qwen-family models score 40–62% while Arabic-specialized models like ALLaM-7B score 72–74%. That 12–32 percentage-point gap concentrates in exactly the areas (morphology, syntax, register) where Arabic differs most from the European languages global models are optimized for.
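
As a quick check on the arithmetic, the gap can be recomputed from the two score ranges quoted above. This is only a sketch: it assumes the ranges are simple low/high bounds across model variants, and the variable names are illustrative.

```python
# Score ranges quoted in the text (percent on AraLingBench).
allam = (72, 74)  # ALLaM-7B: low, high
qwen = (40, 62)   # Qwen variants: low, high

# Pairing like with like (high vs. high, low vs. low) reproduces the 12–32 point figure.
matched = (allam[1] - qwen[1], allam[0] - qwen[0])

# Pairing opposite extremes gives the widest possible bounds on the gap.
extreme = (allam[0] - qwen[1], allam[1] - qwen[0])

print(matched)  # (12, 32)
print(extreme)  # (10, 34)
```

Under either pairing, the weakest ALLaM figure stays at least 10 points ahead of the strongest Qwen figure.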

Fine-tuning is not a free fix. Research on the Arabic LLM landscape (arXiv 2506.01340, 2025) documents a paradox: fine-tuning a multilingual model on Arabic data often improves Modern Standard Arabic (MSA) quality while *decreasing* dialect accuracy. You cannot simply bolt Arabic competence onto a global model and expect dialectal fidelity.

Dialect handling is where global models break down most visibly. For smaller open-weight models, strict ISO-code dialect accuracy can fall as low as 0.016–0.078, meaning the model produces fluent Arabic in the *wrong* dialect. The AraDiCE benchmark (COLING 2025) finds that Arabic-specific models outperform multilingual ones on dialect, though significant challenges in dialect identification and generation persist across all models.

Cultural and religious context is a documented weak point. The same survey notes that Western-centric or multilingual training data "introduces cultural biases that can misalign models with the values and expectations of Arabic-speaking communities", which affects how a model frames Islamic topics, formal address, and social conventions.

Grammatical gender agreement is a known, persistent challenge: Arabic applies gender agreement to verbs, adjectives, and pronouns in ways that differ structurally from European languages, and globally trained models routinely get this subtly wrong.

The business implication for Saudi deployments: if your use case is customer-facing Arabic content, formal correspondence, or anything touching cultural or religious context, an Arabic-specialized model is worth the trade-off, and the MSA-versus-Gulf-dialect distinction should be an explicit part of your model selection.

Saudi and Arabic Local Models: ALLaM, AceGPT, and Multilingual Alternatives

ALLaM 7B is the recommended starting point for self-hosted Arabic AI; the table below summarizes the realistic options.

ALLaM was built by the National Center for AI (NCAI) at SDAIA in partnership with IBM, and is now commercialized through HUMAIN, a Public Investment Fund-owned AI company launched in May 2025. The family spans 7B, 13B, 34B, and 70B variants, but only the 7B Instruct is publicly available (Apache 2.0, with nine GGUF quantizations on Hugging Face). The 34B that powers HUMAIN Chat is proprietary with no public weights.

AceGPT is a joint project of KAUST, the Chinese University of Hong Kong Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD), so it is not a KAUST-only model. It offers 7B and 13B variants (base and chat) built on LLaMA-2, under Apache 2.0. At its 2023 launch it outperformed Jais on Arabic tasks, but its last GitHub update was December 2023, so treat it as unmaintained.

Qwen2.5 is the strongest multilingual alternative for broad language coverage, but as the benchmarks show, it trails Arabic-specialized models on cultural and dialectal tasks despite its larger ecosystem.

Jais (13B/70B) is included for completeness, but note it is UAE-origin (Core42/G42, Abu Dhabi), not Saudi. It remains competitive on Arabic dialect tasks and is Apache 2.0.

Model | Parameters | VRAM (Q4_K_M) | License | Ollama | Arabic Score
ALLaM 7B | 7B | 6–8 GB | Apache 2.0 | Yes (GGUF) | 72–74% (AraLingBench)
ALLaM 34B | 34B | ~20–24 GB | Proprietary | No (no public weights) | Not publicly benchmarked
AceGPT 7B | 7B | 6–8 GB | Apache 2.0 | Community port | Strong at launch (2023)
AceGPT 13B | 13B | 10–14 GB | Apache 2.0 | Community port | Strong at launch (2023)
Qwen2.5 7B | 7B | 6–8 GB | Apache 2.0 | Yes | 40–62% (AraLingBench)
Qwen2.5 72B | 72B | 40–48 GB | Apache 2.0 | Yes | Higher, but cultural gaps remain
Jais 13B (UAE) | 13B | 10–14 GB | Apache 2.0 | Limited | Competitive on dialect
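
The VRAM column can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is an estimate only: the figure of about 4.8 bits per weight for Q4_K_M and the flat 2.5 GB allowance for KV cache and runtime buffers are assumptions rather than measured values, and `estimate_vram_gb` is a hypothetical helper, not part of any tool mentioned here.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.8,  # assumed average for Q4_K_M
                     overhead_gb: float = 2.5) -> float:
    """Rough VRAM estimate in GB: quantized weights plus a flat allowance
    for KV cache and runtime buffers (assumed; grows with context length)."""
    # Billions of parameters * bits per weight / 8 bits per byte = gigabytes of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

for size in (7, 13, 34, 70):
    print(f"{size}B: ~{estimate_vram_gb(size)} GB")
```

With these assumed defaults the estimates fall inside the ranges quoted for 7B, 13B, 34B, and 70B models. Longer context windows or higher-precision quantizations would push real usage higher.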

Running ALLaM 7B Locally with Ollama

ALLaM 7B ships as GGUF quantizations on Hugging Face, so you can run it in Ollama with a one-line Modelfile. Follow these steps.

  1. Download the GGUF from Hugging Face
    Visit humain-ai/ALLaM-7B-Instruct-preview on Hugging Face, browse the quantizations, and download ALLaM-7B-Instruct-Q4_K_M.gguf (recommended, ~4.5 GB), which offers the best quality-to-size balance for an 8 GB GPU.
  2. Install Ollama
    Download Ollama from ollama.com for your OS. You need roughly 8 GB of VRAM on an NVIDIA GPU, or 16 GB of unified memory on Apple Silicon, to run a 7B model comfortably.
  3. Create a Modelfile
    Create a plain text file named Modelfile containing a single line: FROM ./ALLaM-7B-Instruct-Q4_K_M.gguf. This tells Ollama where to find the weights.
  4. Register the model with Ollama
    Run: ollama create allam-7b -f Modelfile. Ollama imports the GGUF and makes it available as a named model you can call repeatedly.
  5. Run inference in Arabic
    Run: ollama run allam-7b "اشرح مفهوم الذكاء الاصطناعي المحلي" (Explain the concept of local AI). The model should respond in Modern Standard Arabic.
  6. Verify and steer the Arabic output
    If the model replies in English, add a system prompt such as "أجب دائماً باللغة العربية الفصحى" (Always respond in Modern Standard Arabic) to lock the register and language.

  • Alternative, llama.cpp directly: llama-cli -m ALLaM-7B-Instruct-Q4_K_M.gguf --chat-template chatml -p "أكمل الجملة التالية:" (Complete the following sentence:) for maximum control over context length and sampling.
  • AceGPT via community port: ollama run salmatrafi/acegpt pulls the community-maintained AceGPT port if you want to compare.
  • Minimum hardware: an 8 GB VRAM GPU (RTX 3070/4060 or better) or Apple Silicon with 16 GB unified memory. Size larger models with the VRAM Calculator.
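
Steps 3 and 6 above can be combined by writing the system prompt into the Modelfile itself. The following is a minimal sketch, assuming the GGUF file sits in the current directory; `build_modelfile` is a hypothetical helper, and only the Modelfile's FROM and SYSTEM instructions are used.

```python
def build_modelfile(gguf_path: str, system_prompt: str = "") -> str:
    """Return Modelfile text: a FROM line plus an optional SYSTEM prompt."""
    lines = [f"FROM {gguf_path}"]
    if system_prompt:
        # Triple quotes allow the SYSTEM value to span several lines if needed.
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"

# System prompt from step 6: "Always respond in Modern Standard Arabic".
MSA_SYSTEM_PROMPT = "أجب دائماً باللغة العربية الفصحى"

modelfile_text = build_modelfile("./ALLaM-7B-Instruct-Q4_K_M.gguf", MSA_SYSTEM_PROMPT)
print(modelfile_text)
```

Saving the printed text as Modelfile and then running ollama create allam-7b -f Modelfile registers the model with the Arabic system prompt already in place, so it does not have to be repeated on every call.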

How to Self-Evaluate Arabic Model Quality

Benchmarks are a starting point, but you should test any Arabic model against your own domain before deploying. Use these checks.

  • MSA vs. dialect consistency: send the same prompt in Modern Standard Arabic and in Gulf dialect, and check whether the model holds register and meaning across both.
  • Cultural context test: ask about Saudi cultural practices, Islamic finance principles, or formal addressing conventions, and check whether the framing is appropriate, not just grammatically valid.
  • Gender agreement test: ask the model to describe a female doctor and a male engineer, and verify correct Arabic grammatical gender agreement on verbs, adjectives, and pronouns.
  • Formality calibration: request a formal letter and then a casual message. A good model adjusts register; a weak one uses the same tone for both.
  • Benchmark proxies: use AraLingBench (morphological and syntactic reasoning) and AraDiCE (cultural awareness and dialect) as published reference points when comparing models.
  • Red flags: Latin-script responses to Arabic prompts, the wrong dialect register, or culturally inappropriate framing of religious topics all signal a poor fit.
  • Practical rule: for any customer-facing Arabic use case, test with at least 20 domain-specific prompts before you deploy, because benchmark scores do not capture your specific content.
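
One of the red flags above, Latin-script replies to Arabic prompts, is easy to screen for automatically. The sketch below is one possible approach rather than an established method: `arabic_ratio` and `flag_latin_replies` are hypothetical helpers, the 0.8 threshold is an assumed cutoff, and the sample replies are stand-ins for real model output. Dialect register and cultural framing still need review by a fluent reader.

```python
def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the main Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)

def flag_latin_replies(responses: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Return the prompts whose reply is mostly non-Arabic script."""
    return [prompt for prompt, reply in responses.items()
            if arabic_ratio(reply) < threshold]

# Stand-in replies; in practice these would come from the model under test.
sample = {
    "اشرح مفهوم الذكاء الاصطناعي المحلي": "الذكاء الاصطناعي المحلي يعمل على جهازك.",
    "أجب دائماً باللغة العربية الفصحى": "Sure, I will answer in Arabic.",
}
print(flag_latin_replies(sample))  # only the second prompt is flagged
```

Running the same check over a set of 20 or more domain-specific prompts gives a quick first filter before the manual register and cultural checks.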

Frequently Asked Questions: Arabic Local LLMs

What is ALLaM and who created it?

ALLaM is a family of Arabic language models built by the National Center for AI (NCAI) at SDAIA in partnership with IBM, and now commercialized through HUMAIN, a Public Investment Fund-owned AI company. The 7B Instruct version is publicly available under Apache 2.0; larger 13B, 34B, and 70B variants exist, but only the 7B has open weights.

Can I run ALLaM locally?

Yes. The ALLaM 7B Instruct model has GGUF quantizations on Hugging Face that run directly in Ollama and llama.cpp on a GPU with about 8 GB of VRAM or Apple Silicon with 16 GB unified memory. The 34B that powers HUMAIN Chat is proprietary and cannot be self-hosted.

What is AceGPT and is it still maintained?

AceGPT is an Arabic model jointly developed by KAUST, CUHKSZ, and SRIBD, offering 7B and 13B variants under Apache 2.0. It outperformed Jais at its 2023 launch, but its last GitHub update was December 2023, so it appears unmaintained: usable, but not actively improved.

How does ALLaM compare to Qwen on Arabic?

On AraLingBench, ALLaM-7B scores 72–74% versus 40–62% for Qwen variants, a 12–32 percentage-point gap on Arabic linguistic tasks. Qwen has a larger ecosystem and broader multilingual coverage, but ALLaM is stronger on Arabic-specific morphology, syntax, and cultural tasks.

Why do multilingual models struggle with Arabic?

They are typically grammatically fluent but culturally and dialectally weak. Strict dialect accuracy can fall to 0.016–0.078 for smaller models, and fine-tuning a multilingual model on Arabic often improves MSA quality while decreasing dialect accuracy, a documented paradox. Western-centric training data also introduces cultural biases in how models handle Islamic and social context.

What VRAM do I need for a 7B Arabic model?

About 6–8 GB of VRAM at Q4_K_M quantization, with 8 GB or more recommended for comfortable performance. A 13B model needs 10–14 GB, a 34B around 20–24 GB, and a 70B around 40–48 GB.

Is Jais a Saudi model?

No. Jais is UAE-origin, developed by Core42/G42 in Abu Dhabi, not by a Saudi institution. It is included here because it is a capable, Apache 2.0 Arabic model competitive on dialect tasks, but it is not part of the Saudi (ALLaM/AceGPT) lineage.

Should I use ALLaM 34B or 7B?

For local deployment, use the 7B, because the 34B is proprietary and not self-hostable. Start with ALLaM 7B on your own hardware, and if you need the 34B's capability, access it through the HUMAIN Chat product rather than expecting downloadable weights.

How do I test if a model handles Saudi Arabic correctly?

Run MSA-versus-dialect consistency prompts, ask about Saudi cultural practices and Islamic finance, and test grammatical gender agreement (e.g., describing a female doctor and a male engineer). Watch for Latin-script replies, wrong dialect register, or culturally inappropriate framing, and validate with at least 20 domain-specific prompts before deploying.

What is HUMAIN?

HUMAIN is a Saudi AI company launched in May 2025 under full Public Investment Fund ownership. It commercializes ALLaM and operates HUMAIN Chat. It is a separate entity from SDAIA but inherited the ALLaM models from SDAIA's National Center for AI; Aramco later took a minority stake.

Sources

  • Hugging Face: humain-ai/ALLaM-7B-Instruct-preview (model card, GGUF quantizations). huggingface.co
  • AraLingBench: Arabic linguistic benchmark (arXiv 2511.14295). arxiv.org
  • Landscape of Arabic LLMs: survey (arXiv 2506.01340). arxiv.org
  • AraDiCE: Arabic dialect and cultural evaluation, COLING 2025 (arXiv 2409.11404). arxiv.org
  • HUMAIN Chat launch on ALLaM 34B: Middle East AI News. middleeastainews.com
  • Saudi Cabinet: 2026 declared the Year of Artificial Intelligence. spa.gov.sa

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
