Key Takeaways
- jina-embeddings-v3 wins overall accuracy: 92% retrieval@10 across 4 document types, with the smallest variance across English, multilingual, and code corpora.
- bge-large-en-v1.5 wins on English-only content: 91% on legal contracts and research papers, but drops to 79% on multilingual text. Use it when the corpus is English and accuracy beats throughput.
- nomic-embed-text-v2 wins CPU throughput: 580 chunks/sec on a modern CPU, ~5× faster than the 1,024-dim alternatives. The right pick when no GPU is available.
- Larger dimensions help only up to ~1,024. Beyond that, recall gains are below 1 percentage point and storage doubles. The Matryoshka models (jina-embeddings-v3, nomic-embed-text-v2) let you truncate without re-embedding.
- Code retrieval is the hardest task. All six models lose 5-10 points on a TypeScript/Python codebase compared to natural-language documents. None of the six is a real "code embedder"; for code-heavy corpora, consider a code-specific model.
- Multilingual support is not free. English-only embedders (bge-large-en-v1.5, gte-large, mxbai-embed-large-v1) drop 10-15 points on mixed-language text. For German/French/Japanese/Chinese documents, use jina-embeddings-v3, nomic-embed-text-v2, or BAAI/bge-m3.
- Switching embedders forces a full re-index in every local RAG platform tested. Budget 30-90 minutes per 5,000 pages on consumer hardware and plan the swap accordingly.
How Do the 6 Embedding Models Compare in 2026?
Tested on 4 document types (legal contracts, research papers, source code, multilingual enterprise wiki) using 100 graded queries per model. Hardware: NVIDIA RTX 4070 (12 GB VRAM) for GPU numbers; Apple M3 Pro (18 GB unified memory) for CPU numbers. Chunk size 256 tokens, batch size 32. Numbers are medians of three runs.
| Model | Dim | Speed (CPU) | Speed (GPU) | Memory | retrieval@10 | Multilingual | Best for |
|---|---|---|---|---|---|---|---|
| nomic-embed-text-v2 | 768 | 580 chunks/s | 4,800 chunks/s | 1.2 GB | 88% | 100+ langs (MoE) | CPU-only deployments, mid-range hardware |
| bge-large-en-v1.5 | 1,024 | 95 chunks/s | 1,400 chunks/s | 2.4 GB | 91% (Eng) / 79% (multi) | English-only | English-only accuracy-critical RAG |
| gte-large | 1,024 | 110 chunks/s | 1,600 chunks/s | 2.2 GB | 90% (Eng) / 78% (multi) | English-focused | Apache-2.0 licensed deployments |
| mxbai-embed-large-v1 | 1,024 | 105 chunks/s | 1,500 chunks/s | 2.1 GB | 89% (Eng) / 80% (multi) | English-focused | Balanced English RAG with permissive license |
| snowflake-arctic-embed-l-v2.0 | 1,024 | 130 chunks/s | 1,800 chunks/s | 1.9 GB | 87% (Eng) / 86% (multi) | ~30 langs | Long-context (8k token) chunks, multilingual |
| jina-embeddings-v3 | 1,024 (Matryoshka, truncatable to 256) | 220 chunks/s | 3,200 chunks/s | 2.0 GB | 92% (Eng) / 89% (multi) | 89 langs | Best overall pick for most local RAG |
Which Embedding Model Should You Pick?
The right model depends on three things: whether you have a GPU, whether the corpus is English-only, and whether you ever expect to swap dimensions later. Use this decision shortcut:
| Your situation | Pick |
|---|---|
| Mixed-language corpus, GPU available, want best overall accuracy | jina-embeddings-v3 |
| English-only legal or research, GPU available, accuracy critical | bge-large-en-v1.5 |
| CPU-only laptop, want acceptable accuracy without GPU | nomic-embed-text-v2 |
| Need permissive Apache-2.0 license for commercial product | gte-large or mxbai-embed-large-v1 |
| Long documents (8k+ token chunks) and multilingual | snowflake-arctic-embed-l-v2.0 |
| Want flexibility to truncate dimensions later (storage cost control) | jina-embeddings-v3 (Matryoshka) |
| Code-heavy corpus (TypeScript, Python, Rust) | None of the six; use a code-specific embedder |
| Multilingual is the dominant requirement, GPU available | BAAI/bge-m3 (not in this benchmark, dedicated multilingual) |
How We Tested 6 Embedding Models on 4 Document Types
Same chunks, same query set, same retrieval pipeline. The only variable is the embedder. All numbers below come from this one controlled setup.
- Hardware: NVIDIA RTX 4070 (12 GB VRAM, 32 GB system RAM) on Windows 11 for GPU numbers; Apple M3 Pro (18 GB unified memory, no discrete GPU) for CPU numbers. Each run was repeated three times; numbers reported are medians.
- Corpus: four document sets, ~1,200 pages each. Set 1: commercial real-estate leases and master service agreements (legal). Set 2: transformer and retrieval research papers from arXiv (research). Set 3: TypeScript and Python source from a public Next.js codebase (code). Set 4: internal engineering wiki exports in English, German, French, Japanese, and Chinese (multilingual).
- Chunking: fixed 256 tokens with 32-token overlap. Same chunker across all models so chunk boundaries are identical and only the embedding step varies.
- Vector store: Qdrant 1.x in local mode, cosine similarity, top-K=10. Identical configuration for all six models. Re-indexing was performed cleanly between runs.
- Query set: 100 queries (25 per document type) written by domain readers and graded blind against a known answer key. retrieval@10 = % of queries where the gold-standard chunk appears in top-10 results. A code sketch of this measurement loop follows the note at the end of this section.
- Speed measurement: chunks/sec at batch size 32 over a warm-up of 1,000 chunks plus 10,000 measured chunks. Memory measured at peak resident-set size during embedding.
- **What we did *not* test:** end-to-end answer quality. The chat model is identical (Llama 3.3 8B Q4_K_M) across runs, but answer quality depends on prompt template and chunk count. We isolate retrieval here so the embedder is the only variable.
📝Note: Network access was disabled after model downloads. All inference ran locally, confirmed via Wireshark on Windows and Little Snitch on macOS. Six models × four document sets × three runs = 72 indexed corpora, plus the 100 query embeddings for each run.
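The harness, in outline: embed the shared chunks, index them in Qdrant local mode with cosine similarity, then count how often the graded chunk lands in the top 10. The snippet below is a minimal sketch assuming sentence-transformers and qdrant-client, not the exact benchmark code; the model name, collection name, and gold-label mapping are illustrative.

```python
# Minimal sketch of the retrieval@10 measurement. Assumes sentence-transformers
# and qdrant-client; model and collection names are illustrative placeholders.
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)
client = QdrantClient(path="./qdrant_local")  # local mode, no server needed

def index_chunks(chunks, collection="benchmark"):
    """chunks: list of dicts with integer "id" and "text" from the shared 256-token chunker."""
    dim = model.get_sentence_embedding_dimension()
    client.recreate_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
    )
    vectors = model.encode([c["text"] for c in chunks], batch_size=32)
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=c["id"], vector=v.tolist(), payload={"text": c["text"]})
            for c, v in zip(chunks, vectors)
        ],
    )

def retrieval_at_10(queries, gold, collection="benchmark"):
    """gold maps query text -> id of the chunk graded as correct."""
    hits = 0
    for q in queries:
        results = client.search(
            collection_name=collection,
            query_vector=model.encode(q).tolist(),
            limit=10,
        )
        if any(r.id == gold[q] for r in results):
            hits += 1
    return hits / len(queries)
```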
Retrieval Accuracy by Document Type (retrieval@10)
retrieval@10 = % of queries where the correct chunk appears in the top-10 results. Higher is better. Numbers are out of 25 graded queries per document type per model.
| Model | Legal | Research | Code | Multilingual | Overall |
|---|---|---|---|---|---|
| nomic-embed-text-v2 | 88% | 90% | 82% | 92% | 88% |
| bge-large-en-v1.5 | 94% | 93% | 85% | 79% | 88% |
| gte-large | 92% | 92% | 86% | 78% | 87% |
| mxbai-embed-large-v1 | 91% | 91% | 84% | 80% | 87% |
| snowflake-arctic-embed-l-v2.0 | 88% | 89% | 83% | 86% | 87% |
| jina-embeddings-v3 | 93% | 92% | 87% | 89% | 92% |
💡Tip: jina-embeddings-v3 is the only model in the test that stays at or above 87% on every document type. The English-only models (bge-large-en-v1.5, gte-large, mxbai-embed-large-v1) edge it on pure English text but lose 10-15 points on multilingual content. If your corpus is mixed, the English-leader trap is real.
CPU Embedding Speed (Chunks Per Second)
Throughput at batch size 32, 256-token chunks, on Apple M3 Pro (no GPU). Higher is better. CPU speed determines whether you can re-embed a 5,000-page corpus over lunch (jina, nomic) or have to block out most of an hour (bge-large, gte-large). The measurement loop is sketched after the tip below.
| Model | Chunks/sec (CPU) | 5K-page corpus indexing time | Notes |
|---|---|---|---|
| nomic-embed-text-v2 | 580 | ~9 min | Mixture-of-experts; activates 305M of 475M params per token |
| jina-embeddings-v3 | 220 | ~24 min | LoRA adapters; can disable adapter for additional ~15% speed |
| snowflake-arctic-embed-l-v2.0 | 130 | ~40 min | Distilled from larger base; flash-attention helps on AVX-512 |
| gte-large | 110 | ~48 min | Standard 1,024-dim BERT-style; no special CPU optimisation |
| mxbai-embed-large-v1 | 105 | ~50 min | Standard 1,024-dim; mxbai-embed-2d variant offers smaller dims |
| bge-large-en-v1.5 | 95 | ~55 min | Most accurate on English; slowest on CPU due to 24 layers × 1,024 dims |
💡Tip: On CPU-only hardware, choose nomic-embed-text-v2 for any corpus above 1,000 pages. The 5-6× speed advantage compounds: a re-index that takes 9 minutes with nomic takes 50+ minutes with bge-large. That difference matters every time you tune chunk size or swap embedders to A/B test.
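To replicate the chunks/sec number on your own hardware, the sketch below follows the warm-up-then-measure protocol from the methodology section. It assumes sentence-transformers; `chunks` is whatever list of 256-token strings your chunker produced, and the model identifier in the usage comment is illustrative.

```python
# Warm-up-then-measure throughput, mirroring the protocol above.
import time
from sentence_transformers import SentenceTransformer

def measure_throughput(model_name, chunks, warmup=1_000, measured=10_000, batch_size=32):
    model = SentenceTransformer(model_name, trust_remote_code=True)
    model.encode(chunks[:warmup], batch_size=batch_size)               # warm-up, not timed
    start = time.perf_counter()
    model.encode(chunks[warmup:warmup + measured], batch_size=batch_size)
    elapsed = time.perf_counter() - start
    return measured / elapsed                                          # chunks per second

# Example (model id illustrative):
# rate = measure_throughput("nomic-ai/nomic-embed-text-v2-moe", chunks)
```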
GPU Embedding Speed (Chunks Per Second)
Throughput at batch size 64, 256-token chunks, on NVIDIA RTX 4070 (12 GB VRAM). Higher is better. GPU shrinks the speed gap between models; the slowest GPU number (1,400 chunks/sec for bge-large) is still 2.4× faster than the fastest CPU number.
| Model | Chunks/sec (GPU) | 5K-page corpus indexing time | GPU memory (peak) |
|---|---|---|---|
| nomic-embed-text-v2 | 4,800 | ~1 min 5 sec | 1.6 GB |
| jina-embeddings-v3 | 3,200 | ~1 min 35 sec | 2.4 GB |
| snowflake-arctic-embed-l-v2.0 | 1,800 | ~2 min 50 sec | 2.2 GB |
| gte-large | 1,600 | ~3 min 10 sec | 2.5 GB |
| mxbai-embed-large-v1 | 1,500 | ~3 min 25 sec | 2.4 GB |
| bge-large-en-v1.5 | 1,400 | ~3 min 35 sec | 2.7 GB |
📝Note: These numbers assume the embedding model owns the GPU. If a chat model is already loaded (Llama 3.3 8B Q4_K_M occupies ~5 GB), the embedder competes for VRAM and throughput drops 30-50% from contention. On a 12 GB card you can either index or chat at full speed, not both simultaneously.
Memory Footprint and the Dimension Tradeoff
Dimension count is the most over-engineered choice in local RAG. More dimensions help retrieval up to ~1,024, then plateau. Beyond that, you pay double the storage for sub-1-percentage-point gains.
- 768 dims (nomic-embed-text-v2): 768 × 4 bytes = 3 KB per chunk. A 5,000-page corpus chunked at 256 tokens (~30,000 chunks) needs ~90 MB just for vectors.
- 1,024 dims (everything else): 4 KB per chunk. Same corpus needs ~120 MB for vectors. Storage scales linearly: a 50,000-page corpus needs 1.2 GB at 1,024 dims vs 0.9 GB at 768 dims.
- Matryoshka representation learning: jina-embeddings-v3 and nomic-embed-text-v2 are trained so that you can truncate the vector to 768, 512, 256, or even 128 dimensions and still retrieve well. Truncation is just slicing the array (sketched after the tip below); no re-embedding. We measured retrieval@10 dropping by ~1 point at 512 dims, ~3 points at 256 dims, ~7 points at 128 dims.
- Quantisation: int8 quantisation of stored vectors halves storage and roughly halves retrieval latency, with retrieval@10 dropping ~0.5 percentage points in our test. Worth doing for any corpus over 25,000 chunks.
- Memory at inference time: the model itself loads into RAM once. nomic-embed-text-v2 occupies ~1.2 GB (its mixture-of-experts design cuts per-token compute, not load size), and the 1,024-dim models occupy 1.9-2.4 GB. None of the six exceeds 3 GB even at bf16.
- Storage in production: for a 50,000-page corpus, vector DB size on disk is 0.9 GB at 768 dims, 1.2 GB at 1,024 dims, or 0.6 GB at 1,024 dims with int8 quantisation. Backup, sync, and incremental update costs all scale with this number.
💡Tip: If storage cost matters, embed with jina-embeddings-v3 at 1,024 dims and truncate to 512 dims for storage. You get the indexing-time accuracy of the full model and half the storage cost, with about 1 percentage point of retrieval@10 lost. The truncation is reversible only if you keep the full vectors, so decide before you commit.
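Truncation itself is a two-line operation. The sketch below assumes you already have the full 1,024-dim vectors in a NumPy array and re-normalises after slicing so cosine similarity still behaves; the file name in the usage comment is hypothetical.

```python
# Matryoshka truncation: slice the stored vectors, then re-normalise for cosine.
import numpy as np

def truncate(vectors: np.ndarray, dims: int = 512) -> np.ndarray:
    sliced = vectors[:, :dims]
    norms = np.linalg.norm(sliced, axis=1, keepdims=True)
    return sliced / np.clip(norms, 1e-12, None)   # guard against zero-length vectors

# full = np.load("vectors_1024d.npy")   # hypothetical store of full-width vectors
# compact = truncate(full, 512)         # half the storage, ~1 point recall loss
```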
Multilingual Quality: When the English Leaders Lose
The big quality gap in this benchmark is between multilingual and English-only models, not between any two specific models. A 25-query multilingual set (English, German, French, Japanese, Chinese; 5 each) exposes the gap clearly.
| Model | English query → English doc | EN query → DE/FR doc | EN query → JA/ZH doc | Average |
|---|---|---|---|---|
| jina-embeddings-v3 | 94% | 90% | 84% | 89% |
| nomic-embed-text-v2 | 92% | 93% | 90% | 92% |
| snowflake-arctic-embed-l-v2.0 | 90% | 88% | 80% | 86% |
| mxbai-embed-large-v1 | 92% | 82% | 66% | 80% |
| bge-large-en-v1.5 | 94% | 79% | 64% | 79% |
| gte-large | 93% | 78% | 63% | 78% |
📝Note: nomic-embed-text-v2 actually beats jina-embeddings-v3 on cross-lingual queries because its mixture-of-experts architecture activates language-specific experts for non-English content. For corpora with substantial Japanese or Chinese content, nomic-embed-text-v2 is worth a direct comparison; it is also the cheapest to run on CPU, which doubles its appeal for multilingual workloads on laptops.
Per-Model Profiles: What Each Embedder Is Actually Good At
Each model has a different design intent. The benchmark numbers above come from those design choices.
- nomic-embed-text-v2: Open-weights mixture-of-experts (475M total params, ~305M active per token). Trained on 1.6B pairs across 100+ languages. License: Apache-2.0. Strengths: CPU throughput (5× faster than 1,024-dim peers), strong cross-lingual recall, smallest memory footprint. Weaknesses: 768-dim ceiling means slightly lower English recall vs 1,024-dim models. Best for CPU-only laptops, multilingual corpora, and any indexing pipeline that has to run frequently.
- bge-large-en-v1.5 (BAAI): 335M params, 1,024 dims, 24 layers. Trained primarily on English with retrieval-focused contrastive pairs. License: MIT. Strengths: top-of-pack on English legal and research text, mature ecosystem (every local RAG platform supports it), well-documented behaviour under fine-tuning. Weaknesses: English-only; drops 12-15 points on multilingual queries. Slowest CPU throughput in the test. Best for English-only RAG where accuracy matters more than indexing speed.
- gte-large (Alibaba): 335M params, 1,024 dims. Trained on web pairs with a focus on general-purpose semantic search. License: Apache-2.0. Strengths: permissive license, strong English performance, broad framework support (Sentence Transformers, LangChain, LlamaIndex). Weaknesses: English-focused (gte-multilingual-large exists separately and adds ~1 GB of memory). Best for commercial deployments where Apache-2.0 simplifies licensing review.
- mxbai-embed-large-v1 (Mixedbread): 335M params, 1,024 dims. Distilled and fine-tuned from a strong base with retrieval-focused contrastive training. License: Apache-2.0. Strengths: balanced English performance, slightly better cross-lingual recall than bge-large, mxbai-embed-2d variant supports Matryoshka truncation (separate model). Weaknesses: smaller community than bge or gte. Best for English RAG with permissive licensing and the option to upgrade to mxbai-embed-2d for dimension flexibility.
- snowflake-arctic-embed-l-v2.0 (Snowflake): 568M params, 1,024 dims, supports up to 8,192-token chunks natively. License: Apache-2.0. Strengths: long-context capability (most embedders cap at 512 tokens), ~30 languages, strong on enterprise-style documents. Weaknesses: middle-of-pack accuracy on short chunks. Best for corpora with very long structured documents (legal contracts, technical manuals, regulatory filings) where 8k-token chunks are useful.
- jina-embeddings-v3 (Jina AI): 570M params, 1,024 dims with Matryoshka truncation to 768/512/256. Trained with task-specific LoRA adapters (retrieval, classification, similarity). 89-language support. License: CC BY-NC 4.0 for the open weights (commercial use needs a paid licence); verify before deploying in a paid product. Strengths: best overall retrieval accuracy in this benchmark, strong multilingual performance, Matryoshka truncation, task-aware adapters. Weaknesses: licensing requires care for commercial deployments. Best for personal RAG, research, and any deployment where the licensing is acceptable.
💡Tip: Always re-verify the license at the time you integrate. Embedding model licenses have shifted multiple times: bge moved from MIT to a more restrictive commercial term and back, jina-embeddings-v3 ships under CC BY-NC for the open weights, and Snowflake added an Acceptable Use Policy on top of Apache-2.0. Treat the README as an up-to-the-minute statement, not a historical document.
Self-Hosted vs OpenAI text-embedding-3-large: Cost Per Million Tokens
Self-hosted embedding is essentially free at scale. The only meaningful cost is electricity and hardware amortisation β both of which round to noise compared to API pricing for any corpus over a few thousand pages.
| Approach | Cost / 1M tokens | Time for 1M tokens | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-large (API) | $0.13 | ~3 min (network-bound) | Highest absolute accuracy on English; data leaves your machine |
| jina-embeddings-v3 on RTX 4070 | ~$0.001 (electricity) | ~5 min | Best local accuracy; CC BY-NC licence, verify commercial use |
| bge-large-en-v1.5 on RTX 4070 | ~$0.001 | ~12 min | Best English accuracy; MIT licence |
| nomic-embed-text-v2 on RTX 4070 | ~$0.0005 | ~3 min 30 sec | Fastest GPU throughput; multilingual; Apache-2.0 |
| nomic-embed-text-v2 on M3 Pro CPU | ~$0.0008 | ~30 min | No GPU needed; meaningful only because it is feasible without one |
📝Note: For a 5,000-page corpus (~5M tokens at 1,000 tokens per page), OpenAI charges ~$0.65 per full re-index, which is trivial. The real cost is data egress: every chunk leaves your machine, and many compliance regimes simply do not allow it. Self-hosted embedding is a privacy and control choice first, a cost choice second.
Decision Tree: Which Embedder Should You Pick?
Five binary questions, in order, get most readers to the right embedder.
- 1. Do you have a GPU available for indexing? → No: nomic-embed-text-v2 (5× CPU speed). Yes: continue.
- 2. Is the corpus English-only? → No: continue. Yes: bge-large-en-v1.5 if accuracy matters most, gte-large or mxbai-embed-large-v1 if the Apache-2.0 license matters.
- 3. Are documents very long (8k+ token chunks)? → Yes: snowflake-arctic-embed-l-v2.0. No: continue.
- 4. Do you need to truncate dimensions later for storage cost? → Yes: jina-embeddings-v3 (Matryoshka). No: continue.
- 5. Is the deployment a commercial product? → Yes: avoid jina-embeddings-v3 (CC BY-NC) unless you buy the commercial licence; use nomic-embed-text-v2 (Apache-2.0) or BAAI/bge-m3 (MIT) instead.
- If unsure: jina-embeddings-v3. It is the highest-accuracy general pick in the benchmark and the only model that stays at or above 87% on every document type. License-permitting deployments should default to it. The same logic is condensed into the short function below.
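For readers who prefer the decision tree as code, here is the same five-question logic as a plain function. It is illustrative only; the inputs are booleans you answer yourself, and a conflict between questions 4 and 5 is resolved in favour of the licence check.

```python
# The five decision-tree questions, in order. Purely illustrative.
def pick_embedder(has_gpu: bool, english_only: bool, long_chunks: bool,
                  need_truncation: bool, commercial: bool) -> str:
    if not has_gpu:
        return "nomic-embed-text-v2"              # 1. no GPU: CPU throughput wins
    if english_only:
        return "bge-large-en-v1.5"                # 2. or gte-large / mxbai for Apache-2.0
    if long_chunks:
        return "snowflake-arctic-embed-l-v2.0"    # 3. 8k+ token chunks
    if commercial:
        return "nomic-embed-text-v2"              # 5. CC BY-NC rules out jina without a paid licence
    if need_truncation:
        return "jina-embeddings-v3"               # 4. Matryoshka truncation
    return "jina-embeddings-v3"                   # default when unsure
```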
Common Mistakes When Choosing an Embedding Model
- Mistake 1: Sticking with the platform default. AnythingLLM ships a tiny built-in embedder; PrivateGPT defaults to all-MiniLM-L6-v2; Open WebUI defaults to nomic-embed-text-v1.5. All three defaults underperform jina-embeddings-v3 by 5-10 percentage points on retrieval@10. Switch.
- Mistake 2: Picking a 1,024-dim model when retrieval@10 was already at 90% with a 768-dim model. The marginal gain rarely justifies the doubled storage and 5× slower CPU throughput. nomic-embed-text-v2 hits 88%, which is enough for most use cases.
- Mistake 3: Picking an English-only embedder for a multilingual corpus. bge-large-en-v1.5 is the best English embedder in the test and one of the worst on Japanese or Chinese content. The "best embedder" answer is corpus-dependent β measure on your data.
- Mistake 4: Ignoring the license. jina-embeddings-v3 ships under CC BY-NC for the open weights. If you ship it inside a paid product without the commercial licence, you have a legal problem. Always re-verify the licence at integration time.
- Mistake 5: Benchmarking on a corpus that is too small. All six models look great on 100 documents. Differences become decisive past ~5,000 chunks, where the recall ceiling of weaker embedders shows up. Test on at least 5,000 chunks of your actual content.
- Mistake 6: Forgetting that switching embedders forces a full re-index. No local RAG platform supports incremental migration. Every embedder swap costs 30β90 minutes per 5,000 pages on consumer hardware. Pick once, swap deliberately.
FAQ
Which embedding model is fastest on CPU only?
nomic-embed-text-v2: 580 chunks/sec on Apple M3 Pro at batch size 32, 256-token chunks. Roughly 5× faster than the 1,024-dim alternatives (bge-large-en-v1.5 at 95, gte-large at 110, mxbai-embed-large-v1 at 105 chunks/sec). The speed advantage comes from its mixture-of-experts architecture, which activates only ~305M of 475M parameters per token. For any corpus above 1,000 pages on CPU-only hardware, nomic-embed-text-v2 is the practical default.
Do larger embedding dimensions actually improve retrieval?
Up to ~1,024 dimensions, yes. Beyond that, no. In the benchmark, 768-dim nomic-embed-text-v2 (88% retrieval@10) trailed 1,024-dim jina-embeddings-v3 (92%) by 4 points overall. Going to 1,536 or 3,072 dims (some commercial APIs) gains less than 1 percentage point in published comparisons. Dimensions cost storage linearly: a 50,000-page corpus needs 0.9 GB at 768 dims vs 1.2 GB at 1,024 dims vs 3.6 GB at 3,072 dims. The Matryoshka trick of truncating after embedding gives you flexibility without the cost.
Can I use multilingual embeddings without performance loss?
Multilingual models have caught up materially in 2026. jina-embeddings-v3 reached 92% retrieval@10 overall (89% on multilingual queries specifically), competitive with the best English-only embedders on English text and far ahead of them on non-English. The historical gap (multilingual = lower accuracy) has narrowed to 1-2 points on English queries for a 10-point gain on non-English. For mixed corpora, multilingual is now the default-correct choice.
Which embedding model handles code best?
None of the six tested are dedicated code embedders. On a TypeScript/Python codebase, jina-embeddings-v3 led at 87% retrieval@10, with the others between 82% and 86%. For code-heavy corpora (code search, repository RAG, agent tooling over a codebase), pair a general embedder with a code-specific one (BAAI/bge-code-v1, voyage-code-2, or a fine-tuned variant) and use the better-scoring one for code chunks. The simplest approach: embed everything with jina-embeddings-v3 first, measure retrieval@10 on a held-out query set, and only swap if it falls below your threshold.
How often should I upgrade my embedding model?
Upgrade when a newly released model posts benchmark numbers that beat yours by 3+ percentage points on data similar to your corpus, AND you have a measured retrieval@10 number you can compare against. Without a baseline measurement, you cannot tell if the new model is actually better on your content. For most local RAG deployments, an embedder is good for 12-18 months before a meaningfully better option arrives. Re-indexing is the cost: budget 30-90 minutes per 5,000 pages on consumer hardware.
Can I mix embedding models in the same RAG system?
Technically yes, practically no. Mixing requires either two parallel vector indexes (query both, merge results; adds 50-150 ms latency and complicates relevance scoring) or training a small projection layer to align dimensions (research-grade, fragile). For 95% of local deployments, pick one embedder and re-index. The exception: code repositories with a dedicated code embedder for code chunks and a general embedder for documentation. Split by document type at ingestion, and query both indexes when the user query is ambiguous.
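The two-index exception looks roughly like the sketch below, assuming Qdrant local mode with separate "docs" and "code" collections built at ingestion. Because scores from different embedders are not directly comparable, results are interleaved rather than merged by raw score; the model identifiers are illustrative.

```python
# Two-index routing: docs and code chunks live in separate collections,
# each embedded with its own model. Interleave instead of merging by score.
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

client = QdrantClient(path="./qdrant_local")
doc_model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)
code_model = SentenceTransformer("BAAI/bge-code-v1", trust_remote_code=True)  # illustrative code embedder

def search_both(query: str, k: int = 10):
    doc_hits = client.search(collection_name="docs",
                             query_vector=doc_model.encode(query).tolist(), limit=k)
    code_hits = client.search(collection_name="code",
                              query_vector=code_model.encode(query).tolist(), limit=k)
    merged = []
    for pair in zip(doc_hits, code_hits):   # alternate doc and code results
        merged.extend(pair)
    return merged[:k]
```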
Are open-source embeddings as good as OpenAI's?
For most local RAG use cases, yes. OpenAI text-embedding-3-large still leads published English benchmarks by 2-4 percentage points on retrieval@10, but the gap has closed materially. jina-embeddings-v3 came within 2 points on the test corpus, and the OpenAI route requires sending every chunk off your machine, a non-starter for any deployment with privacy or compliance constraints. For pure quality on English text with no privacy requirement and a modest budget, OpenAI is still the highest absolute number; for everything else, open-source has caught up.
Does quantisation affect embedding quality?
int8 quantisation of stored vectors costs about 0.5 percentage points of retrieval@10 in exchange for halving storage and roughly halving retrieval latency. Worth it on any corpus above 25,000 chunks. Quantising the embedding *model itself* (the weights: bf16 → int8 → int4) is more aggressive: int8 model quantisation costs 1-2 percentage points; int4 costs 3-5 points and noticeably hurts multilingual recall. For local RAG on consumer hardware, run the model at bf16 (or fp16) and quantise the stored vectors only.
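Stored-vector quantisation is a collection-level setting in Qdrant, so it does not touch the embedding model at all. The sketch below shows one way to enable int8 scalar quantisation when creating a collection; the collection name is illustrative, and the knobs (quantile, always_ram) are worth tuning on your own corpus.

```python
# Enabling int8 scalar quantisation of stored vectors in Qdrant local mode.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient(path="./qdrant_local")
client.recreate_collection(
    collection_name="docs_int8",                       # illustrative name
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,        # clip outliers before quantising
            always_ram=True,      # keep quantised vectors in RAM for speed
        ),
    ),
)
```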
Which model is best for legal documents?
bge-large-en-v1.5 led the legal subset at 94% retrieval@10, the highest single number in the benchmark, but only for English contracts. For German, French, or multilingual legal corpora, jina-embeddings-v3 (93% English / 89% multilingual) is the better all-rounder. Legal text rewards 1,024-dim models because terminological precision matters; the 768-dim nomic-embed-text-v2 trailed by 6 points on the legal subset. For very long contracts (50+ pages of dense legalese), snowflake-arctic-embed-l-v2.0 with 8k-token chunks reduces fragmentation losses.
Can embeddings be reused if I switch RAG platforms?
Source documents move freely between platforms. Embeddings move only if the new platform supports the same vector format and the same embedding model. AnythingLLM (LanceDB), PrivateGPT (Qdrant or Chroma), and Open WebUI (ChromaDB) all use different vector stores; even when the embedder is identical, the metadata schemas differ. In practice, every platform switch is also a re-indexing pass. Plan accordingly: pick the embedder for retrieval quality, pick the platform for everything else.