Entry-level RAG starts at $15,000–$30,000. Mid-scale builds run $35,000–$80,000. Enterprise deployments start at $100,000. What drives cost: source complexity, compliance, and retrieval quality — not just data volume. Here is the full breakdown.
Three tiers
3–6 weeks
$15,000–$30,000
Best for: Internal knowledge base, product FAQ bot, single-department support tool
Not suitable for multi-source heterogeneous data or regulated data environments.
6–12 weeks
$35,000–$80,000
Best for: Enterprise knowledge base, multi-team deployment, customer support RAG with live data
On-prem or fully air-gapped deployment costs more — see enterprise tier.
12–24 weeks
$100,000+
Best for: Legal research platforms, regulated industry deployments, defence/government, large-scale customer intelligence
Quoted individually after architecture sessions.
What drives cost
Data volume is the most obvious cost driver — but rarely the most important one. Source complexity, compliance requirements, and retrieval quality together account for 60–70% of typical RAG project cost.
Embedding cost scales linearly with tokens. OpenAI text-embedding-3-small: ~$0.02/1M tokens. At 1M documents averaging 2,000 tokens each = ~$40 in initial embedding. Re-embedding on updates adds recurring cost.
Volume alone rarely drives cost — it's volume × update frequency that matters.
Structured markdown: minimal parsing cost. PDFs with tables, charts, multi-column layouts: 3–5× more engineering. Scanned PDFs (OCR): add $5,000–$15,000. Legacy systems without APIs (screen scraping, ODBC): add $10,000–$25,000.
One hard data source can consume 30% of the project budget.
Standard cloud: no premium. SOC 2 aligned: adds audit documentation ($5K–$10K). HIPAA: BAA required, PHI isolation, encrypted storage, audit logging — adds $15K–$25K. On-premises / air-gapped: adds $30K–$60K infrastructure.
Compliance is decided at architecture time. Retrofitting costs 3× more than building correctly initially.
Basic semantic search: included. Hybrid search (BM25 + vector): +$5,000–$10,000. Reranking layer (Cohere Rerank or custom): +$3,000–$8,000. Query decomposition for multi-hop reasoning: +$8,000–$15,000.
Each retrieval quality layer compounds: a system with hybrid search + reranking + query decomposition needs significantly more engineering.
Managed cloud (Pinecone, Weaviate Cloud): $70–$3,000+/month depending on index size. Self-managed Qdrant on cloud VPS: $80–$300/month. On-premises: significant server cost + DevOps — typically $20,000–$50,000 in infrastructure setup.
Self-hosted Qdrant is the default recommendation above 5M vectors — it's 80–90% cheaper than Pinecone at scale.
Embeddings need updating as data changes. New documents must be chunked, embedded, and indexed. Retrieval quality degrades as the knowledge base grows without tuning. Monthly maintenance: 8–40 hours of engineering time.
The most common mistake: clients budget for the build but not the maintenance. A well-tuned RAG costs 10–20% of build cost per year to maintain.
Vector database cost
Vector database is typically 5–15% of total RAG operating cost, but the architecture choice affects compliance, performance, and long-term vendor lock-in. Here is the comparison.
Pinecone
Managed cloud
$70/mo starter, scales to $0.096/1M vectors/month
Virtually unlimited (expensive)
✗
Fast PoC, small-medium index, zero infra ops
Weaviate Cloud
Managed cloud
$25/mo starter, usage-based at scale
100M+ vectors
✓ (open-source)
Hybrid BM25+vector, GraphQL API, multi-modal
Qdrant
Self-hosted
Self-hosted: $30–150/mo VPS; Cloud: from $25/mo
Scales with hardware
✓ (main distribution)
Large-scale (10M+ vectors), on-prem requirement, cost
pgvector
Postgres extension
Free (on your existing Postgres)
~1M vectors before degradation
✓
Small RAG (<500K docs) on existing Postgres infra
Chroma
Local/dev
Free (open-source)
~100K vectors before degradation
✓
Development, PoC prototyping — not production
The rebuild trap
Five architectural mistakes we see in cheap RAG builds that require complete rebuilds within 12–18 months of production deployment.
Mistake
Starting with character-level chunking
Consequence
Retrieval misses semantic boundaries. Answers lack context. Full re-chunk and re-embed required — same cost as original embedding.
Fix
Use semantic chunking (sentence boundary detection) from day one. Overlap 10–15% between chunks.
Mistake
No reranking layer
Consequence
Vector similarity finds close documents but not necessarily relevant ones. Precision at 90%+ retrieval rate requires reranking — retrofit cost: $8,000–$15,000.
Fix
Budget for Cohere Rerank or a custom cross-encoder from the PoC stage.
Mistake
Single embedding model forever
Consequence
As your data volume grows, your embedding model's context window becomes a ceiling. Migrating to a new embedding model requires re-embedding the entire index.
Fix
Design the embedding pipeline to be swappable. Abstract the model behind an interface.
Mistake
No metadata filtering
Consequence
At 100K+ documents, users get irrelevant results from outdated or wrong-department data. Metadata filtering is a retrieval necessity at scale — retrofitting requires re-indexing.
Fix
Design the document schema with metadata fields (date, department, document type, access level) before ingestion.
Mistake
Cloud-only architecture for regulated data
Consequence
A HIPAA or GDPR review flags PHI in a third-party vector database. Migration to on-prem requires rebuilding the entire ingestion and serving layer.
Fix
Define data residency requirements before architecture selection, not after the first compliance review.
30-minute call. We assess your data sources, compliance requirements, and retrieval quality targets — then give you a firm price range before any work starts.