Agent orchestration vs retrieval depth. Honest take on which wins where — and why our production systems use both.
We ship production RAG and agent systems on both. This is what we actually use.
TL;DR
LangChain / LangGraph: graph-based state machines with cycles, checkpointing, and human-in-the-loop steps. The only framework with true stateful agent graphs. LangSmith adds production tracing on top.
LlamaIndex / LlamaParse: the deepest retrieval pipeline available, with hybrid search, reranking, query routing, and LlamaParse for document parsing that handles tables and images no other parser touches.
Both together: in every production RAG+agent system we ship, we use LlamaIndex as the retrieval layer and LangGraph as the orchestration shell. They compose cleanly, with no conflict.
Different Problems
LangChain and LlamaIndex were built to solve different problems. Comparing them directly is a bit like comparing a workflow engine to a database — they are both necessary layers in a production system.
LangChain / LangGraph is an orchestration framework. LangGraph models your AI system as a directed graph where nodes are functions (or LLM calls) and edges are transitions. It manages state, handles loops, persists memory between runs, and lets you insert human approval steps. Think of it as the control plane.
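# LangGraph = state machine
# AgentState, plan_step, retrieve, and respond are defined elsewhere in your app
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)
graph.add_node("planner", plan_step)
graph.add_node("retriever", retrieve)
graph.add_node("responder", respond)
graph.add_edge(START, "planner")
graph.add_edge("planner", "retriever")
graph.add_edge("retriever", "responder")
graph.add_edge("responder", END)
# cycles, checkpoints, and human nodes are layered on top of this skeleton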
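To make the control-plane idea concrete without pulling in the framework, here is a toy sketch in plain Python. It is not LangGraph's implementation: `run_graph`, `route`, and the node bodies are hypothetical names invented for illustration. It only shows the three ideas named above: nodes as functions that return state updates, an edge that loops back (a cycle), and a checkpoint saved after every step.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def run_graph(
    nodes: Dict[str, Node],
    router: Callable[[str, State], str],
    start: str,
    state: State,
    checkpoints: List[State],
    max_steps: int = 20,
) -> State:
    """Run nodes until the router returns "END", saving a checkpoint per step."""
    current = start
    for _ in range(max_steps):
        state = {**state, **nodes[current](state)}  # merge the node's partial update
        checkpoints.append({"node": current, **state})  # snapshot after each node
        current = router(current, state)
        if current == "END":
            return state
    raise RuntimeError("graph did not finish within max_steps")


def plan_step(state: State) -> State:
    return {"attempts": int(state.get("attempts", 0)) + 1}


def retrieve(state: State) -> State:
    # Pretend retrieval only succeeds on the second attempt.
    return {"docs": ["chunk-1"] if state["attempts"] >= 2 else []}


def respond(state: State) -> State:
    return {"answer": f"answered from {len(state['docs'])} chunk(s)"}


def route(current: str, state: State) -> str:
    if current == "planner":
        return "retriever"
    if current == "retriever":
        # The cycle: loop back to the planner when retrieval came back empty.
        return "responder" if state["docs"] else "planner"
    return "END"


checkpoints: List[State] = []
final = run_graph(
    {"planner": plan_step, "retriever": retrieve, "responder": respond},
    route, "planner", {"question": "What changed in Q3?"}, checkpoints,
)
```

In this run the planner and retriever each execute twice before the responder fires, and `checkpoints` holds one snapshot per step. A real orchestration framework persists those snapshots to durable storage so a run can be resumed or paused for human approval.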
LlamaIndex is a data framework for retrieval. It handles the entire pipeline from raw document ingestion through chunking, embedding, indexing, query routing, and reranking. It answers the question: given a user query, what is the most accurate set of document chunks to retrieve? Think of it as the data plane.
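# LlamaIndex = retrieval pipeline
# files (paths) and reranker (a node postprocessor) are defined elsewhere
from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex

docs = LlamaParse().load_data(files)  # needs a LlamaCloud API key
index = VectorStoreIndex.from_documents(docs)
engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)  # top-10 vector hits, then reranked; hybrid also needs a hybrid-capable vector store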
Backed by a16z and Benchmark, LangChain is now a unicorn. For production teams evaluating framework risk, this matters: APIs are unlikely to be deprecated without migration paths, commercial support is available, and the project has the runway to maintain a growing ecosystem. LlamaIndex is also well-funded, but LangChain's scale gives it a larger integration surface and more organizational contributors, which tends to support API stability.
LangChain valuation: $1.25B (Series B · a16z & Benchmark)
Feature Comparison
✓ Full support · ~ Partial / workaround required · ✗ Not supported
| Feature | LangChain / LangGraph | LlamaIndex / LlamaParse | Notes |
|---|---|---|---|
| Graph-based agent state machines | ✓ | ✗ | LangGraph is purpose-built for stateful agent graphs with cycles, branches, and checkpointing. LlamaIndex has no equivalent. |
| Multi-step tool calling | ✓ | ~ | LangChain has a rich pre-built tool ecosystem. LlamaIndex tools are more focused on data retrieval use cases. |
| Human-in-the-loop / approval nodes | ✓ | ✗ | LangGraph interrupt nodes support approval workflows and human escalation. LlamaIndex has no equivalent primitive. |
| Agent memory & cross-session persistence | ✓ | ~ | LangGraph checkpointing supports durable cross-session memory. LlamaIndex has basic chat buffer memory only. |
| Document parsing (PDFs, tables, images) | ~ | ✓ | LlamaParse handles tables, images, and complex layouts with layout-aware extraction. LangChain loaders are functional but less precise. |
| Query routing strategies | ~ | ✓ | LlamaIndex RouterQueryEngine, SummaryIndex, and KnowledgeGraphIndex give structured routing. LangChain requires manual routing logic. |
| Hybrid search (dense + BM25) | ~ | ✓ | LlamaIndex has native hybrid search with Reciprocal Rank Fusion. LangChain requires separate BM25 setup and manual score fusion. |
| Cross-encoder reranking | ~ | ✓ | LlamaIndex has first-class reranking nodes (Cohere, Voyage, local models). LangChain requires extra integration work. |
| Sub-question decomposition | ~ | ✓ | LlamaIndex SubQuestionQueryEngine decomposes complex queries automatically. LangChain achieves this via agents only. |
| Observability & production tracing | ✓ | ~ | LangSmith is production-grade: traces, evals, datasets, playground. LlamaIndex relies on Arize Phoenix, which is less seamless. |
| LLM provider support | ✓ | ✓ | Both support OpenAI, Anthropic, Gemini, Cohere, Ollama, and most major model providers. |
| Vector store integrations | ✓ | ✓ | Both integrate with Pinecone, Weaviate, Chroma, Qdrant, pgvector, and 20+ others. |
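The hybrid search row mentions Reciprocal Rank Fusion (RRF) for merging dense and BM25 results. As a rough illustration of what that fusion step does, here is a minimal, framework-independent sketch. It is not LlamaIndex's internal code: the function name, the document IDs, and the two rankings are made up, and `k = 60` is simply a commonly used constant.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs; each list contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical embedding-search order
bm25_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword-search order
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize cosine similarities against BM25 scores, which is the "manual score fusion" work the table refers to.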
Our Internal Stack
We don't choose between LangGraph and LlamaIndex. We use LlamaIndex as a retrieval tool inside a LangGraph agent. The query engine becomes a node — the agent decides when to call it, what to ask, and what to do with the results.
# LangGraph agent shell
LangGraph StateGraph
├── planner_node
│ └── decompose user query
├── retrieval_node ← LlamaIndex
│ ├── LlamaParse ingestion
│ ├── Hybrid search (dense + BM25)
│ ├── Cross-encoder reranking
│ └── Citation extraction
├── action_node
│ └── tool calls if needed
└── response_node
└── answer with citations
# LangSmith traces the full graph
LlamaIndex as a tool node
The LlamaIndex query engine becomes a Python function that the LangGraph agent can call as a tool. The agent decides when to retrieve, what to ask, and how many times to retry if the answer is insufficient.
No framework conflict
They operate at different layers — LangGraph handles state and control flow, LlamaIndex handles retrieval logic. There is no overlap and no incompatibility.
LangSmith sees everything
When the LlamaIndex retrieval call happens inside a LangGraph node, LangSmith traces that node as part of the graph run, so retrieval appears in the same observability view as planning, tool calls, and the response.
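The "query engine as a node" pattern can be sketched as a plain function: state goes in, a partial state update comes out. The example below is a hedged illustration, not our production code. `make_retrieval_node`, `fake_query_engine`, and the query-broadening retry strategy are hypothetical, and the fake engine stands in for a real LlamaIndex query engine so the snippet runs without any framework installed.

```python
from typing import Callable, Dict, List, TypedDict


class AgentState(TypedDict, total=False):
    question: str
    context: List[str]
    retries: int


def make_retrieval_node(
    query_fn: Callable[[str], List[str]], max_retries: int = 2
) -> Callable[[AgentState], Dict[str, object]]:
    """Wrap a retrieval callable as a node: state in, partial state update out."""

    def retrieval_node(state: AgentState) -> Dict[str, object]:
        question = state["question"]
        for attempt in range(max_retries + 1):
            chunks = query_fn(question)
            if chunks:
                return {"context": chunks, "retries": attempt}
            # Hypothetical retry strategy: broaden the query and try again.
            question = f"{state['question']} (broader search, attempt {attempt + 2})"
        return {"context": [], "retries": max_retries}

    return retrieval_node


def fake_query_engine(question: str) -> List[str]:
    # Stand-in for a real retrieval call, e.g. a LlamaIndex query engine.
    return ["Refunds are accepted within 30 days."] if "broader" in question else []


node = make_retrieval_node(fake_query_engine)
update = node({"question": "What is the refund window?"})
```

A function of this shape is what you would register with `graph.add_node("retriever", node)`. Whether the retry loop lives inside the node, as here, or as a cycle in the graph itself is a design choice: the graph-level cycle makes each attempt visible as its own step in the trace.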
The Decision Tree
Answer yes to the first question that applies.
1. Is the core requirement parsing complex documents (PDFs with tables, images, or scanned layouts)?
   Yes → LlamaIndex required. LlamaParse is the best document parser available, with accuracy others cannot match.
   No → Continue ↓
2. Do you need multi-step agent reasoning: tools, loops, decisions, and state persistence?
   Yes → LangGraph required. Graph-based state machines with checkpointing; no other framework comes close.
   No → LlamaIndex alone. Pure RAG with no agentic reasoning; LlamaIndex is sufficient on its own.
3. Does your agent need to query a knowledge base as one of its tools?
   Yes → Use both. LangGraph as orchestrator, LlamaIndex query engine as the retrieval tool node.
   No → Continue ↓
4. Do you need production tracing, evaluation datasets, and a debugging playground?
   Yes → Add LangSmith. Best-in-class observability for any LLM application; integrates with both frameworks.
   No → Continue ↓
5. Building a new production AI system from scratch?
   Yes → Use both together. Our default: LlamaIndex retrieval layer + LangGraph orchestration (see below).
   No → Continue ↓
Our honest recommendation
Default to using both together.
For any production RAG or agent system, LlamaIndex retrieval inside a LangGraph orchestrator is the highest-accuracy, most maintainable stack we've found.
Our Stack
Hestur AI uses LangGraph + LlamaIndex as our production default for RAG systems and agent workflows. Here is what each framework handles in a typical engagement.
Book a 30-minute call. We'll walk through your data sources, accuracy requirements, and agent complexity — and scope a production system using the right stack for your use case.