GraphRAG

GraphRAG, popularized by Microsoft Research, extracts entities and relationships from a corpus and builds a knowledge graph from them. Retrieval then combines graph traversal with vector search over entity descriptions and community summaries. It targets corpus-wide queries that plain vector search handles poorly.

The problem it solves

Vanilla RAG works well for "find the chunk that answers this." It works poorly for corpus-wide questions: identifying themes, spotting patterns, or summarizing across many documents.

These require aggregating information across many documents, not finding one chunk. Pure vector retrieval either misses most of the relevant material or returns too much noise.

The GraphRAG pipeline

Indexing phase

  1. Chunk documents as usual
  2. Extract entities from each chunk using an LLM (people, organizations, concepts, events)
  3. Extract relationships between entities from each chunk
  4. Build a graph: nodes are entities, edges are relationships
  5. Community detection: cluster the graph into hierarchical communities (using Leiden or similar)
  6. Generate community summaries: LLM-summarize each community into a description of its core content
  7. Embed entities, relationships, and community summaries for retrieval
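
Steps 2 through 5 can be sketched in a few lines of standard-library Python. This is a minimal illustration under heavy assumptions, not Microsoft's implementation: the names KNOWN_ENTITIES, extract_entities, build_graph, and detect_communities are hypothetical, a vocabulary lookup stands in for LLM entity extraction, chunk co-occurrence stands in for LLM relationship extraction, and connected components stand in for Leiden. Summarization and embedding (steps 6 and 7) are left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for LLM entity extraction: a fixed vocabulary lookup.
KNOWN_ENTITIES = {"Acme", "Bolt", "Carol", "Dana", "Zeta"}


def extract_entities(chunk: str) -> set[str]:
    """Toy extractor: return the known entity names that appear in the chunk."""
    words = {w.strip(".,") for w in chunk.split()}
    return words & KNOWN_ENTITIES


def build_graph(chunks: list[str]):
    """Nodes are entities; an edge links two entities that share a chunk.

    Co-occurrence is an assumption standing in for LLM relationship extraction.
    """
    adjacency = defaultdict(set)
    sources = defaultdict(set)  # entity -> indices of chunks that mention it
    for i, chunk in enumerate(chunks):
        entities = extract_entities(chunk)
        for e in entities:
            adjacency[e]  # touch the key so isolated entities still become nodes
            sources[e].add(i)
        for a, b in combinations(sorted(entities), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency, sources


def detect_communities(adjacency) -> list[set[str]]:
    """Connected components: a flat simplification, not hierarchical Leiden."""
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        communities.append(component)
    return communities


chunks = [
    "Carol founded Acme.",
    "Acme acquired Bolt.",
    "Dana leads Zeta.",
]
adjacency, sources = build_graph(chunks)
communities = detect_communities(adjacency)
```

Keeping the entity-to-chunk map (sources here) matters later: local search needs it to get from graph nodes back to the original text.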

Query phase

Two query modes in Microsoft's GraphRAG:

Local search

For specific entity-focused questions. Embed the query, find related entities in the graph, gather their connections and source chunks, synthesize. Best for "tell me about X" queries.
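
A rough sketch of the context-gathering part of local search, assuming a tiny prebuilt index. Everything here is hypothetical: the ENTITY_DESCRIPTIONS, EDGES, and ENTITY_CHUNKS tables, the local_search_context function, and the bag-of-words embed function, which stands in for a real embedding model. The final synthesis call to an LLM is omitted.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts, standing in for a real model."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical index produced by the indexing phase.
ENTITY_DESCRIPTIONS = {
    "Acme": "Acme is a robotics company founded by Carol",
    "Bolt": "Bolt is a battery supplier acquired by Acme",
    "Zeta": "Zeta is a bakery led by Dana",
}
EDGES = {"Acme": {"Bolt"}, "Bolt": {"Acme"}, "Zeta": set()}
ENTITY_CHUNKS = {
    "Acme": ["chunk-1", "chunk-2"],
    "Bolt": ["chunk-2"],
    "Zeta": ["chunk-3"],
}


def local_search_context(query: str, top_k: int = 1) -> dict:
    """Find seed entities by similarity, expand one hop, collect source chunks."""
    q = embed(query)
    ranked = sorted(
        ENTITY_DESCRIPTIONS,
        key=lambda e: cosine(q, embed(ENTITY_DESCRIPTIONS[e])),
        reverse=True,
    )
    seeds = ranked[:top_k]
    entities = set(seeds)
    for seed in seeds:
        entities |= EDGES[seed]  # one-hop neighbors in the graph
    chunk_ids = sorted({c for e in entities for c in ENTITY_CHUNKS[e]})
    return {"seeds": seeds, "entities": sorted(entities), "chunks": chunk_ids}


context = local_search_context("Tell me about the robotics company")
```

The one-hop expansion is the part vector search alone does not give you: Bolt is pulled in through its edge to Acme even though its description shares nothing with the query.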

Global search

For corpus-wide questions. The LLM processes each community summary, generates partial answers, then aggregates them into a final response. Best for "themes," "patterns," "summarize across."
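
The map-then-aggregate shape described above might look roughly like this. It is a simplified sketch, not the library's actual code: global_search, the prompt wording, and the NONE convention are all assumptions, and fake_llm is a deterministic stand-in so the example runs without a model.

```python
from typing import Callable


def global_search(
    query: str, community_summaries: list[str], llm: Callable[[str], str]
) -> str:
    """Map over community summaries, then reduce the partial answers."""
    partials = []
    for summary in community_summaries:
        # Map step: one LLM call per community summary.
        answer = llm(
            f"Question: {query}\n"
            f"Context: {summary}\n"
            "Answer from this context only, or reply NONE."
        )
        if answer.strip() != "NONE":
            partials.append(answer)
    # Reduce step: a single call that merges the surviving partial answers.
    joined = "\n".join(f"- {p}" for p in partials)
    return llm(
        f"Question: {query}\n"
        f"Partial answers:\n{joined}\n"
        "Combine into one final answer."
    )


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (illustration only)."""
    lines = prompt.splitlines()
    if "Partial answers:" in prompt:
        return " | ".join(line[2:] for line in lines if line.startswith("- "))
    context = next(line for line in lines if line.startswith("Context: "))
    text = context[len("Context: "):]
    return "NONE" if "unrelated" in text else f"theme: {text}"


summaries = ["supply chain risk", "unrelated sports news", "battery pricing"]
result = global_search("What are the main themes?", summaries, fake_llm)
```

Note the call count: one LLM call per community summary plus one for aggregation, which is why global search is the expensive query mode.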

Strengths

Weaknesses

Cost structure

GraphRAG indexing can cost 10-100x more than standard RAG indexing. For 1M chunks:

At enterprise scale, this is significant. Use smaller, cheaper models for extraction if the corpus is large.
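
To see how extraction cost scales with corpus size, a back-of-the-envelope estimator helps. The function below is a hypothetical sketch; the token counts and per-million-token prices in the example are placeholders chosen for easy arithmetic, not real pricing for any model.

```python
def extraction_cost_usd(
    num_chunks: int,
    input_tokens_per_chunk: int,
    output_tokens_per_chunk: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    passes: int = 1,
) -> float:
    """Estimate LLM spend for entity and relationship extraction.

    `passes` is the number of LLM calls made per chunk.
    """
    input_tokens = num_chunks * input_tokens_per_chunk * passes
    output_tokens = num_chunks * output_tokens_per_chunk * passes
    return (
        input_tokens * usd_per_million_input / 1_000_000
        + output_tokens * usd_per_million_output / 1_000_000
    )


# Placeholder inputs, not real prices: 1M chunks, 1,000 prompt tokens and
# 200 completion tokens per chunk, $1 and $4 per million tokens.
estimate = extraction_cost_usd(1_000_000, 1_000, 200, 1.0, 4.0)
```

Plugging in a cheaper model's prices shows directly how much the extraction model choice moves the total. Community summarization adds further calls that this estimate does not cover.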

Implementations

Hybrid: GraphRAG + vector RAG

Serious systems combine both:

Or: use GraphRAG's community summaries as additional retrieval candidates alongside document chunks. At retrieval time, the system can return either granular chunks or high-level summaries depending on query type.
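
One way to picture the shared candidate pool: chunks and community summaries are scored by the same relevance function and ranked together. This is a minimal sketch with hypothetical names (Candidate, score, retrieve) and a toy word-overlap score in place of real embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kind: str  # "chunk" or "summary"
    text: str


def score(query: str, text: str) -> float:
    """Toy relevance: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, candidates: list[Candidate], top_k: int = 1) -> list[Candidate]:
    """Rank chunks and summaries in one pool and keep the best top_k."""
    ranked = sorted(candidates, key=lambda c: score(query, c.text), reverse=True)
    return ranked[:top_k]


pool = [
    Candidate("chunk", "acme acquired bolt in march"),
    Candidate("chunk", "dana opened a second bakery"),
    Candidate("summary", "main themes across the corpus are acquisitions and supply risk"),
]

specific = retrieve("when was bolt acquired", pool)
broad = retrieve("main themes across the corpus", pool)
```

With this pool, the specific question surfaces a chunk and the broad question surfaces the community summary, without an explicit query classifier. A real system would likely also normalize scores between the two candidate types, since summaries and chunks differ in length and style.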

When GraphRAG is worth it

When it isn't

GraphRAG is impressive and genuinely useful for the right use case. It's also often over-adopted by teams who would do better with well-tuned vanilla RAG.

Next: Adaptive RAG.