Master GraphRAG with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
1. Install GraphRAG: run `pip install graphrag` and make sure you have Python 3.10+ and enough disk space for graph artifacts.
2. Configure LLM access: set up OpenAI API keys or configure Azure OpenAI endpoints for the entity extraction and summarization pipeline.
3. Prepare documents: organize your document corpus into a single directory. GraphRAG works best with 100+ documents containing interconnected information.
4. Run indexing: execute `python -m graphrag.index --root ./your-data` to build the knowledge graph (expect significant LLM token usage for large corpora).
5. Test queries: try both local search for specific questions and global search for holistic understanding to compare GraphRAG's capabilities with traditional RAG.
💡 Quick Start: Follow these five steps in order to get up and running with GraphRAG quickly.
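The setup steps can also be scripted. The following is a minimal sketch, assuming the module-style indexing command shown in the steps; the helper names (`python_ok`, `build_index_command`, `run_indexing`) are illustrative and not part of GraphRAG. The CLI entry point may differ between GraphRAG releases, so check the help output of your installed version before relying on it.

```python
import subprocess
import sys
from pathlib import Path


def python_ok(version=None, minimum=(3, 10)):
    """Return True if the interpreter meets the minimum version from the install step."""
    version = tuple(version or sys.version_info[:2])
    return version >= minimum


def build_index_command(root):
    """Assemble the indexing command as an argument list (assumed invocation)."""
    return [sys.executable, "-m", "graphrag.index", "--root", str(root)]


def run_indexing(root):
    """Run indexing against the corpus directory; this triggers paid LLM calls."""
    if not python_ok():
        raise RuntimeError("Python 3.10 or newer is required")
    if not Path(root).is_dir():
        raise FileNotFoundError(f"Corpus directory not found: {root}")
    # check=True raises CalledProcessError if the indexer exits with an error.
    subprocess.run(build_index_command(root), check=True)
```

Calling `run_indexing("./your-data")` starts the pipeline. Nothing runs at import time, so you can inspect the command with `build_index_command` first.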
Explore the key features that make GraphRAG powerful for knowledge and document workflows.
GraphRAG uses LLMs to extract entities, relationships, and claims from documents, building a structured knowledge graph that captures semantic connections traditional chunking misses.
Example use case: analyzing a corpus of research papers to understand how different concepts and findings relate across publications.
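To make the graph-building idea concrete, here is a minimal sketch of turning extracted relationships into a graph. The triples are hypothetical sample data; in GraphRAG they would come from LLM calls over each text chunk, and the real pipeline stores richer records than this adjacency map.

```python
from collections import defaultdict

# Hypothetical extraction output: (source entity, relationship, target entity).
triples = [
    ("Paper A", "cites", "Paper B"),
    ("Paper A", "introduces", "Method X"),
    ("Paper B", "evaluates", "Method X"),
]


def build_graph(triples):
    """Build an undirected adjacency map, keeping the relationship label per edge."""
    graph = defaultdict(dict)
    for source, relation, target in triples:
        graph[source][target] = relation
        graph[target][source] = relation
    return dict(graph)


graph = build_graph(triples)
# Entities connected to "Method X", even though they appear in different documents.
print(sorted(graph["Method X"]))  # ['Paper A', 'Paper B']
```

The point of the structure is that a concept mentioned in separate documents becomes a single node, so its connections can be followed directly instead of being rediscovered chunk by chunk.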
Global search synthesizes answers from community summaries across the entire dataset, enabling holistic questions that vanilla RAG cannot handle.
Example use case: asking 'What are the key regulatory trends?' across thousands of policy documents.
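Answering from community summaries follows a map-reduce pattern: each summary yields a partial answer, and the partial answers are then combined. The sketch below illustrates that pattern only. The prompts, function names, and the `fake_llm` stand-in are assumptions made so the example runs offline; they are not GraphRAG's actual prompts or API.

```python
def map_step(question, summary, llm):
    """Ask the model for a partial answer grounded in one community summary."""
    return llm(f"Question: {question}\nContext: {summary}\nPartial answer:")


def reduce_step(question, partials, llm):
    """Combine the partial answers into one final answer."""
    joined = "\n".join(f"- {p}" for p in partials if p)
    return llm(f"Question: {question}\nPartial answers:\n{joined}\nFinal answer:")


def global_search(question, community_summaries, llm):
    partials = [map_step(question, s, llm) for s in community_summaries]
    return reduce_step(question, partials, llm)


# Stand-in for a real LLM client: records each prompt and returns a numbered reply.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"response {len(calls)}"


summaries = [
    "Community 1: data privacy rules",
    "Community 2: emissions reporting",
    "Community 3: AI governance",
]
answer = global_search("What are the key regulatory trends?", summaries, fake_llm)
print(answer)  # response 4 (three map calls, then one reduce call)
```

Swapping `fake_llm` for a real client keeps the control flow the same. The number of LLM calls grows with the number of community summaries, which is why global queries cost more than a single retrieval call.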
Local search combines graph neighborhood traversal with vector similarity for precise, context-rich answers to specific questions.
Example use case: finding detailed information about a specific entity and all its relationships within the knowledge base.
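The combination of vector similarity and graph traversal can be sketched as two steps: pick the entity whose embedding is closest to the query, then collect its neighbors. This is an illustration under simplifying assumptions; the entities, the two-dimensional embeddings, and the `local_search_context` helper are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_search_context(query_vec, entity_vecs, graph, hops=1):
    """Find the entity nearest to the query, then gather its graph neighborhood."""
    seed = max(entity_vecs, key=lambda name: cosine(query_vec, entity_vecs[name]))
    context, frontier = {seed}, {seed}
    for _ in range(hops):
        # Expand one hop outward, skipping entities already collected.
        frontier = {n for node in frontier for n in graph.get(node, ())} - context
        context |= frontier
    return seed, sorted(context)


# Hypothetical graph and toy embeddings.
graph = {
    "Acme": ["Alice", "Berlin"],
    "Alice": ["Acme", "Bob"],
    "Berlin": ["Acme"],
    "Bob": ["Alice"],
}
entity_vecs = {
    "Acme": [1.0, 0.0],
    "Alice": [0.6, 0.8],
    "Berlin": [0.0, 1.0],
    "Bob": [-1.0, 0.0],
}

seed, context = local_search_context([0.9, 0.1], entity_vecs, graph)
print(seed, context)  # Acme ['Acme', 'Alice', 'Berlin']
```

Raising `hops` widens the context at the cost of pulling in less relevant entities; with `hops=2` the example also reaches `Bob`.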
Community detection applies the Leiden algorithm to identify clusters of related entities, generating hierarchical summaries at multiple abstraction levels.
Example use case: automatically organizing a large knowledge base into thematic groups for exploration.
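Leiden optimizes modularity and is normally run through a graph library, so it is not reproduced here. As a much simpler stand-in that only conveys the idea of grouping related entities, the sketch below splits a graph into connected components. It does not produce the hierarchical levels that Leiden-based clustering provides, and the sample graph is hypothetical.

```python
def connected_components(graph):
    """Group nodes that can reach each other (a simplified stand-in, not Leiden)."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph.get(node, ()))
        seen |= members
        communities.append(sorted(members))
    return communities


# Hypothetical undirected graph: two unrelated topic clusters.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["E"],
    "E": ["D"],
}
print(connected_components(graph))  # [['A', 'B', 'C'], ['D', 'E']]
```

In a real knowledge graph most entities end up weakly connected, which is why a modularity-based method is used to find denser clusters inside one large component.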
Entity and relationship extraction prompts can be tuned for specific domains, improving accuracy for specialized corpora.
Example use case: configuring extraction for medical literature to focus on drug interactions, symptoms, and treatment protocols.
The indexing pipeline produces inspectable Parquet files containing entities, relationships, communities, and summaries for debugging and analysis.
Example use case: auditing the knowledge graph to verify extraction quality before deploying to production.
Traditional RAG retrieves the top-k most similar text chunks for a query, which works well for narrow, fact-lookup questions but fails on global or multi-hop questions where the answer is spread across many documents. GraphRAG builds a knowledge graph of entities, relationships, and claims, then uses hierarchical community summaries to enable global reasoning ('summarize the main themes') and local graph traversal for entity-centric questions, in addition to standard chunk retrieval.
Local Search answers questions about specific entities by traversing their graph neighborhood and pulling in related text. Global Search answers corpus-wide, summarization-style questions by map-reducing over pre-computed community summaries. DRIFT Search is a newer hybrid mode that combines local entity context with global community context to better handle questions that span both granularities.
GraphRAG itself is free to use: the codebase at github.com/microsoft/graphrag is open source under the MIT license. However, the indexing pipeline makes many LLM API calls (entity extraction, claim extraction, community summarization), so you pay the underlying LLM provider (OpenAI, Azure OpenAI, etc.) for compute. Indexing a large corpus can be significantly more expensive upfront than building a plain vector index.
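A rough budget can be worked out before indexing. The sketch below is back-of-the-envelope arithmetic only: every number in the example (chunk count, tokens per chunk, calls per chunk, and both prices) is a placeholder, not a real GraphRAG measurement or provider price, and it ignores community summarization and embedding costs.

```python
def estimate_indexing_cost(
    num_chunks,
    tokens_per_chunk,
    calls_per_chunk,
    output_tokens_per_call,
    price_in_per_million,
    price_out_per_million,
):
    """Estimate extraction cost in dollars from token volume and per-million prices."""
    input_tokens = num_chunks * calls_per_chunk * tokens_per_chunk
    output_tokens = num_chunks * calls_per_chunk * output_tokens_per_call
    cost = (
        input_tokens / 1_000_000 * price_in_per_million
        + output_tokens / 1_000_000 * price_out_per_million
    )
    return input_tokens, output_tokens, cost


# Placeholder inputs: substitute your own corpus size and your provider's prices.
input_tokens, output_tokens, cost = estimate_indexing_cost(
    num_chunks=10_000,
    tokens_per_chunk=1_200,
    calls_per_chunk=2,
    output_tokens_per_call=300,
    price_in_per_million=1.00,
    price_out_per_million=4.00,
)
print(input_tokens, output_tokens, round(cost, 2))  # 24000000 6000000 48.0
```

Running the estimate on a small sample of your corpus first, then scaling by chunk count, gives a more reliable figure than guessing the per-chunk values.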
GraphRAG supports OpenAI and Azure OpenAI for both chat completion and embeddings out of the box, configured via settings.yaml. Other providers can be wired in through the modular LLM interface. Outputs are stored as Parquet files; vector embeddings can be stored in LanceDB (default), Azure AI Search, or Cosmos DB. The graph itself can be exported to GraphML or Neo4j for visualization.
Use GraphRAG when your use case requires global reasoning, multi-hop questions, or strong provenance across a fixed or slow-changing corpus — for example, intelligence analysis, regulatory document review, or research synthesis. Use LlamaIndex or LangChain when you need a general-purpose orchestration framework, fast incremental indexing, or simpler entity-lookup retrieval. Many teams use GraphRAG as one retriever component inside a larger LlamaIndex/LangChain pipeline.
Tutorial updated March 2026