Open-source RAG engine with deep document understanding, chunk visualization, citation tracking, hybrid search, and agent workflow capabilities for enterprise knowledge bases.
An open-source system for building AI that answers questions from your documents — with deep understanding of complex document formats.
RAGFlow is an Apache-2.0 open-source Retrieval-Augmented Generation (RAG) engine from InfiniFlow, designed to act as a context layer for LLM applications and AI agents. Self-hosting carries no software license cost, and the hosted cloud service is priced across Free, Starter, Pro, and Enterprise tiers. Its public positioning is broader than a simple vector database wrapper: the project combines document ingestion, deep document understanding, chunking, hybrid retrieval, reranking, citations, configurable LLM and embedding models, and agent workflow tooling in one platform. The GitHub README describes RAGFlow as a RAG engine that fuses RAG with agent capabilities, while the product site frames it as a way to build a superior context layer for AI agents and enterprise use cases.
The strongest part of RAGFlow is its focus on messy enterprise data. The project emphasizes deep document understanding for unstructured data with complicated formats and supports a broad range of input types, including Word documents, slide decks, spreadsheets, text files, images, scanned copies, structured data, and web pages. It also includes built-in ingestion and ETL-style processing intended to cleanse and structure multi-format data into semantic representations before retrieval. For teams building knowledge-base assistants over PDFs, scanned documents, internal files, and mixed business records, that ingestion layer is a major part of the value proposition.
RAGFlow also puts unusual emphasis on explainability and grounding. It offers template-based chunking with multiple options, chunk visualization for human inspection, and traceable citations that let users quickly view references behind generated answers. This matters for professional and regulated workflows where the answer alone is not enough and users need to inspect where a claim came from. Its retrieval stack combines vector search, BM25/full-text search, custom scoring, multiple recall, and fused reranking, which gives teams more retrieval control than a bare vector-only setup.
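To make the idea of fusing keyword and vector results concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common generic way to merge ranked lists. It is an illustration only and not RAGFlow's actual scoring code; the function name, the chunk IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    so chunks that rank well in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists: one from BM25/full-text, one from vector search.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c4", "c3"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Chunks that appear in both lists (`c1` and `c3` here) outrank chunks found by only one retriever, which is the practical benefit of hybrid search over a vector-only setup.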
Beyond retrieval, RAGFlow has been evolving toward agent orchestration. The website describes unified AI agent orchestration that integrates RAG, tools, MCPs, web search, chat, datasets, models, and visual workflows. The listed industry examples include equity investment research, legal precedent analysis, and manufacturing maintenance support, each using agent-style steps such as search, retrieval, HTTP calls, conditional logic, report generation, clarification, and instruction output. Recent updates listed in the repository also show support for agent memory, agentic workflow and MCP, a Python/JavaScript code executor component, and multiple chat channels.
RAGFlow is suitable for engineering teams that want an open-source, self-hostable RAG platform with a UI and production-oriented components, not just a developer library. It can be deployed via Docker Compose, configured with different LLM and embedding providers, and integrated through APIs. However, it is not a zero-maintenance tool. Self-hosting has meaningful infrastructure requirements, including at least 4 CPU cores, 16 GB RAM, 50 GB disk, Docker, Docker Compose, and Python 3.13 according to the README. The deployment stack also involves services such as Elasticsearch by default, with Infinity as an alternative document engine, plus MySQL, MinIO, and Redis in the development setup. Teams should expect real DevOps ownership for production use.
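As a rough illustration of the sizing guidance above, the following sketch checks a host against the quoted minimums (4 CPU cores, 16 GB RAM, 50 GB disk). The function names are hypothetical, RAM is passed in by hand because the Python standard library has no portable way to read it, and treating the disk figure as free space rather than total capacity is an assumption.

```python
import os
import shutil

# Minimums quoted in the text above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50


def check_host(cpu_cores, ram_gb, disk_gb):
    """Return a list of shortfalls; an empty list means the host passes."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def detect_cpu_and_disk(path="/"):
    """Read CPU core count and free disk space (in GB) for the given path."""
    cpu_cores = os.cpu_count() or 0
    free_gb = shutil.disk_usage(path).free / 1024**3
    return cpu_cores, free_gb


if __name__ == "__main__":
    cores, free_gb = detect_cpu_and_disk()
    # ram_gb is supplied manually here (assumption: the operator knows it).
    for problem in check_host(cores, ram_gb=16, disk_gb=free_gb):
        print("Below minimum:", problem)
```

A check like this only covers the hardware numbers; Docker, Docker Compose, and the backing services still need to be verified separately.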
The commercial cloud offering lowers the operational burden and includes Free, Starter, Pro, and Enterprise tiers. The published cloud limits make the lower tiers best for evaluation and small team usage: Free includes 5 apps, 1 team member, 0.1 GB dataset storage, and 500 monthly credits; Starter increases this to 50 apps, 5 team members, 5 GB storage, and 5,000 credits; Pro includes unlimited apps, 20 team members, 50 GB storage, and 20,000 credits. Enterprise adds BYOC deployment, on-premises deployment, dedicated support, and custom SLA. Overall, RAGFlow is best viewed as a serious RAG and agent platform for teams that value document parsing quality, retrieval transparency, and deployment control, while accepting the complexity that comes with a full-stack open-source system.
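For a quick way to reason about the published cloud limits above, the sketch below encodes them as data and returns the smallest tier that covers a given workload. The class and function names are hypothetical, and falling back to Enterprise for anything above the Pro limits is an assumption, since no Enterprise limits are listed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    apps: float          # math.inf stands for "unlimited"
    members: int
    storage_gb: float
    credits: int


# Limits as quoted in the text above, ordered from smallest to largest.
TIERS = [
    Tier("Free", 5, 1, 0.1, 500),
    Tier("Starter", 50, 5, 5, 5_000),
    Tier("Pro", math.inf, 20, 50, 20_000),
]


def smallest_fitting_tier(apps, members, storage_gb, credits):
    """Return the name of the first tier whose limits cover the workload."""
    for tier in TIERS:
        if (
            apps <= tier.apps
            and members <= tier.members
            and storage_gb <= tier.storage_gb
            and credits <= tier.credits
        ):
            return tier.name
    # Assumption: anything beyond the Pro limits needs the Enterprise tier.
    return "Enterprise"


print(smallest_fitting_tier(apps=10, members=4, storage_gb=2, credits=3_000))
```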
Parses PDFs, Word docs, and more with structure-aware chunking that preserves tables, headers, figures, and hierarchical relationships.
Use Case: Processing financial reports where table data and section context must be preserved for accurate retrieval.
Web UI showing exactly how each document was chunked, with the ability to manually adjust boundaries and verify parsing quality.
Use Case: Quality-checking document parsing before deploying a knowledge base to production users.
Every generated answer includes links to specific source chunks, enabling users to verify claims against original documents.
Use Case: Building a compliance knowledge assistant where every answer must be traceable to source policy documents.
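The traceability idea can be sketched as a small data model in which an answer carries the IDs of the chunks it relies on, and those IDs are resolved back to source documents before the answer is shown. This is a minimal illustration under assumed names (`Chunk`, `Answer`, `resolve_citations`), not RAGFlow's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    document: str
    text: str


@dataclass
class Answer:
    text: str
    cited_chunk_ids: list[str] = field(default_factory=list)


def resolve_citations(answer, chunks_by_id):
    """Return the source chunks behind an answer, failing on unknown IDs."""
    missing = [cid for cid in answer.cited_chunk_ids if cid not in chunks_by_id]
    if missing:
        raise KeyError(f"answer cites unknown chunks: {missing}")
    return [chunks_by_id[cid] for cid in answer.cited_chunk_ids]


# Hypothetical policy chunks and an answer that cites one of them.
store = {
    "p1": Chunk("p1", "travel-policy.pdf", "Flights over 6 hours may be booked in business class."),
    "p2": Chunk("p2", "expense-policy.pdf", "Receipts are required for all claims."),
}
answer = Answer("Business class is allowed on flights over 6 hours.", ["p1"])

for chunk in resolve_citations(answer, store):
    print(f"{chunk.document}: {chunk.text}")
```

Rejecting answers that cite unknown chunk IDs is a deliberate choice in this sketch: in a compliance setting, a citation that cannot be resolved is treated as an error rather than silently dropped.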
Maintains conversation context across multiple exchanges, enabling follow-up questions and clarification without losing thread.
Use Case: Creating a customer-facing knowledge assistant that handles complex multi-step inquiries.
Specialized parsing for complex tables that maintains row/column relationships during indexing and retrieval.
Use Case: Querying data from annual reports, spec sheets, or compliance matrices embedded in PDF documents.
Built-in tenant isolation enabling multiple teams or clients to have separate knowledge bases within one deployment.
Use Case: Deploying a shared RAG platform across departments with isolated data access controls.
Self-hosted (open source): Free software license; infrastructure and model costs not included
Free: $0/month
Starter: $29/month shown with a higher $59/month reference price on the site
Pro: $129/month shown with a higher $259/month reference price on the site
Enterprise: Contact sales
Knowledge & Documents
Microsoft's graph-based retrieval augmented generation for complex document understanding and multi-hop reasoning.
AI agent framework
LlamaIndex is an open-source Python and TypeScript framework for building RAG, document workflows, and AI agents — with LlamaCloud for managed parsing, extraction, and indexing.
LLM app platform
Dify is an open-source LLM app development platform that combines a visual workflow builder, RAG pipelines, agent tools, and an LLMOps backbone.
Document Processing & OCR
Unstructured data platform for GenAI that connects to any source, processes 64+ file types, and outputs clean AI-ready inputs.