IBM-backed open-source document parsing toolkit that converts PDFs, DOCX, PPTX, images, audio, and more into structured formats for RAG pipelines and AI agent workflows.
An open-source tool from IBM that converts documents into AI-ready formats — handles PDFs, presentations, and more.
Docling is an open-source document processing toolkit originally developed by IBM Research that converts documents from virtually any format into clean, structured representations ready for AI consumption. With MIT licensing, local execution, and integrations with every major AI framework, it's become one of the most practical tools for teams building RAG systems and document-understanding agents.
Format Coverage That Actually Matters
Docling handles the formats teams actually encounter: PDF (including scanned), DOCX, PPTX, XLSX, HTML, LaTeX, images (PNG, JPEG, TIFF), and even audio files (WAV, MP3) via automatic speech recognition. Recent releases added WebVTT caption parsing, XBRL financial reports, and USPTO patent documents. This breadth means you don't need separate parsers for each document type; Docling normalizes everything into its unified DoclingDocument format.
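As a rough illustration of the "one parser for everything" idea, here is a minimal sketch that routes a file through Docling's DocumentConverter. The extension set and the helper names (SUPPORTED_SUFFIXES, is_supported, to_docling_document) are assumptions made for this example, not part of Docling; the formats actually accepted depend on the installed version and optional extras (audio, for instance, needs additional dependencies).

```python
from pathlib import Path

# Extensions derived from the formats listed above. This set is an
# assumption for illustration; check your installed Docling version.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".tex",
    ".png", ".jpg", ".jpeg", ".tiff", ".wav", ".mp3",
}


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the illustrative set above."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def to_docling_document(path: str):
    """Convert one file and return the resulting DoclingDocument."""
    if not is_supported(path):
        raise ValueError(f"Unsupported file type: {path}")
    # Imported lazily so is_supported() works without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    return converter.convert(path).document
```

Calling to_docling_document("report.pdf") (a hypothetical file name) would return the same kind of object regardless of whether the input was a PDF, a slide deck, or a spreadsheet.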
Advanced PDF Understanding
PDF parsing is where Docling separates itself from simpler tools like PyPDF or pdfplumber. The Heron layout model (released December 2025) parses faster while accurately detecting page layout, reading order, table structures, code blocks, and mathematical formulas, and classifying images. It handles multi-column layouts, headers/footers, and complex nested tables that break most other parsers. For OCR on scanned documents, Docling integrates multiple OCR engines and also supports IBM's Granite-Docling-258M vision-language model, a 258M-parameter VLM purpose-built for document-to-text conversion that preserves complex layouts in a single inference pass.
Structured Output Formats
Every parsed document converts to the DoclingDocument unified representation, which you can then export as Markdown, HTML, JSON (lossless), WebVTT, or DocTags. The JSON export preserves the full document structure (headings, paragraphs, tables, lists, figures) with coordinates and reading order metadata. This is critical for RAG systems where chunk boundaries and document structure affect retrieval quality. See our guide on building effective RAG systems for why document structure matters.
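To make the export step concrete, the sketch below writes a Markdown and a JSON rendering of an already-converted document. It relies on the DoclingDocument methods export_to_markdown() and export_to_dict(); the function name export_all and the file layout are assumptions chosen for this example.

```python
import json
from pathlib import Path


def export_all(doc, out_dir: str, stem: str) -> dict:
    """Write Markdown and JSON exports of a DoclingDocument; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    md_path = out / f"{stem}.md"
    json_path = out / f"{stem}.json"

    # Markdown is convenient for LLM prompts and human review.
    md_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    # The dict export carries the structural detail (element types,
    # positions, reading order) that the Markdown rendering drops.
    json_path.write_text(
        json.dumps(doc.export_to_dict(), ensure_ascii=False),
        encoding="utf-8",
    )
    return {"markdown": md_path, "json": json_path}
```

Keeping both files side by side is one possible design: the Markdown feeds a chunker, while the JSON remains available when a downstream step needs coordinates or element types.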
AI Framework Integrations
Docling provides plug-and-play integrations with LangChain, LlamaIndex, CrewAI, and Haystack. These aren't thin wrappers; they're maintained connectors that feed parsed documents directly into each framework's document loaders and chunking pipelines. The MCP server integration (added in 2025) lets any MCP-compatible AI agent use Docling as a document parsing tool, making it accessible from Claude, Cursor, and other MCP clients.
Local Execution and Privacy
Unlike cloud-based document AI services from Google or Azure, Docling runs entirely locally. Install with pip install docling and process sensitive documents without sending data to any external server. This is essential for healthcare, legal, and financial teams with strict data governance requirements. The CLI makes batch processing straightforward for pipeline automation.
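For teams that prefer scripting over the CLI, local batch conversion might look roughly like the sketch below. It uses DocumentConverter.convert_all with raises_on_error=False so one bad file does not stop the run; the helper names (find_documents, convert_folder) and the default suffix list are assumptions for this example. Note that model weights are typically downloaded on first use, so a fully offline setup needs them fetched in advance.

```python
from pathlib import Path


def find_documents(root: str, suffixes=(".pdf", ".docx", ".pptx")) -> list:
    """Recursively collect files under root whose suffix is in suffixes."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def convert_folder(root: str, out_dir: str) -> int:
    """Convert every matching file under root to Markdown; return the count written."""
    # Imported lazily so find_documents() works without Docling installed.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converter = DocumentConverter()

    written = 0
    results = converter.convert_all(find_documents(root), raises_on_error=False)
    for result in results:
        if result.status != ConversionStatus.SUCCESS:
            continue  # skip failed conversions instead of aborting the batch
        # Files sharing a stem would overwrite each other in this simple layout.
        target = out / f"{result.input.file.stem}.md"
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written += 1
    return written
```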
Recent releases added metadata extraction (title, authors, references, language detection), chart understanding (bar charts, pie charts, line plots), and molecular structure recognition for chemistry documents. These features make Docling useful beyond standard text extraction — it can serve as the perception layer for specialized AI agents working with scientific or financial documents.
Community and Development Pace
With 16,000+ GitHub stars and backing from IBM Research as an LF AI & Data Foundation project, Docling has strong institutional support while remaining fully open source. The release cadence is aggressive: multiple releases per month with meaningful feature additions, not just bug fixes.
Docling from IBM Research provides accurate, modular document conversion with particular strength in scientific and technical documents. The layout analysis and table extraction capabilities are excellent for academic papers, reports, and structured documents. Being open-source and self-hostable is a significant advantage for data-sensitive organizations. The processing speed is slower than simpler parsers, and the focus on structured documents means it's less suited for highly visual or creative document formats.
Uses trained models to identify document regions: titles, text blocks, tables, figures, headers, footers, and page numbers. Handles multi-column layouts, sidebars, and mixed content regions without rule-based heuristics.
Use Case: Processing two-column academic papers where rule-based tools fail to correctly identify the reading order across columns.
Dedicated deep learning model for recognizing table structures including rows, columns, merged cells, spanning headers, and multi-line cells. Produces structured table data with row/column indices.
Use Case: Extracting a complex financial table with merged headers and spanning cells from an annual report PDF.
Outputs a rich document object that preserves the full hierarchy: document → sections → subsections → paragraphs/tables/figures. Each element has type classification, bounding boxes, and parent-child relationships.
Use Case: Building a document viewer that renders the extracted structure with proper heading hierarchy and inline table placement.
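A document viewer like the one described would start by walking the element tree. The sketch below assumes the DoclingDocument.iterate_items() method, which yields (item, level) pairs in reading order; the outline function and its output format are illustrative choices, not part of Docling.

```python
def outline(doc, max_chars: int = 60) -> list:
    """Return one indented line per document element, in reading order."""
    lines = []
    for item, level in doc.iterate_items():
        # Labels are enum members in Docling; fall back to str() otherwise.
        label = getattr(item.label, "value", str(item.label))
        # Tables and pictures carry no plain .text attribute.
        text = getattr(item, "text", "") or ""
        lines.append(f"{'  ' * level}[{label}] {text[:max_chars]}".rstrip())
    return lines
```

Printing "\n".join(outline(doc)) gives a quick view of how headings, paragraphs, tables, and figures nest, which is often enough to check whether the reading order came out as expected.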
Export DoclingDocument to markdown, JSON, HTML, or custom formats while preserving structural information. Markdown export includes proper headers, table formatting, and figure placeholders.
Use Case: Converting a batch of DOCX and PDF files to clean Markdown for ingestion into a static site knowledge base.
Built-in OCR pipeline using EasyOCR or Tesseract for scanned documents and images. Configurable per document with language selection and preprocessing options.
Use Case: Processing a mixed collection of digital and scanned PDFs where some documents have text layers and others require OCR extraction.
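Selecting an OCR engine and language per run could be sketched as follows, using Docling's PdfPipelineOptions with EasyOcrOptions or TesseractOcrOptions. The wrapper function build_ocr_converter and its defaults are assumptions for this example. Language codes differ between engines (EasyOCR uses codes such as "en", Tesseract uses codes such as "eng"), and the Tesseract path needs its own system dependencies installed.

```python
def build_ocr_converter(languages=("en",), engine: str = "easyocr"):
    """Return a DocumentConverter whose PDF pipeline runs OCR with the given engine."""
    if engine not in ("easyocr", "tesseract"):
        raise ValueError(f"Unknown OCR engine: {engine}")

    # Imported lazily so argument validation works without Docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        PdfPipelineOptions,
        TesseractOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = True
    if engine == "easyocr":
        options.ocr_options = EasyOcrOptions(lang=list(languages))
    else:
        options.ocr_options = TesseractOcrOptions(lang=list(languages))

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=options),
        }
    )
```

A converter built this way applies the OCR settings to PDF inputs only; other formats fall back to Docling's defaults.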
Layout analysis and table recognition models support GPU acceleration via PyTorch. Batch processing with GPU can achieve 5-10x speedup over CPU-only processing.
Use Case: Processing 10,000 PDFs overnight using a GPU-equipped server to build a comprehensive document knowledge base.
Free forever
Preprocessing documents for RAG pipelines where accurate chunking and structure preservation directly impact retrieval quality
Processing sensitive legal, medical, or financial documents locally without sending data to cloud services
Building document-understanding AI agents that need to parse mixed format documents (PDFs, spreadsheets, presentations) into a unified structure
Docling works with these platforms and services:
We believe in transparent reviews. Here's what Docling doesn't handle well:
Docling is open-source and runs locally; LlamaParse is a cloud service. LlamaParse uses LLMs for extraction and often produces better results for very complex documents. Docling is faster, free, and keeps data local. For most standard documents, Docling's quality is excellent; LlamaParse edges ahead for the most complex layouts.
Docling handles scanned documents through integrated OCR using EasyOCR or Tesseract. Quality depends on scan resolution; scans at 300 DPI or higher produce good results. Docling auto-detects whether a PDF has a text layer or needs OCR processing.
Docling does not require a GPU; it runs on CPU. However, GPU acceleration provides significant speedups (5-10x) for the deep learning models. For batch processing of large document collections, a GPU is strongly recommended.
Docling produces higher-quality structured output with better layout analysis and table extraction for PDFs. Unstructured handles more file formats, has a broader connector ecosystem, and provides chunking/embedding features. Docling is a better converter; Unstructured is a more complete document ETL platform.