
Docling


IBM-backed open-source document parsing toolkit that converts PDFs, DOCX, PPTX, images, audio, and more into structured formats for RAG pipelines and AI agent workflows.

Starting at: Free


💡

In Plain English

An open-source tool from IBM that converts documents into AI-ready formats — handles PDFs, presentations, and more.

Overview

Docling is an open-source document processing toolkit, originally developed by IBM Research, that converts documents from virtually any format into clean, structured representations ready for AI consumption. With Apache 2.0 licensing, local execution, and integrations with major AI frameworks such as LangChain, LlamaIndex, Haystack, and CrewAI, it has become one of the most practical tools for teams building RAG systems and document-understanding agents.

Format Coverage That Actually Matters

Docling handles the formats teams actually encounter: PDF (including scanned), DOCX, PPTX, XLSX, HTML, LaTeX, images (PNG, JPEG, TIFF), and even audio files (WAV, MP3) via automatic speech recognition. Recent releases added WebVTT caption parsing, XBRL financial reports, and USPTO patent documents. This breadth means you don't need separate parsers for each document type — Docling normalizes everything into its unified DoclingDocument format.

Advanced PDF Understanding

PDF parsing is where Docling most clearly separates itself from simpler tools like PyPDF or pdfplumber. The Heron layout model (released December 2025) provides faster parsing while accurately detecting page layout, reading order, table structures, code blocks, mathematical formulas, and image classes. It handles multi-column layouts, headers/footers, and complex nested tables that often trip up simpler parsers. For scanned documents, Docling integrates multiple OCR engines, and as an alternative to OCR it supports IBM's Granite-Docling-258M, a 258M-parameter vision-language model (VLM) purpose-built for document-to-text conversion that preserves complex layouts in a single inference pass.
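In code, the layout, table, and enrichment stages described above are switched on through pipeline options attached to the converter. The sketch below assumes Docling v2's `PdfPipelineOptions`, `TableFormerMode`, and `PdfFormatOption` (field names have shifted between releases, so check them against the version you pin). The `pdf_settings` helper and its `scanned`/`technical` flags are hypothetical conveniences for this example, not part of Docling.

```python
"""Sketch: configuring Docling's standard PDF pipeline (assumes Docling v2)."""
from __future__ import annotations


def pdf_settings(scanned: bool, technical: bool) -> dict[str, bool]:
    """Hypothetical helper: map document traits to PdfPipelineOptions fields."""
    return {
        "do_ocr": scanned,                   # only OCR when there is no text layer
        "do_table_structure": True,          # TableFormer table-structure recognition
        "do_code_enrichment": technical,     # extra model pass for code blocks
        "do_formula_enrichment": technical,  # extra model pass for formulas
    }


def make_pdf_converter(settings: dict[str, bool]):
    # Deferred imports so this module loads even where Docling isn't installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    for name, value in settings.items():
        setattr(options, name, value)
    # ACCURATE trades speed for better structure on complex tables.
    options.table_structure_options.mode = TableFormerMode.ACCURATE
    options.table_structure_options.do_cell_matching = True

    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

A typical call would be `make_pdf_converter(pdf_settings(scanned=False, technical=True)).convert("paper.pdf").document`. Enrichment passes add runtime, which is why the sketch only enables them for technical material.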

Structured Output Formats

Every parsed document converts to the DoclingDocument unified representation, which you can then export as Markdown, HTML, JSON (lossless), WebVTT, or DocTags. The JSON export preserves the full document structure (headings, paragraphs, tables, lists, figures) with coordinates and reading order metadata. This is critical for RAG systems where chunk boundaries and document structure affect retrieval quality.
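To make the "structure plus reading order" point concrete, here is a minimal pure-Python sketch that walks an exported JSON document and lists headings and tables with their starting page. It assumes the layout produced by recent docling-core versions: `body.children` holding pointers like `{"$ref": "#/texts/0"}`, and items carrying `label`, `text`, and `prov[].page_no`. Verify those field names against a file you have exported. The tiny `sample` document is made up for illustration.

```python
from __future__ import annotations

from typing import Iterator


def resolve(doc: dict, ref: str) -> dict:
    """Turn a JSON pointer such as "#/texts/3" into doc["texts"][3]."""
    _, collection, index = ref.split("/")
    return doc[collection][int(index)]


def iter_items(doc: dict, node: dict) -> Iterator[dict]:
    """Yield items depth-first by following "children" refs, i.e. in reading order."""
    for child in node.get("children", []):
        item = resolve(doc, child["$ref"])
        yield item
        yield from iter_items(doc, item)  # groups (lists, sections) have children too


def structure_summary(doc: dict) -> list[tuple[str, str, int | None]]:
    """List headings and tables with the page each one starts on."""
    rows = []
    for item in iter_items(doc, doc["body"]):
        label = item.get("label")
        if label in {"title", "section_header", "table"}:
            page = (item.get("prov") or [{}])[0].get("page_no")
            rows.append((label, item.get("text", ""), page))
    return rows


# Made-up, heavily trimmed document in the assumed export shape.
sample = {
    "body": {"children": [{"$ref": "#/texts/0"}, {"$ref": "#/texts/1"}, {"$ref": "#/tables/0"}]},
    "texts": [
        {"label": "section_header", "text": "1 Introduction", "prov": [{"page_no": 1}]},
        {"label": "text", "text": "Docling converts documents...", "prov": [{"page_no": 1}]},
    ],
    "tables": [{"label": "table", "prov": [{"page_no": 2}]}],
}

for row in structure_summary(sample):
    print(row)
```

With a real conversion you would pass `result.document.export_to_dict()` (or the parsed contents of a saved JSON file) instead of `sample`.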

AI Framework Integrations

Docling provides plug-and-play integrations with LangChain, LlamaIndex, CrewAI, and Haystack. These aren't thin wrappers — they're maintained connectors that feed parsed documents directly into each framework's document loaders and chunking pipelines. The MCP server integration (added in 2025) lets any MCP-compatible AI agent use Docling as a document parsing tool, making it accessible from Claude, Cursor, and other MCP clients.

Local Execution and Privacy

Unlike cloud-based document AI services from Google or Azure, Docling runs entirely locally: install it with `pip install docling` and process sensitive documents without sending data to any external server. Model weights are downloaded from Hugging Face on first use and cached, so air-gapped hosts need them fetched in advance. This is essential for healthcare, legal, and financial teams with strict data governance requirements. The CLI makes batch processing straightforward for pipeline automation.
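For batch jobs driven from Python rather than the CLI, a sketch along these lines should work. It assumes Docling v2's `DocumentConverter.convert_all(..., raises_on_error=False)` and `ConversionStatus` from `docling.datamodel.base_models`. The `collect_inputs` and `convert_folder` helpers and the `SUFFIXES` set are hypothetical names chosen for this example.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative subset of the extensions Docling accepts; extend as needed.
SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx", ".html"}


def collect_inputs(folder: Path, suffixes: set[str] = SUFFIXES) -> list[Path]:
    """Return supported files directly inside `folder`, sorted for stable runs."""
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in suffixes)


def convert_folder(folder: Path, out_dir: Path) -> list[Path]:
    """Convert every supported file to Markdown; return the files that failed."""
    # Deferred imports: needs `pip install docling`. Conversion itself runs locally.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    out_dir.mkdir(parents=True, exist_ok=True)
    converter = DocumentConverter()
    failed: list[Path] = []
    # raises_on_error=False keeps the batch going when one file is malformed.
    for result in converter.convert_all(collect_inputs(folder), raises_on_error=False):
        src = Path(result.input.file)
        if result.status == ConversionStatus.SUCCESS:
            md = result.document.export_to_markdown()
            (out_dir / f"{src.stem}.md").write_text(md, encoding="utf-8")
        else:
            failed.append(src)
    return failed
```

Scanning only the top level of the folder avoids two files with the same stem in different subfolders overwriting each other's output.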

Metadata and Advanced Analysis

Recent releases added rich metadata extraction capabilities, including document language detection, page-level bounding boxes for every element, confidence scores on OCR results, and hierarchical section labeling. The TableFormer paper reports table-structure scores above 90% (TEDS) on the PubTabNet and FinTabNet benchmarks, making it among the best open-source options for extracting structured data from tables embedded in PDFs. Docling's chunking utilities, HybridChunker and HierarchicalChunker, use this structure to split documents at semantically meaningful boundaries rather than at arbitrary token counts, which tends to improve retrieval precision in RAG systems.
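To see why boundary-aware splitting matters, here is a deliberately simplified pure-Python illustration. It is not Docling's HybridChunker implementation (in Docling you would call `HybridChunker().chunk(dl_doc=doc)` on a converted document). It contrasts a naive fixed-size splitter with one that never lets a chunk cross a heading and only breaks between paragraphs. The `Chunk` class, both functions, and the `"heading"`/`"text"` labels are hypothetical stand-ins for Docling's richer item labels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    headings: list[str]  # section path the chunk belongs to
    text: str


def fixed_size_chunks(paragraphs: list[str], max_words: int) -> list[str]:
    """Naive baseline: cut every `max_words` words, ignoring structure."""
    words = " ".join(paragraphs).split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def section_chunks(items: list[tuple[str, str]], max_words: int) -> list[Chunk]:
    """Structure-aware split over (label, text) pairs in reading order."""
    chunks: list[Chunk] = []
    heading: list[str] = []
    buf: list[str] = []

    def flush() -> None:
        if buf:
            chunks.append(Chunk(list(heading), " ".join(buf)))
            buf.clear()

    for label, text in items:
        if label == "heading":
            flush()  # a chunk never spans two sections
            heading = [text]
        else:
            if buf and len(" ".join(buf + [text]).split()) > max_words:
                flush()  # split between paragraphs, never inside one
            buf.append(text)
    flush()
    return chunks


items = [
    ("heading", "Refunds"),
    ("text", "Refunds are issued within 14 days."),
    ("heading", "Shipping"),
    ("text", "Orders ship in 2 days."),
]
print(fixed_size_chunks([t for label, t in items if label == "text"], max_words=4))
print(section_chunks(items, max_words=50))
```

The fixed-size splitter produces a chunk ("14 days. Orders ship") that mixes the refund and shipping sections, while the section-aware version keeps each policy intact and tags it with its heading, which is the information an embedding needs for accurate retrieval.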

Performance and Scale

On GPU hardware (e.g., a single NVIDIA A100), Docling processes approximately 10–15 pages per second for standard layout analysis, and 3–5 pages per second when the full VLM pipeline is engaged. CPU-only throughput is roughly 5–10× slower, depending on document complexity. The project's GitHub repository has accumulated over 18,000 stars since its public release, reflecting strong community adoption. SmolDocling, a compact 256M-parameter VLM released in 2025 and since followed by Granite-Docling-258M, keeps the VLM footprint small while maintaining competitive accuracy, making GPU requirements more accessible for smaller teams.
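A back-of-envelope capacity estimate follows directly from these throughput figures. The sketch below simply plugs in the approximate numbers quoted above (10–15 pages/s GPU layout, 3–5 pages/s with the VLM, CPU 5–10× slower than the GPU layout path); real throughput depends on document mix, hardware, and pipeline options, so treat the output as a planning range, not a benchmark. The `hours_range` helper is a hypothetical name.

```python
from __future__ import annotations


def hours_range(pages: int, pps_low: float, pps_high: float) -> tuple[float, float]:
    """Best- and worst-case wall-clock hours for `pages` at a pages/second range."""
    return pages / pps_high / 3600, pages / pps_low / 3600


pages = 100_000
gpu = hours_range(pages, 10, 15)  # standard layout pipeline, single A100 (figures above)
vlm = hours_range(pages, 3, 5)    # full VLM pipeline
# CPU-only: assume 5-10x slower than the GPU layout path, i.e. 1-3 pages/second.
cpu = hours_range(pages, 10 / 10, 15 / 5)

for name, (best, worst) in {"GPU layout": gpu, "GPU + VLM": vlm, "CPU only": cpu}.items():
    print(f"{name}: {best:.1f}-{worst:.1f} h")
```

For 100,000 pages this prints roughly 1.9-2.8 h on the GPU layout path, 5.6-9.3 h with the VLM, and 9.3-27.8 h on CPU, which is why the review treats GPUs as practically necessary for large VLM batches.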

🦞

Using with OpenClaw


Create OpenClaw skills that leverage Docling for document analysis and processing. Integrate via API calls or direct SDK usage.

Use Case Example:

Process documents uploaded to OpenClaw using Docling's specialized capabilities, then store results in memory for later reference.


🎨

Vibe Coding Friendly?


Difficulty: Intermediate

Document processing tool requiring some technical understanding of formats and parsing.


Editorial Review

Docling from IBM Research provides accurate, modular document conversion with particular strength in scientific and technical documents. The layout analysis and table extraction capabilities are excellent for academic papers, reports, and structured documents. Being open-source and self-hostable is a significant advantage for data-sensitive organizations. The processing speed is slower than simpler parsers, and the focus on structured documents means it's less suited for highly visual or creative document formats.

Key Features

Unified DocumentConverter API that ingests PDF, DOCX, PPTX, XLSX, HTML, Markdown, AsciiDoc, images, and audio and emits a normalized DoclingDocument object

Advanced PDF understanding: page layout analysis, reading order reconstruction, table structure recognition via TableFormer, formula and code-block detection

OCR support via EasyOCR, Tesseract, and RapidOCR for scanned documents, with configurable language models and bbox-level confidence

Vision-language model pipelines using IBM's Granite-Docling and SmolDocling for image-first document understanding

Layout-aware chunkers (HybridChunker, HierarchicalChunker) that respect section and table boundaries when preparing text for embeddings

First-party integrations with LangChain, LlamaIndex, Haystack, txtai, and Crew AI, plus an MCP server for agent and IDE assistants

Multiple export formats (Markdown, HTML, JSON, and the typed DoclingDocument schema) with deterministic, structure-preserving output

Local/offline execution with Apache 2.0 licensing, openly published model weights on Hugging Face, and a CLI for batch conversion

Pricing Plans

Open Source (self-hosted)

Free

✓Full Docling Python library under Apache 2.0
✓All document parsers (PDF, DOCX, PPTX, XLSX, HTML, images, audio)
✓TableFormer, Granite-Docling, and SmolDocling model weights from Hugging Face
✓OCR via EasyOCR, Tesseract, RapidOCR
✓LangChain, LlamaIndex, Haystack, Crew AI integrations
✓MCP server for agent/IDE use
✓CLI and Python SDK
✓Community support via GitHub issues and Discord


Getting Started with Docling

1. Install Docling with `pip install docling`; optional extras such as `docling[vlm]` add vision-language-model support (check the installation docs for the extras your chosen OCR engine needs).
2. Parse your first document using the DocumentConverter API (`from docling.document_converter import DocumentConverter`): `converter = DocumentConverter(); result = converter.convert('myfile.pdf')`.
3. Export the parsed result to Markdown, JSON, or HTML using `result.document.export_to_markdown()` or the related `export_to_dict()` and `export_to_html()` methods.
4. Integrate with your RAG stack by installing the appropriate connector (e.g., `langchain-docling`, `llama-index-readers-docling`, or `docling-haystack`).
5. For batch processing or automation, use the Docling CLI: `docling --from pdf --to md --output ./out ./documents/`.
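Steps 2 and 3, followed by RAG-ready chunking, can be combined into one short Python sketch. It assumes Docling v2's `DocumentConverter`, `export_to_markdown()`, and `HybridChunker` with its `chunk(dl_doc=...)` and `contextualize(chunk=...)` methods. The helper names `markdown_path` and `convert_and_chunk` are hypothetical, and HybridChunker's default tokenizer may differ between versions, so pass one that matches your embedding model in real use.

```python
from __future__ import annotations

from pathlib import Path


def markdown_path(source: Path, out_dir: Path) -> Path:
    """Where the Markdown export for `source` is written (hypothetical convention)."""
    return out_dir / f"{source.stem}.md"


def convert_and_chunk(source: Path, out_dir: Path) -> list[str]:
    # Deferred imports: requires `pip install docling`.
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document  # step 2: parse
    out_dir.mkdir(parents=True, exist_ok=True)
    markdown_path(source, out_dir).write_text(doc.export_to_markdown(), encoding="utf-8")  # step 3

    chunker = HybridChunker()  # default tokenizer; version-dependent
    # contextualize() prefixes each chunk with its heading path, which helps retrieval.
    return [chunker.contextualize(chunk=c) for c in chunker.chunk(dl_doc=doc)]
```

The returned strings can be embedded directly or handed to whichever vector store your LangChain, LlamaIndex, or Haystack pipeline uses.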


Best Use Cases

🎯

Building enterprise RAG pipelines where source documents are messy PDFs, contracts, or technical manuals and structure must be preserved

⚡

Preparing high-quality training and evaluation datasets from PDF/PPTX corpora for LLM fine-tuning or distillation

🔧

On-premises document understanding in regulated sectors (healthcare, legal, finance, government) where cloud APIs are not permitted

🚀

Powering agentic workflows via MCP, letting Claude- or Cursor-style assistants ingest user documents on demand

💡

Extracting structured tables, figures, and formulas from scientific papers or financial filings for downstream analytics

🔄

Replacing brittle in-house PDF-to-text scripts in existing LangChain/LlamaIndex/Haystack stacks with a single, layout-aware loader

Integration Ecosystem

15 integrations

Docling works with these platforms and services:

🧠 LLM Providers

ibm-granite

📊 Vector Databases

Milvus, Weaviate, Qdrant, Chroma, Pinecone

☁️ Cloud Platforms

AWS

⚡ Code Execution

Docker

🔗 Other

GitHub, LangChain, LlamaIndex, Haystack, CrewAI, txtai, MCP

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Docling doesn't handle well:

⚠Docling is a Python library without a managed cloud offering, so teams must operate the infrastructure themselves. Throughput is bound by available CPU/GPU, and running VLM and OCR pipelines at scale typically requires GPUs to be practical. Accuracy on edge cases such as deeply nested tables, hand-filled forms, math-heavy papers, and low-quality scans remains imperfect and may require model fine-tuning or post-processing. Audio support is newer and less mature than the document-parsing path. APIs evolve release-to-release, so version pinning is recommended for production. There is no built-in UI for non-developers; downstream teams generally consume Docling output through their own tooling.

Pros & Cons

✓ Pros

✓Apache-2.0 licensed and runs fully local/offline, which is important for regulated industries handling sensitive documents
✓Preserves document structure (tables, headings, reading order, figures, formulas) rather than emitting flat text, dramatically improving RAG quality
✓Broad format coverage in one toolkit: PDF, DOCX, PPTX, XLSX, HTML, images, and audio, plus OCR fallbacks via EasyOCR/Tesseract/RapidOCR
✓First-class integrations with LangChain, LlamaIndex, Haystack, Crew AI, and an MCP server for agentic workflows
✓Backed by IBM Research with active maintenance under the LF AI & Data Foundation, and ships purpose-built models (TableFormer, Granite-Docling, SmolDocling)
✓Layout-aware chunking utilities (HybridChunker, HierarchicalChunker) make it easier to feed embeddings without breaking semantic units

✗ Cons

✗Python-only library — teams on JVM, Go, or Node stacks have to wrap it in a service or use the MCP/CLI interface
✗Running the full pipeline with VLMs and OCR is computationally heavy; throughput on CPU-only machines can be slow for large PDF batches
✗Quality on highly complex layouts (multi-column scientific papers with nested tables, scanned forms) still requires tuning and is not error-free
✗Documentation and APIs evolve quickly across releases, so pinning versions is necessary to avoid breakage in production pipelines
✗No managed/hosted offering from the project itself — teams are responsible for GPU provisioning, scaling, and monitoring

Frequently Asked Questions

Is Docling free to use commercially?

Yes. Docling is released under the Apache 2.0 license and the associated models (Docling layout, TableFormer, Granite-Docling, SmolDocling) are openly available on Hugging Face, so it can be embedded in commercial products and run on-premises without per-document fees.

What document formats does Docling support?

Docling parses PDF, DOCX, PPTX, XLSX, HTML, Markdown, AsciiDoc, CSV, and images (PNG, JPEG, TIFF), and recent versions add audio transcription. Outputs include Markdown, HTML, JSON, and the structured DoclingDocument schema.

How does Docling compare to using a hosted API like Unstructured or AWS Textract?

Docling runs locally with no data ever leaving your environment, which hosted APIs cannot offer. It also preserves richer structural information (tables via TableFormer, reading order, formulas) than most generic OCR APIs. The trade-off is that you operate the infrastructure yourself rather than paying per page.

Can Docling be used inside an AI agent or IDE assistant?

Yes. Docling ships a Model Context Protocol (MCP) server so MCP-compatible agents and IDE assistants (Claude Desktop, Cursor, etc.) can call it as a tool to convert and chunk documents on demand, in addition to direct integrations with LangChain, LlamaIndex, Haystack, and Crew AI.

Does Docling handle scanned PDFs and images?

Yes. It integrates with OCR engines including EasyOCR, Tesseract, and RapidOCR, and can run vision-language pipelines (SmolDocling, Granite-Docling) that read directly from page images to produce structured output.
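Choosing between the OCR path and the VLM path is a per-document decision. The sketch below assumes Docling v2's `EasyOcrOptions` (with `lang` and `force_full_page_ocr`), `PdfPipelineOptions`, `PdfFormatOption(pipeline_cls=VlmPipeline)`, and `docling.pipeline.vlm_pipeline.VlmPipeline`; it covers PDFs only (standalone images go through a separate image format option). The `choose_pipeline` routing rule and the `"standard"`/`"ocr"`/`"vlm"` labels are hypothetical and should be tuned for your own corpus and hardware.

```python
from __future__ import annotations


def choose_pipeline(has_text_layer: bool, prefer_vlm: bool) -> str:
    """Hypothetical routing rule: skip OCR for born-digital PDFs,
    otherwise pick OCR or a VLM depending on hardware/preference."""
    if has_text_layer:
        return "standard"
    return "vlm" if prefer_vlm else "ocr"


def make_converter(kind: str):
    # Deferred imports: needs `pip install docling` (plus VLM/OCR extras as required).
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    if kind == "vlm":
        from docling.pipeline.vlm_pipeline import VlmPipeline

        # Reads page images directly; the default VLM depends on the Docling version.
        fmt = PdfFormatOption(pipeline_cls=VlmPipeline)
    else:
        options = PdfPipelineOptions(do_ocr=(kind == "ocr"))
        if kind == "ocr":
            # Swap in TesseractOcrOptions or RapidOcrOptions to change engines.
            options.ocr_options = EasyOcrOptions(lang=["en"], force_full_page_ocr=True)
        fmt = PdfFormatOption(pipeline_options=options)
    return DocumentConverter(format_options={InputFormat.PDF: fmt})
```

Keeping the routing decision in a plain function makes it easy to log or test separately from the heavier model-loading code.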

🔒 Security & Compliance

❌ SOC2
✅ GDPR: Yes
❌ HIPAA
❌ SSO
✅ Self-Hosted: Yes
✅ On-Prem: Yes
❌ RBAC
❌ Audit Log
❌ API Key Auth
✅ Open Source: Yes
Encryption at Rest: Unknown
Encryption in Transit: Unknown
Data Retention: configurable
Data Residency: user-controlled


What's New in 2026

Through late 2025 and into 2026 the project expanded well beyond its original PDF focus. Notable additions include audio file ingestion with transcription, a Model Context Protocol (MCP) server so MCP-compatible agents and IDEs can call Docling as a tool, and tighter integration with IBM's Granite-Docling and the compact SmolDocling vision-language models for image-first document understanding. The project also moved under the LF AI & Data Foundation umbrella as docling-project, broadening governance beyond IBM, and continued to add ecosystem integrations (Crew AI, Haystack, txtai) alongside maturing the layout-aware HybridChunker for RAG.

Alternatives to Docling

Unstructured

Document AI

Document ETL engine that converts messy PDFs, Word files, and images into AI-ready structured data with intelligent chunking.

LlamaParse

Document AI

LlamaParse: Extract and analyze structured data from complex PDFs and documents using LLM-powered parsing.
