Master Docling with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
1. Install Docling with `pip install docling` (add `docling[ocr]` or `docling[vlm]` for OCR/VLM support).
2. Parse your first document using the `DocumentConverter` API, imported from `docling.document_converter`: `converter = DocumentConverter(); result = converter.convert('myfile.pdf')`.
3. Export the parsed result to Markdown, JSON, or HTML using `result.document.export_to_markdown()` or the corresponding export method.
4. Integrate with your RAG stack by installing the appropriate connector (e.g., `langchain-docling`, `llama-index-readers-docling`, or `docling-haystack`).
5. For batch processing or automation, use the Docling CLI, which takes the source path directly: `docling --to md ./documents/`.
💡 Quick Start: Follow these 5 steps in order to get up and running with Docling quickly.
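The install, parse, and export steps above can be sketched as one small script. This is a minimal sketch, not an official recipe: `markdown_path_for`, `convert_to_markdown`, and the `out` directory are hypothetical names chosen for illustration, and the Docling import is deferred so the path helper works even where Docling is not installed.

```python
from pathlib import Path


def markdown_path_for(source: str, out_dir: str = "out") -> Path:
    """Map an input file to its Markdown output path (hypothetical helper)."""
    return Path(out_dir) / (Path(source).stem + ".md")


def convert_to_markdown(source: str, out_dir: str = "out") -> Path:
    """Convert one document with Docling and write its Markdown export."""
    # Imported lazily so the helper above has no dependency on Docling.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)

    target = markdown_path_for(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return target
```

Calling `convert_to_markdown("myfile.pdf")` would write `out/myfile.md`, assuming Docling is installed and the file exists. The first conversion may take longer because Docling downloads its models on first use.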
Explore the key features that make Docling powerful for MCP and agent infrastructure workflows.
Docling is released under the Apache 2.0 license, and the associated models (Docling layout, TableFormer, Granite-Docling, SmolDocling) are openly available on Hugging Face, so it can be embedded in commercial products and run on-premises without per-document fees.
Docling parses PDF, DOCX, PPTX, XLSX, HTML, Markdown, AsciiDoc, CSV, and images (PNG, JPEG, TIFF), and recent versions add audio transcription. Outputs include Markdown, HTML, JSON, and the structured DoclingDocument schema.
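A batch run over mixed inputs might filter files by extension before handing them to the converter. The sketch below makes several assumptions: the `SUPPORTED_EXTENSIONS` set is derived from the format list above rather than from Docling's own tables, `find_convertible` and `convert_batch` are hypothetical helpers, and the `convert_all`, `ConversionStatus`, and `result.input.file` usage follows recent Docling releases and should be checked against your installed version.

```python
from pathlib import Path

# Illustrative mapping based on the formats listed above; this is an
# assumption, not Docling's internal extension table.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".md",
    ".adoc", ".csv", ".png", ".jpg", ".jpeg", ".tiff",
}


def find_convertible(paths):
    """Return the paths whose extension is in the supported set, sorted."""
    candidates = (Path(p) for p in paths)
    return sorted(p for p in candidates if p.suffix.lower() in SUPPORTED_EXTENSIONS)


def convert_batch(paths, out_dir="out"):
    """Convert every supported file and write one Markdown file per input."""
    # Lazy imports keep find_convertible usable without Docling installed.
    from docling.datamodel.base_models import ConversionStatus
    from docling.document_converter import DocumentConverter

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    converter = DocumentConverter()
    written = []
    # raises_on_error=False lets the loop skip failures instead of aborting.
    for result in converter.convert_all(find_convertible(paths), raises_on_error=False):
        if result.status != ConversionStatus.SUCCESS:
            continue
        target = out / f"{result.input.file.stem}.md"
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(target)
    return written
```

Filtering up front keeps unsupported files such as plain `.txt` notes out of the run. Note that inputs sharing a file stem would overwrite each other in this simple layout.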
Docling runs locally with no data ever leaving your environment, which hosted APIs cannot offer. It also preserves richer structural information (tables via TableFormer, reading order, formulas) than most generic OCR APIs. The trade-off is that you operate the infrastructure yourself rather than paying per page.
Docling ships a Model Context Protocol (MCP) server, so MCP-compatible agents and IDE assistants (Claude Desktop, Cursor, etc.) can call it as a tool to convert and chunk documents on demand. It also integrates directly with LangChain, LlamaIndex, Haystack, and CrewAI.
Docling integrates with OCR engines including EasyOCR, Tesseract, and RapidOCR, and can also run vision-language pipelines (SmolDocling, Granite-Docling) that read directly from page images to produce structured output.
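Selecting one of these OCR engines usually happens through the PDF pipeline options. The following is a hedged sketch: `normalize_engine` and `build_ocr_converter` are hypothetical helpers, and the option classes (`EasyOcrOptions`, `TesseractCliOcrOptions`, `RapidOcrOptions`, `PdfPipelineOptions`, `PdfFormatOption`) are named as in recent Docling releases, so verify them against the version you have installed. Each engine also needs its own dependencies available.

```python
OCR_ENGINES = ("easyocr", "tesseract", "rapidocr")


def normalize_engine(name: str) -> str:
    """Validate an OCR engine name against the engines mentioned above."""
    key = name.strip().lower()
    if key not in OCR_ENGINES:
        raise ValueError(f"unknown OCR engine: {name!r}")
    return key


def build_ocr_converter(engine: str = "easyocr"):
    """Return a DocumentConverter whose PDF pipeline runs the chosen OCR engine."""
    key = normalize_engine(engine)

    # Lazy imports: only needed when a converter is actually built.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        PdfPipelineOptions,
        RapidOcrOptions,
        TesseractCliOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    option_classes = {
        "easyocr": EasyOcrOptions,
        "tesseract": TesseractCliOcrOptions,
        "rapidocr": RapidOcrOptions,
    }

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    pipeline_options.ocr_options = option_classes[key]()

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
```

With this in place, `build_ocr_converter("tesseract").convert("scan.pdf")` would run the scanned file through the Tesseract command-line engine, assuming Tesseract is installed on the machine; `scan.pdf` is a placeholder filename.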
Now that you know how to use Docling, it's time to put this knowledge into practice.
Tutorial updated March 2026