Structured output library for reliable LLM schema extraction.
Makes AI return structured, validated data instead of messy text — perfect when you need reliable, typed responses from AI.
Instructor is a Python library that patches LLM client libraries to return structured, validated outputs instead of raw text. Built on Pydantic, it lets you define a response model as a Pydantic class and get back a validated Python object — with automatic retries when the LLM output doesn't match the schema. It's not an agent framework; it's a precision tool for one specific problem: getting reliable structured data from LLMs.
The library works by patching the client libraries from OpenAI, Anthropic, Google, Cohere, Mistral, and others with a response_model parameter. When you call client.chat.completions.create(response_model=MyModel, ...), Instructor handles the function-calling schema generation, response parsing, validation, and retry logic. If the LLM returns invalid data, Instructor feeds the validation errors back to the model and retries.
Instructor supports multiple extraction modes: TOOLS (native function calling), JSON (forces JSON output), MD_JSON (extracts JSON from markdown blocks), and PARALLEL (extracts multiple objects). TOOLS mode is most reliable with capable models, while JSON mode works better with models that have weak function calling.
Beyond basic extraction, Instructor supports streaming partial objects (incremental Pydantic model updates as the LLM generates), iterable responses (extract lists of objects), union types for classification, and validators with custom logic. The library also includes a citation validator for grounding extracted data.
Created by Jason Liu, Instructor has become the de facto standard for structured extraction in Python LLM applications, with ports to TypeScript, Ruby, Go, and Elixir.
The honest take: Instructor does one thing exceptionally well. If your challenge is getting structured, validated data from LLMs — entity extraction, classification, data transformation — Instructor is the right choice over heavier frameworks. It's the tool you reach for when you need a Pydantic model back from an LLM call, reliably.
Instructor is the gold standard for structured LLM output extraction, using Pydantic models for validation and retry logic. Essential for any production agent that needs reliable, typed responses from LLMs.
Define the desired output as a Pydantic model with typed fields, descriptions, and validators. Instructor converts this to a function-calling schema, parses the LLM response, and returns a validated object.
Use Case:
Extracting structured user profile data (name, email, company, role) from unstructured customer emails with type validation.
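One way the schema for this email use case might look; the field names and the "@" check are illustrative assumptions, not part of Instructor:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class UserProfile(BaseModel):
    name: str = Field(description="Full name of the sender")
    email: str = Field(description="Sender's email address")
    company: str = Field(description="Company the sender works for")
    role: str = Field(description="Sender's job title")

    @field_validator("email")
    @classmethod
    def looks_like_email(cls, v: str) -> str:
        # A failing check here becomes the feedback the LLM sees on retry
        if "@" not in v:
            raise ValueError(f"'{v}' is not an email address")
        return v.lower()
```

The Field descriptions are forwarded to the model as part of the generated schema, so they double as extraction hints.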
When Pydantic validation fails, Instructor feeds specific errors back to the LLM and retries. The model receives context about what went wrong and can self-correct.
Use Case:
Extracting financial data where the model occasionally formats numbers incorrectly — retries with feedback improve accuracy from ~85% to ~97%.
Get incremental Pydantic model updates as the LLM generates tokens. Fields populate as they become available, enabling progressive rendering.
Use Case:
Building a real-time entity extraction UI that shows extracted fields appearing one by one as the model processes a document.
Patches client libraries for OpenAI, Anthropic, Google, Cohere, Mistral, LiteLLM, and Ollama with the same response_model interface. Switch providers by changing the client, not the extraction logic.
Use Case:
Running the same extraction pipeline across GPT-4, Claude, and Gemini to benchmark which produces the most accurate structured outputs.
TOOLS (native function calling), JSON (JSON mode), MD_JSON (markdown-wrapped JSON), and PARALLEL (multiple objects). Each mode optimizes for different model capabilities.
Use Case:
Using TOOLS mode with GPT-4 for reliability, falling back to JSON mode for models without function calling, with identical Pydantic models.
Extract lists of objects using Iterable[MyModel] (streaming each complete object as generated) and classify inputs using Union types where the model selects the appropriate Pydantic model.
Use Case:
Processing customer support tickets to extract multiple structured issue reports from a single transcript, streamed one at a time.
Extracting structured data (entities, facts, attributes) from unstructured text with validated Pydantic output
Building classification systems where LLM outputs must conform to specific categories or type hierarchies
Creating data transformation pipelines that convert free-text inputs into typed, database-ready records
Adding structured output support to existing LLM application code with minimal refactoring
How is Instructor different from raw function calling?
Instructor adds Pydantic validation (catching type errors, format issues, and constraint violations), automatic retry with error feedback, and a consistent API across providers. Raw function calling gives you JSON; Instructor gives you validated Python objects.
Does Instructor support streaming?
Yes. Use create_partial() to stream partial Pydantic objects, with fields populating incrementally, or create_iterable() to stream a list of complete objects. Streaming works with all extraction modes and providers.
How many retries should I configure?
Start with max_retries=2 or 3. Each retry is a full LLM call. For critical extraction, 3 retries achieves 99%+ parse rates. Monitor your retry rate; if it is consistently high, simplify the Pydantic model or add field descriptions.
Can I use Instructor with local models?
Yes. Instructor has an Ollama integration for any model Ollama serves. Larger models (70B+) handle complex schemas reliably, while 7B models work for simple extraction. Use JSON mode instead of TOOLS for models with limited function calling.