Structured output library for reliable LLM schema extraction.
Makes AI return structured, validated data instead of messy text — perfect when you need reliable, typed responses from AI.
Instructor is a Python library that patches LLM client libraries to return structured, validated outputs instead of raw text. Built on Pydantic, it lets you define a response model as a Pydantic class and get back a validated Python object — with automatic retries when the LLM output doesn't match the schema. It's not an agent framework; it's a precision tool for one specific problem: getting reliable structured data from LLMs.
The library works by patching the client libraries from OpenAI, Anthropic, Google, Cohere, Mistral, and others with a response_model parameter. When you call client.chat.completions.create(response_model=MyModel, ...), Instructor handles the function-calling schema generation, response parsing, validation, and retry logic. If the LLM returns invalid data, Instructor feeds the validation errors back to the model and retries.
Instructor supports multiple extraction modes: TOOLS (native function calling), JSON (forces JSON output), MD_JSON (extracts JSON from markdown blocks), and PARALLEL (extracts multiple objects). TOOLS mode is most reliable with capable models, while JSON mode works better with models that have weak function calling.
Beyond basic extraction, Instructor supports streaming partial objects (incremental Pydantic model updates as the LLM generates), iterable responses (extract lists of objects), union types for classification, and validators with custom logic. The library also includes a citation validator for grounding extracted data.
Created by Jason Liu, Instructor has become the de facto standard for structured extraction in Python LLM applications, with ports to TypeScript, Ruby, Go, and Elixir.
The honest take: Instructor does one thing exceptionally well. If your challenge is getting structured, validated data from LLMs — entity extraction, classification, data transformation — Instructor is the right choice over heavier frameworks. It's the tool you reach for when you need a Pydantic model back from an LLM call, reliably.
Instructor is the gold standard for structured LLM output extraction, using Pydantic models for validation and retry logic. Essential for any production agent that needs reliable, typed responses from LLMs.
Define the desired output as a Pydantic model with typed fields, descriptions, and validators. Instructor converts this to a function-calling schema, parses the LLM response, and returns a validated object.
Use Case:
Extracting structured user profile data (name, email, company, role) from unstructured customer emails with type validation.
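One way the schema for this email use case might look; the field names and the "@" check are illustrative assumptions, not part of Instructor:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class UserProfile(BaseModel):
    name: str = Field(description="Full name of the sender")
    email: str = Field(description="Sender's email address")
    company: str = Field(description="Company the sender works for")
    role: str = Field(description="Sender's job title")

    @field_validator("email")
    @classmethod
    def looks_like_email(cls, v: str) -> str:
        # A failing check here becomes the feedback the LLM sees on retry
        if "@" not in v:
            raise ValueError(f"'{v}' is not an email address")
        return v.lower()
```

The Field descriptions are forwarded to the model as part of the generated schema, so they double as extraction hints.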
When Pydantic validation fails, Instructor feeds specific errors back to the LLM and retries. The model receives context about what went wrong and can self-correct.
Use Case:
Extracting financial data where the model occasionally formats numbers incorrectly — retries with feedback improve accuracy from ~85% to ~97%.
Get incremental Pydantic model updates as the LLM generates tokens. Fields populate as they become available, enabling progressive rendering.
Use Case:
Building a real-time entity extraction UI that shows extracted fields appearing one by one as the model processes a document.
Patches client libraries for OpenAI, Anthropic, Google, Cohere, Mistral, LiteLLM, and Ollama with the same response_model interface. Switch providers by changing the client, not the extraction logic.
Use Case:
Running the same extraction pipeline across GPT-4, Claude, and Gemini to benchmark which produces the most accurate structured outputs.
TOOLS (native function calling), JSON (JSON mode), MD_JSON (markdown-wrapped JSON), and PARALLEL (multiple objects). Each mode optimizes for different model capabilities.
Use Case:
Using TOOLS mode with GPT-4 for reliability, falling back to JSON mode for models without function calling, with identical Pydantic models.
Extract lists of objects using Iterable[MyModel] (streaming each complete object as generated) and classify inputs using Union types where the model selects the appropriate Pydantic model.
Use Case:
Processing customer support tickets to extract multiple structured issue reports from a single transcript, streamed one at a time.
Extracting structured data (entities, facts, attributes) from unstructured text with validated Pydantic output
Building classification systems where LLM outputs must conform to specific categories or type hierarchies
Creating data transformation pipelines that convert free-text inputs into typed, database-ready records
Adding structured output support to existing LLM application code with minimal refactoring
How is Instructor different from raw function calling?
Instructor adds Pydantic validation (catching type errors, format issues, and constraint violations), automatic retry with error feedback, and a consistent API across providers. Raw function calling gives you JSON; Instructor gives you validated Python objects.
Does Instructor support streaming?
Yes. Use create_partial() to stream partial Pydantic objects, with fields populating incrementally, or create_iterable() to stream a list of complete objects. Streaming works with all extraction modes and providers.
How many retries should I configure?
Start with max_retries=2 or 3. Each retry is a full LLM call. For critical extraction, 3 retries achieves 99%+ parse rates. Monitor your retry rate; if it is consistently high, simplify the Pydantic model or add field descriptions.
Can I use Instructor with local models?
Yes. Instructor has an Ollama integration for any model Ollama serves. Larger models (70B+) handle complex schemas reliably, while 7B models work for simple extraction. Use JSON mode instead of TOOLS for models with limited function calling.