AI Frameworks🔴Developer

Instructor

Name: Instructor
Brand: Instructor

Most popular Python library for getting structured, validated outputs from LLMs by combining pydantic schemas with provider-native function calling.

Starting atFree

Visit Instructor →

💡

In Plain English

Most popular Python library for getting structured, validated outputs from LLMs by combining pydantic schemas with provider-native function calling.

Overview

Instructor is the open-source pydantic-based library for structured LLM outputs with automatic retries, streaming, and support for OpenAI, Anthropic, Gemini, and 10+ other providers.

🦞

Using with OpenClaw

▼

Use Instructor within OpenClaw subagent scripts to extract structured data from LLM calls. Install via pip and use response_model parameter in your LLM client calls.

Use Case Example:

Extract structured entities, classifications, or typed records from text within OpenClaw agent workflows, ensuring validated outputs for downstream processing.

Learn about OpenClaw →

🎨

Vibe Coding Friendly?

▼

Difficulty:beginner

Simple Python API with minimal boilerplate — define a Pydantic model and add response_model to your LLM call. Requires basic Python knowledge but no ML expertise.

Learn about Vibe Coding →

Was this helpful?

Editorial Review

Instructor is the gold standard for structured LLM output extraction, with 3M+ monthly downloads and support for 15+ providers. Using Pydantic models for validation and automatic retry logic, it turns unreliable LLM text into guaranteed typed Python objects. Essential for any production system that needs reliable, structured responses from LLMs.

Key Features

Pydantic Model Integration+

Define output structure as Pydantic models with typed fields, descriptions, and validators. Instructor converts these to function-calling schemas and returns validated Python objects automatically.

Automatic Retry with Validation Feedback+

When Pydantic validation fails, Instructor provides specific error messages to the LLM and retries. Models receive context about validation failures and can self-correct, achieving 99%+ success rates.

Multi-Provider Support (15+)+

Unified from_provider() interface works with OpenAI, Anthropic, Google, Cohere, Mistral, DeepSeek, Ollama, and 10+ more providers. Switch providers without code changes for easy A/B testing and cost optimization.

Streaming Partial Objects+

Get incremental Pydantic model updates as the LLM generates tokens. Fields populate progressively, enabling real-time UIs that show structured data appearing as extraction progresses.

Multiple Extraction Modes+

TOOLS mode uses native function calling for maximum reliability, JSON mode forces JSON output for weaker models, MD_JSON extracts from markdown blocks, and PARALLEL extracts multiple objects simultaneously.

Union Type Classification+

Use Union types to let the LLM select the appropriate Pydantic model for classification tasks. Supports discriminated unions and automatic routing based on input content analysis.

Pricing Plans

Open Source

Free (MIT)

See Full Pricing →Free vs Paid →Is it worth it? →

Ready to get started with Instructor?

View Pricing Options →

Getting Started with Instructor

1Install Instructor via pip: 'pip install instructor' and import the library into your Python project
2Patch your existing LLM client using 'instructor.from_provider(provider_name)' or specific provider functions like 'instructor.from_openai()'
3Define a Pydantic model describing your desired output schema with typed fields and optional validation rules
4Call your client with the response_model parameter: 'client.create(response_model=YourModel, messages=[...])' to get validated Python objects
5Handle the returned Pydantic object directly in your application code with full type safety and IDE support

Ready to start? Try Instructor →

Best Use Cases

🎯

Any Python LLM app that needs reliable typed JSON output

⚡

Multi-provider applications needing one structured-output API

🔧

Extraction, classification, and report-generation pipelines

🚀

Teams using small open models that need retry-based reliability

Integration Ecosystem

11 integrations

Instructor works with these platforms and services:

🧠 LLM Providers

OpenAIAnthropicGoogleCohereMistraldeepseekOllama

📈 Monitoring

LangSmithLangfuse

🔗 Other

pydanticlitellm

View full Integration Matrix →

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Instructor doesn't handle well:

⚠Instructor is deliberately a focused library, not a full agent framework — it does not handle memory, retrieval, orchestration, evaluation harnesses, or deployment, so it must be combined with other tools for complete applications. Quality of structured outputs is bounded by the underlying model: smaller or weaker open-source models may loop through retries before producing a valid object, which raises both latency and token cost. Some provider modes (notably JSON mode on models without native tool calling) offer weaker guarantees than OpenAI-style structured outputs, and choosing the right mode per provider sometimes requires experimentation. The non-Python ports (TypeScript, Go, Ruby, Elixir, PHP) trail the Python library in feature parity and documentation depth, and any schema feature that depends on Pydantic-specific behavior may not translate cleanly to those ecosystems.

Pros & Cons

✓ Pros

✓Trivially small surface area — a Python developer can adopt it in 10 minutes
✓Pydantic validation gives you real Python types, not stringly-typed dicts
✓Provider-agnostic — switch OpenAI ↔ Anthropic without touching prompt code
✓Retry-on-validation-error pattern materially improves small-model reliability
✓Massive adoption (1M+ monthly downloads) means lots of examples and Stack Overflow answers

✗ Cons

✗Pure library — no UI, no eval, no observability included
✗Streaming partials require careful handling on the consumer side
✗Each retry costs another LLM call; can get expensive on hard schemas
✗No built-in prompt versioning or A/B testing primitives
✗Doesn't help with prompt engineering itself — only with output validation

Frequently Asked Questions

What is Instructor and what problem does it solve?+

Instructor is an open-source library for extracting structured, validated data from large language models. It lets you define the shape of the output you want using a Pydantic model (in Python, with equivalents in TypeScript, Go, and Ruby), then handles prompting, parsing, validation, and automatic retries so you receive a typed object instead of a raw string of JSON-ish text.

Which LLM providers does Instructor support?+

Instructor patches the official client SDKs of most major providers, including OpenAI, Anthropic Claude, Google Gemini and Vertex AI, Mistral, Cohere, Groq, Together, Fireworks, Anyscale, Databricks, Ollama, llama.cpp, and vLLM. The same Pydantic schema and call pattern works across providers, so swapping models is typically a one-line change.

Do I need to know Pydantic to use Instructor?+

A basic understanding of Pydantic is strongly recommended, because Instructor uses Pydantic models to define output schemas and to power validation. The good news is that the same skills transfer directly to FastAPI, LangChain, and many other Python tools, and Instructor's documentation includes worked examples for common patterns like nested models, enums, and custom validators.

How does Instructor handle validation failures?+

When a model returns output that does not match your schema, Instructor catches the Pydantic ValidationError and automatically issues a follow-up request containing the original schema and the specific error messages, asking the model to correct itself. You control the maximum number of retries, and you can hook into the loop for logging or custom recovery logic.

Can I use Instructor with open source or local models?+

Yes. Instructor integrates with Ollama, llama.cpp, vLLM, Together, Fireworks, Anyscale, Groq, and other open-source-friendly runtimes. Quality of structured output depends on the underlying model's instruction-following ability, but Instructor's retry-with-validation loop helps compensate for weaker models that occasionally produce malformed JSON.

🔒 Security & Compliance

—

SOC2

Unknown

—

GDPR

Unknown

—

HIPAA

Unknown

—

SSO

Unknown

✅

Self-Hosted

Yes

✅

On-Prem

Yes

—

RBAC

Unknown

—

Audit Log

Unknown

—

API Key Auth

Unknown

✅

Open Source

Yes

—

Encryption at Rest

Unknown

—

Encryption in Transit

Unknown

Data Retention: configurable

🦞

New to AI tools?

Read practical guides for choosing and using AI tools

Read Guides →

Get updates on Instructor and 370+ other AI tools

Weekly insights on the latest AI tools, features, and trends delivered to your inbox.

What's New in 2026

Instructor has continued expanding beyond Python in 2025 and into 2026, with official ports for TypeScript, Go, Elixir, PHP, and Ruby reaching broader provider coverage. The Python library has standardized on a from_provider entry point that unifies client instantiation across OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq, Together, Fireworks, Ollama, and vLLM, and has added first-class support for newer provider features such as OpenAI Structured Outputs, Anthropic tool use refinements, and Gemini's structured generation modes. The documentation has been reorganized around a learning track, integrations catalog, cookbook, and concepts guide, and the hooks API has matured into the recommended path for observability and tracing integrations.

Alternatives to Instructor

PydanticAI

Developer Framework

PydanticAI is an AI-powered developer framework tool for building custom ai agents and structured output and tool calling.

Outlines

AI Agent Builders

Grammar-constrained generation for deterministic model outputs.

View All Alternatives & Detailed Comparison →

User Reviews

No reviews yet. Be the first to share your experience!

Quick Info

Try Instructor Today

Get started with Instructor and see if it's the right fit for your needs.

Get Started →

Need help choosing the right AI stack?

Take our 60-second quiz to get personalized tool recommendations

Find Your Perfect AI Stack →

Want a faster launch?

Explore 20 ready-to-deploy AI agent templates for sales, support, dev, research, and operations.

Browse Agent Templates →

More about Instructor

Pricing Review Alternatives Free vs Paid Pros & Cons Worth It?Tutorial

Editorial Review

Key Features

Pydantic Model Integration+

Define output structure as Pydantic models with typed fields, descriptions, and validators. Instructor converts these to function-calling schemas and returns validated Python objects automatically.

Automatic Retry with Validation Feedback+

Multi-Provider Support (15+)+

Streaming Partial Objects+

Get incremental Pydantic model updates as the LLM generates tokens. Fields populate progressively, enabling real-time UIs that show structured data appearing as extraction progresses.

Multiple Extraction Modes+

Union Type Classification+

Use Union types to let the LLM select the appropriate Pydantic model for classification tasks. Supports discriminated unions and automatic routing based on input content analysis.

Getting Started with Instructor

1Install Instructor via pip: 'pip install instructor' and import the library into your Python project

2Patch your existing LLM client using 'instructor.from_provider(provider_name)' or specific provider functions like 'instructor.from_openai()'

3Define a Pydantic model describing your desired output schema with typed fields and optional validation rules

4Call your client with the response_model parameter: 'client.create(response_model=YourModel, messages=[...])' to get validated Python objects

5Handle the returned Pydantic object directly in your application code with full type safety and IDE support

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Instructor doesn't handle well:

⚠Instructor is deliberately a focused library, not a full agent framework — it does not handle memory, retrieval, orchestration, evaluation harnesses, or deployment, so it must be combined with other tools for complete applications. Quality of structured outputs is bounded by the underlying model: smaller or weaker open-source models may loop through retries before producing a valid object, which raises both latency and token cost. Some provider modes (notably JSON mode on models without native tool calling) offer weaker guarantees than OpenAI-style structured outputs, and choosing the right mode per provider sometimes requires experimentation. The non-Python ports (TypeScript, Go, Ruby, Elixir, PHP) trail the Python library in feature parity and documentation depth, and any schema feature that depends on Pydantic-specific behavior may not translate cleanly to those ecosystems.

Pros & Cons

✓ Pros

✓Trivially small surface area — a Python developer can adopt it in 10 minutes
✓Pydantic validation gives you real Python types, not stringly-typed dicts
✓Provider-agnostic — switch OpenAI ↔ Anthropic without touching prompt code
✓Retry-on-validation-error pattern materially improves small-model reliability
✓Massive adoption (1M+ monthly downloads) means lots of examples and Stack Overflow answers

✗ Cons

✗Pure library — no UI, no eval, no observability included
✗Streaming partials require careful handling on the consumer side
✗Each retry costs another LLM call; can get expensive on hard schemas
✗No built-in prompt versioning or A/B testing primitives
✗Doesn't help with prompt engineering itself — only with output validation