Stanford NLP's framework for programming language models with declarative Python modules instead of prompts, featuring automatic optimizers that compile programs into effective prompt strategies and fine-tuned weights.
Automatically optimizes your AI's instructions so it gives better answers — like having a compiler that improves your AI's performance instead of you hand-writing prompts.
DSPy (Declarative Self-improving Python) is a framework from Stanford NLP that fundamentally reimagines how developers build applications with large language models by replacing fragile hand-written prompts with composable, optimizable Python modules.
Instead of manually crafting prompt strings and iterating through trial-and-error, DSPy lets you define what your program should do using typed Signatures (like context, question -> reasoning, answer) and compose behavior from built-in modules such as ChainOfThought, ReAct, and ProgramOfThought. The framework then uses automatic optimizers — including MIPROv2, GEPA, BootstrapFewShot, and COPRO — to compile your program into highly effective prompts or fine-tuned weights, given just a metric function and a small set of labeled examples.
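The declare-then-compose style can be sketched in plain Python. Everything below is an illustrative toy with a stubbed model, not DSPy's actual API (real code would use dspy.Signature, dspy.ChainOfThought, and a configured dspy.LM):

```python
# Toy sketch of DSPy's declare-and-compose idea, with a stub in
# place of a real language model.

class ChainOfThought:
    """Wraps a model call and asks for intermediate reasoning first."""
    def __init__(self, signature, model):
        inputs, outputs = signature.split("->")
        self.inputs = [f.strip() for f in inputs.split(",")]
        self.outputs = [f.strip() for f in outputs.split(",")]
        self.model = model

    def __call__(self, **kwargs):
        prompt = "\n".join(f"{k}: {kwargs[k]}" for k in self.inputs)
        prompt += "\nThink step by step, then answer."
        return self.model(prompt, self.outputs)

def stub_model(prompt, output_fields):
    # Stand-in for an LLM: one placeholder value per declared output.
    return {field: f"<{field}>" for field in output_fields}

qa = ChainOfThought("context, question -> reasoning, answer", stub_model)
result = qa(context="Paris is the capital of France.",
            question="What is the capital of France?")
print(sorted(result))  # ['answer', 'reasoning']
```

The point of the pattern is that the module knows its declared inputs and outputs, so an optimizer can rewrite the underlying prompt text without touching any calling code.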
This approach delivers several transformative benefits. First, it makes LLM programs model-portable: switching from GPT-4 to Claude to Llama requires re-optimization rather than prompt rewriting, because the optimizer discovers provider-specific strategies automatically. Second, it enables systematic quality improvement: teams often report accuracy gains of 10-30% over hand-prompted baselines by letting optimizers search the space of possible instructions and demonstrations. Third, it can sharply reduce inference costs by optimizing smaller models (Llama 3, Mistral, Phi) to match or exceed the accuracy of larger models at a fraction of the per-token price.
The DSPy paper was published at ICLR 2024, and the project has accumulated over 25,000 GitHub stars. It is backed by Stanford HAI and maintained by an active research team that continues to release new optimizers (GEPA for reflective prompt evolution, SIMBA for scalable optimization) and capabilities including MCP tool support, streaming, async execution, and structured output generation. The framework integrates with all major LLM providers via LiteLLM, supports vector databases like Pinecone, Weaviate, Qdrant, and Chroma for RAG pipelines, and works with observability tools including LangSmith, Langfuse, and MLflow.
DSPy is best suited for teams building production AI systems who want measurable, reproducible quality improvements rather than subjective prompt tweaking. It excels in RAG pipelines, multi-hop reasoning, classification, information extraction, and agent workflows where output quality can be quantified with a metric.
DSPy is a paradigm-shifting framework that replaces manual prompt engineering with programmatic optimization, and it is especially valuable for teams building complex LLM pipelines who need measurable, reproducible quality improvements backed by metrics and a sound evaluation methodology. The automatic optimization approach delivers genuine productivity gains and model portability, though it requires a steeper initial investment in learning the framework's abstractions and creating labeled evaluation data.
Define the input/output behavior of an LM call as a Python signature (e.g., `context, question -> reasoning, answer`) instead of a prompt string. Signatures specify field names, types, and descriptions, enabling DSPy's optimizers to automatically generate appropriate instructions, demonstrations, and formatting for any target model.
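The field-declaration idea can be illustrated with a small parser for signature strings. This is not DSPy's implementation, just a sketch of the information a signature carries; untyped fields default to str, mirroring shorthand signatures like `context, question -> reasoning, answer`:

```python
# Illustrative parser for DSPy-style signature strings (toy code,
# not DSPy internals). Returns {field_name: type_name} dicts for
# the input and output sides of the arrow.

def parse_signature(sig: str):
    def parse_fields(part: str) -> dict:
        fields = {}
        for chunk in part.split(","):
            name, _, typ = chunk.partition(":")
            fields[name.strip()] = typ.strip() or "str"  # default type
        return fields
    left, _, right = sig.partition("->")
    return parse_fields(left), parse_fields(right)

inputs, outputs = parse_signature("context, question -> reasoning, answer")
print(inputs)   # {'context': 'str', 'question': 'str'}
print(outputs)  # {'reasoning': 'str', 'answer': 'str'}
```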
DSPy ships with a full library of optimizers that compile programs into better prompts or fine-tuned weights given a metric and training set. MIPROv2 jointly optimizes instructions and demonstrations using Bayesian surrogate models. GEPA uses reflective prompt evolution for complex reasoning. BootstrapFewShot generates demonstrations from the training set. SIMBA scales optimization to multi-module programs efficiently.
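The compile step can be illustrated with a toy instruction search: score candidate instructions with the metric over the training set and keep the winner. The stub model, candidate list, and metric below are invented for illustration and say nothing about how MIPROv2's Bayesian search actually proposes candidates:

```python
# Toy instruction search: the skeleton shared by prompt optimizers,
# with a deterministic stub in place of a real LM.

trainset = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 + 5", "answer": "8"},
]

def stub_lm(instruction, question):
    # Pretend the model only does arithmetic when told to be precise.
    if "precise" in instruction:
        a, b = question.split(" + ")
        return str(int(a) + int(b))
    return "I'm not sure."

def metric(example, prediction):
    return prediction == example["answer"]  # exact match

candidates = [
    "Answer the question.",
    "Be precise and output only the final number.",
]

def compile_program(candidates, trainset):
    # Score every candidate instruction on the training set; keep the best.
    scores = {c: sum(metric(ex, stub_lm(c, ex["question"])) for ex in trainset)
              for c in candidates}
    return max(scores, key=scores.get)

best = compile_program(candidates, trainset)
print(best)  # Be precise and output only the final number.
```

Real optimizers differ in how they propose candidates (Bayesian surrogate models, reflective evolution, bootstrapped demonstrations), but all of them reduce to this loop: propose, score against the metric, keep what works.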
Built-in modules including ChainOfThought, ReAct, ProgramOfThought, CodeAct, BestOfN, Refine, MultiChainComparison, and Parallel let you compose multi-step LM programs the same way you compose PyTorch layers — each module encapsulates a prompting strategy and can be optimized independently or jointly within a larger program.
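As one example of the composition pattern, a BestOfN-style module can be sketched as a wrapper that samples several candidates and keeps the one a reward function scores highest. This is a toy with a stubbed model and a toy reward, not DSPy's implementation:

```python
# Sketch of a BestOfN-style module: sample n candidates, keep the
# highest-reward one. The model and reward here are stubs.
import random

def best_of_n(model, prompt, reward, n=4, seed=0):
    """Generate n candidates and return the one maximizing reward."""
    rng = random.Random(seed)
    candidates = [model(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

def stub_model(prompt, rng):
    # Stand-in for an LM: answers of varying length.
    return prompt + " detail" * rng.randint(0, 3)

answer = best_of_n(stub_model, "Answer:", reward=len)
print(answer.startswith("Answer:"))  # True
```

Because the wrapper only depends on the module-call interface, the same pattern nests: a Refine or Parallel wrapper can wrap a ChainOfThought module, which is what makes joint optimization of multi-step programs possible.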
Through dspy.LM and LiteLLM under the hood, DSPy supports OpenAI, Anthropic, Google Gemini, Databricks, Together.ai, Ollama, vLLM, HuggingFace Transformers, and any OpenAI-compatible endpoint. Switching providers requires changing one line of configuration, and re-optimization adapts prompts to the new model's strengths automatically.
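In current DSPy, switching providers looks like the fragment below (not runnable without the corresponding API keys or a local server; model identifiers follow LiteLLM's provider/model format):

```python
import dspy

# Configure the default model for the whole program:
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Switching to Anthropic or a local Ollama model is one line:
# dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20241022"))
# dspy.configure(lm=dspy.LM("ollama_chat/llama3"))
```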
dspy.Evaluate runs programs over a dataset with parallel execution and metric aggregation, and built-in metrics include SemanticF1, answer_exact_match, answer_passage_match, and CompleteAndGrounded. Runtime assertions (dspy.Assert and dspy.Suggest) enforce constraints on LM outputs with automatic retry and backtracking on violation.
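The evaluation loop itself is easy to sketch in plain Python; the real dspy.Evaluate adds parallel execution, progress display, and result tables, and DSPy metrics additionally accept an optional trace argument. The program and dataset below are stubs for illustration:

```python
# Minimal sketch of metric-driven evaluation: run a program over a
# dataset and aggregate a metric (what dspy.Evaluate automates).

dataset = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "capital of Japan?", "answer": "Tokyo"},
    {"question": "capital of Peru?", "answer": "Lima"},
]

def program(question):
    # Stub standing in for an optimized DSPy module.
    return {"capital of France?": "Paris",
            "capital of Japan?": "Kyoto",   # deliberate mistake
            "capital of Peru?": "Lima"}[question]

def exact_match(example, prediction):
    return float(prediction == example["answer"])

def evaluate(program, dataset, metric):
    scores = [metric(ex, program(ex["question"])) for ex in dataset]
    return sum(scores) / len(scores)

print(round(evaluate(program, dataset, exact_match), 3))  # 0.667
```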
Recent additions include dspy.GEPA (Reflective Prompt Evolution) with tutorials for AIME math, structured information extraction, privacy-conscious delegation, and code backdoor classification. MCP tool support enables agent workflows with external tool servers. SIMBA optimizer provides scalable multi-module optimization. Streaming and async execution are now stable, and the framework has added improved TypedPredictor support for structured outputs with Pydantic models.
AI Agent Builders
The industry-standard framework for building production-ready LLM applications with comprehensive tool integration, agent orchestration, and enterprise observability through LangSmith.
AI Agent Builders
LlamaIndex: Build and optimize RAG pipelines with advanced indexing and agentic retrieval for LLM applications.
AI Agent Builders
Open-source Python framework that orchestrates autonomous AI agents collaborating as teams to accomplish complex workflows. Define agents with specific roles and goals, then organize them into crews that execute sequential or parallel tasks. Agents delegate work, share context, and complete multi-step processes like market research, content creation, and data analysis. Supports 100+ LLM providers through LiteLLM integration and includes memory systems for agent learning. Has 48K+ GitHub stars and an active community.
Multi-Agent Builders
Microsoft's open-source framework for building multi-agent AI systems with asynchronous, event-driven architecture.