DSPy review 2026: Stanford NLP framework for programming LLMs with automatic prompt and weight optimization — features, optimizer list, pros, cons.
DSPy is a research-grade Python framework from the Stanford NLP group that treats LLM applications as programs to be written and compiled, not prompts to be hand-tuned. You declare your task using Signatures (typed input/output specs) and compose modules like Predict, ChainOfThought, ReAct, MultiChainComparison, and Retrieve into a pipeline. Then, instead of editing prompts manually, you hand DSPy a small set of labeled examples and a metric, and the built-in optimizers (BootstrapFewShot, MIPROv2, BootstrapFinetune, COPRO) search over prompts, few-shot demonstrations, and even fine-tuning data to maximize your metric on whichever underlying model you choose. The result is a compiled program whose prompts are generated by the framework; when you swap models, you re-run the optimizer to regenerate them instead of rewriting them by hand.
DSPy works with OpenAI, Anthropic, Gemini, Mistral, Together, Databricks, Ollama, and local models via LiteLLM, and integrates with most vector databases for retrieval. It is used at companies such as Databricks, JetBlue, Replit, and Haize Labs, particularly for complex multi-step pipelines where manual prompt tuning is intractable.
DSPy is free and open source under the MIT license, maintained by Stanford and Databricks researchers. There is no managed service; you bring your own model API keys.
DSPy replaces manual prompt engineering with programmatic optimization. It is best suited to teams building complex LLM pipelines who need measurable, reproducible quality improvements backed by metrics and an evaluation methodology. Automatic optimization can deliver real productivity gains and model portability, but it requires a steeper initial investment: learning the framework's abstractions and creating labeled evaluation data.
Define the input/output behavior of an LM call as a Python signature (e.g., `context, question -> reasoning, answer`) instead of a prompt string. Signatures specify field names, types, and descriptions, enabling DSPy's optimizers to automatically generate appropriate instructions, demonstrations, and formatting for any target model.
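To make the idea concrete, here is a minimal sketch of how a signature string such as `context, question -> reasoning, answer` could be split into fields and turned into a prompt skeleton. This is not DSPy's implementation: `ToySignature`, `parse_signature`, and `render_prompt` are hypothetical names invented for illustration, and the prompt layout is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySignature:
    """Hypothetical stand-in for a signature: named inputs and outputs."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def parse_signature(spec: str) -> ToySignature:
    """Split 'a, b -> c, d' into input and output field names."""
    left, sep, right = spec.partition("->")
    if not sep:
        raise ValueError(f"signature needs '->': {spec!r}")
    inputs = tuple(name.strip() for name in left.split(",") if name.strip())
    outputs = tuple(name.strip() for name in right.split(",") if name.strip())
    if not inputs or not outputs:
        raise ValueError(f"signature needs inputs and outputs: {spec!r}")
    return ToySignature(inputs, outputs)


def render_prompt(sig: ToySignature, **values: str) -> str:
    """Build a plain-text prompt skeleton from the declared fields."""
    missing = [name for name in sig.inputs if name not in values]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    lines = [f"Given {', '.join(sig.inputs)}, produce {', '.join(sig.outputs)}.", ""]
    lines += [f"{name}: {values[name]}" for name in sig.inputs]
    lines += [f"{name}:" for name in sig.outputs]  # left blank for the model to fill
    return "\n".join(lines)


sig = parse_signature("context, question -> reasoning, answer")
prompt = render_prompt(sig, context="Paris is in France.", question="Where is Paris?")
```

The point of the sketch is the separation of concerns: the author states only the field names, and the prompt text is derived from them, which is what lets an optimizer rewrite the wording later without touching the program.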
DSPy ships with a full library of optimizers that compile programs into better prompts or fine-tuned weights given a metric and training set. MIPROv2 jointly optimizes instructions and demonstrations using Bayesian surrogate models. GEPA uses reflective prompt evolution for complex reasoning. BootstrapFewShot generates demonstrations from the training set. SIMBA scales optimization to multi-module programs efficiently.
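The bootstrapping idea behind demonstration-generating optimizers can be illustrated with a toy loop: run the program on training inputs and keep only the outputs that pass the metric as few-shot demonstrations. The sketch below is a simplified illustration, not DSPy's BootstrapFewShot; `bootstrap_demos`, `toy_program`, and the data are hypothetical, and a stub function stands in for the LM.

```python
from typing import Callable

Example = dict[str, str]


def exact_match(example: Example, prediction: str) -> bool:
    """Metric: the prediction equals the gold answer, ignoring case and spaces."""
    return prediction.strip().lower() == example["answer"].strip().lower()


def bootstrap_demos(
    program: Callable[[str], str],
    trainset: list[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 2,
) -> list[Example]:
    """Run the program on training inputs and keep outputs that pass the metric."""
    demos: list[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def toy_program(question: str) -> str:
    """Hypothetical stub standing in for an LM call; it only knows two facts."""
    known = {"capital of france?": "Paris", "2 + 2?": "4"}
    return known.get(question.lower(), "unknown")


trainset = [
    {"question": "Capital of France?", "answer": "paris"},
    {"question": "Capital of Peru?", "answer": "Lima"},
    {"question": "2 + 2?", "answer": "4"},
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
```

Here the Peru example is dropped because the stub's output fails the metric, so only verified question/answer pairs end up in `demos`. A real optimizer would then insert such demonstrations into the compiled prompt.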
Built-in modules including ChainOfThought, ReAct, ProgramOfThought, CodeAct, BestOfN, Refine, MultiChainComparison, and Parallel let you compose multi-step LM programs the same way you compose PyTorch layers — each module encapsulates a prompting strategy and can be optimized independently or jointly within a larger program.
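The PyTorch-style composition pattern can be sketched in plain Python: each module exposes a `forward` method, and a larger module holds submodules as attributes. The classes below (`ToyModule`, `ToyPredict`, `ToyPipeline`) and the stub `echo_lm` are hypothetical and only mimic the shape of the pattern; they are not DSPy classes.

```python
from typing import Callable

LM = Callable[[str], str]


class ToyModule:
    """Hypothetical base class: calling a module runs its forward method."""

    def __call__(self, **inputs: str) -> dict[str, str]:
        return self.forward(**inputs)

    def forward(self, **inputs: str) -> dict[str, str]:
        raise NotImplementedError


class ToyPredict(ToyModule):
    """Ask the LM for a single output field, given named inputs."""

    def __init__(self, lm: LM, output: str) -> None:
        self.lm = lm
        self.output = output

    def forward(self, **inputs: str) -> dict[str, str]:
        prompt = "\n".join(f"{k}: {v}" for k, v in inputs.items()) + f"\n{self.output}:"
        return {self.output: self.lm(prompt)}


class ToyPipeline(ToyModule):
    """Compose two submodules: summarize the context, then answer from the summary."""

    def __init__(self, lm: LM) -> None:
        self.summarize = ToyPredict(lm, "summary")
        self.answer = ToyPredict(lm, "answer")

    def forward(self, **inputs: str) -> dict[str, str]:
        summary = self.summarize(context=inputs["context"])["summary"]
        return self.answer(summary=summary, question=inputs["question"])


def echo_lm(prompt: str) -> str:
    """Stub LM for illustration: reports which field it was asked to fill."""
    return f"<{prompt.splitlines()[-1].rstrip(':')}>"


pipeline = ToyPipeline(echo_lm)
result = pipeline(context="DSPy compiles LM programs.", question="What does DSPy do?")
```

Because each submodule is an ordinary attribute, an optimizer can walk the pipeline and tune `summarize` and `answer` separately or together, which is the property the PyTorch comparison refers to.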
Through dspy.LM and LiteLLM under the hood, DSPy supports OpenAI, Anthropic, Google Gemini, Databricks, Together.ai, Ollama, vLLM, HuggingFace Transformers, and any OpenAI-compatible endpoint. Switching providers requires changing one line of configuration, and re-optimization adapts prompts to the new model's strengths automatically.
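A one-line provider switch might look like the sketch below. It assumes DSPy is installed, that API keys are available in the usual environment variables (for example `OPENAI_API_KEY`), and that the model identifiers shown are still valid; the identifiers, the `MODEL_IDS` table, and the `configure_provider` helper are illustrative assumptions, not part of DSPy.

```python
# Illustrative "provider/model" identifiers in the LiteLLM style.
# These specific model names are assumptions and may be out of date.
MODEL_IDS = {
    "openai": "openai/gpt-4o-mini",
    "anthropic": "anthropic/claude-3-5-sonnet-20240620",
    "ollama": "ollama_chat/llama3.2",
}


def configure_provider(provider: str):
    """Point DSPy at the chosen provider and return the configured LM."""
    import dspy  # imported lazily so this file loads without DSPy installed

    lm = dspy.LM(MODEL_IDS[provider])
    dspy.configure(lm=lm)
    return lm
```

With this layout, moving from one provider to another means changing the argument to `configure_provider`, for example from `"openai"` to `"anthropic"`. Prompts compiled for the old model would still need a fresh optimizer run to adapt to the new one.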
dspy.Evaluate runs programs over a dataset with parallel execution and metric aggregation. Built-in metrics include SemanticF1, answer_exact_match, answer_passage_match, and CompleteAndGrounded. Earlier releases offered runtime assertions (dspy.Assert and dspy.Suggest) that enforce constraints on LM outputs with automatic retry and backtracking on violation; newer releases cover the same need with the dspy.Refine and dspy.BestOfN modules.
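Parallel evaluation with metric aggregation can be sketched with the standard library alone. The code below is a toy illustration of the idea, not the dspy.Evaluate implementation: `evaluate`, `toy_program`, and the dataset are hypothetical, and a stub function replaces the LM-backed program.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Example = dict[str, str]


def exact_match(example: Example, prediction: str) -> float:
    """Metric: 1.0 if the prediction equals the gold answer (case-insensitive)."""
    return float(prediction.strip().lower() == example["answer"].strip().lower())


def evaluate(
    program: Callable[[str], str],
    devset: list[Example],
    metric: Callable[[Example, str], float],
    num_threads: int = 4,
) -> float:
    """Score every example in parallel and return the mean as a percentage."""
    if not devset:
        raise ValueError("devset is empty")

    def score(example: Example) -> float:
        return metric(example, program(example["question"]))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))  # map preserves devset order
    return 100.0 * sum(scores) / len(scores)


def toy_program(question: str) -> str:
    """Hypothetical stub standing in for an LM program; one answer is wrong."""
    known = {"capital of france?": "Paris", "2 + 2?": "4", "largest planet?": "Saturn"}
    return known.get(question.lower(), "unknown")


devset = [
    {"question": "Capital of France?", "answer": "paris"},
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Largest planet?", "answer": "Jupiter"},
    {"question": "Capital of Peru?", "answer": "Lima"},
]
accuracy = evaluate(toy_program, devset, exact_match)
```

Threads fit this workload because real LM calls are I/O-bound network requests, so several can be in flight at once. The stub gets two of the four examples right, so `accuracy` comes out at 50.0.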
Free (MIT)
DSPy works with these platforms and services:
We believe in transparent reviews. Here's what DSPy doesn't handle well:
Recent additions include dspy.GEPA (Reflective Prompt Evolution), with tutorials for AIME math, structured information extraction, privacy-conscious delegation, and code backdoor classification. MCP tool support enables agent workflows with external tool servers. The SIMBA optimizer provides scalable multi-module optimization. Streaming and async execution are now stable, and the framework has improved its TypedPredictor support for structured outputs with Pydantic models.
AI Agent Builders
The industry-standard framework for building production-ready LLM applications with comprehensive tool integration, agent orchestration, and enterprise observability through LangSmith.
AI agent framework
LlamaIndex is an open-source Python and TypeScript framework for building RAG, document workflows, and AI agents — with LlamaCloud for managed parsing, extraction, and indexing.
AI Agents
Open-source Python framework for orchestrating role-playing, autonomous AI agents that collaborate as a 'crew' to complete complex tasks.
Multi-Agent Builders
Microsoft's open-source framework for building multi-agent AI systems with asynchronous, event-driven architecture.