A programming language for controlling large language models with constrained generation and structured output guarantees
Control exactly how AI models generate text by enforcing structure rules during output, guaranteeing valid JSON, categories, or formatted data without retry loops
Guidance is a free, open-source Python library and constrained generation framework from Microsoft Research, belonging to the family of AI agent builder and structured output tools. It gives developers deterministic control over large language model output by enforcing JSON schemas, regex patterns, and context-free grammars at the token level during generation. The library is completely free, with no paid tiers, usage fees, or API costs.
With over 19,000 GitHub stars and more than 100 contributors, Guidance has become one of the most widely adopted structured output libraries in the LLM ecosystem. The project averages over 50,000 monthly downloads on PyPI and has accumulated more than 3,500 forks on GitHub, reflecting strong community interest across research and production use cases. Originally released in 2023, Guidance has been under continuous development with frequent releases and an active issue tracker.
Unlike traditional prompting approaches where developers send text to a model and hope it responds in the correct format, Guidance interleaves fixed template text with constrained generation steps. This means the model can only produce tokens that satisfy the specified constraints — whether those are regex patterns, JSON schemas with full support for oneOf, allOf, anyOf, and recursive definitions, or arbitrary context-free grammars. Invalid output is structurally impossible, not merely unlikely.
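The mechanism can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not the Guidance API: a "model" proposes characters, but a mask vetoes any character that would not keep the output a prefix of an allowed option (the select()-style case; Guidance generalizes the same masking to regexes, JSON schemas, and full context-free grammars at the token level).

```python
# Toy illustration of constrained decoding (not the Guidance API):
# only characters that extend a valid prefix of some option are
# ever allowed, so invalid output is structurally impossible.
def constrained_pick(options, score):
    """Build an output one character at a time, restricted to
    characters that keep the text a prefix of some option."""
    out = ""
    while out not in options:
        allowed = {o[len(out)] for o in options
                   if o.startswith(out) and len(o) > len(out)}
        # The model's preference (score) only breaks ties among
        # characters the mask already permits.
        out += max(allowed, key=score)
    return out

# Even a model that loves "z" cannot escape the option set:
label = constrained_pick({"positive", "negative", "neutral"},
                         score=lambda ch: ch == "z")
print(label)  # always one of the three options
```

However strongly the scoring function prefers characters outside the grammar, the mask never exposes them, which is the sense in which invalid output is impossible rather than merely unlikely.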
At the core of Guidance is llguidance, a high-performance grammar engine written in Rust. The engine validates and masks tokens in real time, ensuring that every generated token conforms to the specified grammar. Its Rust implementation is substantially faster than pure-Python alternatives, handling complex nested schemas and large grammars with minimal latency impact.
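Per-step token masking, the mechanism llguidance implements, can be sketched as follows. This is our illustration, not llguidance's code; the grammar is a deliberately tiny one (a JSON-style array of single digits, e.g. "[3,7]"), and the vocabulary is single characters rather than real tokenizer tokens.

```python
# Toy sketch of per-step token masking: grammar-violating next
# tokens get their logits set to -inf before sampling.
import math

VOCAB = list("0123456789,[]")

def is_valid_prefix(s):
    """Hand-rolled prefix check for the digit-array grammar."""
    state = "open"            # expect '['
    for ch in s:
        if state == "open" and ch == "[":
            state = "digit"   # expect a digit
        elif state == "digit" and ch.isdigit():
            state = "sep"     # expect ',' or ']'
        elif state == "sep" and ch == ",":
            state = "digit"
        elif state == "sep" and ch == "]":
            state = "done"    # array closed; nothing more allowed
        else:
            return False
    return True

def mask_logits(logits, text):
    """Set the logit of every grammar-violating next token to -inf."""
    return [logit if is_valid_prefix(text + tok) else -math.inf
            for tok, logit in zip(VOCAB, logits)]

masked = mask_logits([0.0] * len(VOCAB), "[3")
allowed = [tok for tok, logit in zip(VOCAB, masked) if logit != -math.inf]
print(allowed)  # after "[3", only "," and "]" survive the mask
```

The real engine does the same validate-and-mask step against arbitrary grammars over a tokenizer's full vocabulary, which is where the Rust implementation's speed matters.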
Guidance supports multiple model backends through a unified programming interface. Developers can use OpenAI GPT models, Azure OpenAI deployments, local models via Hugging Face Transformers or llama.cpp, and experimental Anthropic model support. The strongest structural guarantees are available with local model backends where Guidance can directly control token sampling and masking. Cloud API backends leverage provider-specific structured output features but may have some limitations compared to local execution.
Key programming primitives include gen() for constrained text generation, select() for forcing output to match one of predefined options, and the @guidance decorator for packaging reusable generation patterns as composable Python functions. Token healing automatically corrects tokenization boundary artifacts where template text meets generated content. Conditional logic with if/else blocks and loop constructs enables dynamic multi-step generation pipelines.
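Token healing is the least obvious of these primitives, so here is a toy sketch of the concept (Guidance performs this automatically at template/generation boundaries; the function below is our own illustration, not the library's code). The last prompt token is backed out, and the next generated token is constrained to start with the removed text, so natural merges like "http" + "://" are not blocked by an awkward token boundary.

```python
# Toy sketch of token healing: back out the boundary token, then
# require the next candidate token to re-cover the removed text.
def heal_and_generate(prompt_tokens, ranked_candidates):
    healed = prompt_tokens[-1]           # back out the boundary token
    context = prompt_tokens[:-1]
    for tok in ranked_candidates:        # best-first candidate tokens
        if tok.startswith(healed):       # must re-cover the removed text
            return context + [tok]
    return prompt_tokens                 # no candidate fits; keep as-is

tokens = heal_and_generate(["The", " link", " is", " http"],
                           [" world", " http://", " https"])
print("".join(tokens))  # The link is http://
```

Without healing, the model would be forced to continue from an unnatural tokenization of "http", which is the boundary artifact the feature prevents.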
Guidance integrates naturally with Jupyter notebooks, providing token-level highlighting and visualization for debugging and rapid prototyping. The Mock model allows developers to validate grammar constraints and test generation pipelines without making any real LLM API calls, reducing development costs and iteration time. These features make Guidance particularly valuable for research teams and developers who need to experiment with complex generation strategies before deploying to production.
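The value of a mock backend can be sketched with a plain-Python stand-in. The class and method names below are ours, not Guidance's API; the point is that canned completions are still checked against the constraints the pipeline declares, so a grammar bug fails fast in tests with no API calls and no GPU.

```python
# Toy stand-in for offline pipeline testing (the idea behind a
# mock model backend; names here are hypothetical).
import re

class FakeModel:
    """Returns canned completions, but validates each one against
    the declared constraint before handing it back."""
    def __init__(self, canned):
        self.canned = list(canned)

    def generate(self, prompt, pattern):
        out = self.canned.pop(0)
        if not re.fullmatch(pattern, out):
            raise ValueError(f"{out!r} violates constraint {pattern!r}")
        return out

fake = FakeModel(["42", "yes"])
print(fake.generate("How many steps? ", r"\d+"))   # 42
print(fake.generate("Proceed? ", r"yes|no"))       # yes
```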
The library is MIT licensed with zero telemetry, no usage tracking, and complete source code available for audit. When used with local model backends, all data processing occurs entirely on-premises with no external network calls, making it suitable for sensitive or regulated environments.
Guidance from Microsoft Research is a powerful Python library for developers who need guaranteed structured output from LLMs. It excels at enforcing JSON schemas, regex patterns, and grammars at the token level, particularly with local model backends. The learning curve is steeper than simpler alternatives, but the guarantees it provides make it ideal for production systems where output validity is non-negotiable.
Interleave fixed template text with constrained generation steps, giving developers precise control over output structure while letting the model fill in variable content
Force generation to only produce tokens that satisfy regex patterns, JSON schemas, or context-free grammars — invalid output is structurally impossible
High-performance constraint processing engine written in Rust that efficiently validates and masks tokens during generation
Generate structured JSON guaranteed to conform to complex schemas including oneOf, allOf, anyOf, recursive definitions, and nested objects
Automatically corrects tokenization boundary artifacts where template text meets generated content, preventing subtle formatting errors
Control flow with if/else blocks and loop constructs for dynamic multi-step generation pipelines
Unified programming interface across OpenAI, Azure, local Transformers, and llama.cpp backends, with experimental Anthropic support
Package reusable generation patterns as decorated Python functions that can be composed into complex pipelines
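The composition pattern in the last item can be sketched in plain Python. This is only an illustration of the idea behind decorated, composable steps, not the library's @guidance decorator: each step takes the running text state and returns the new state, so steps chain into multi-stage pipelines.

```python
# Toy sketch of composable generation steps (plain Python, not the
# Guidance API). Each step: text state in, new text state out.
def ask(state, question):
    return state + f"Q: {question}\nA: "

def choose(state, options):
    # Stand-in for constrained generation: deterministically pick
    # the first option a real model would be masked down to.
    return state + options[0] + "\n"

state = ""
state = ask(state, "Is the sky blue?")
state = choose(state, ["yes", "no"])
print(state)  # Q: Is the sky blue?\nA: yes
```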
Pricing: the library itself is free. Model costs depend on the backend: pay-per-use via OpenAI or Azure OpenAI for cloud models, and free for local models (hardware costs only).
Through 2025 and into 2026, Guidance has continued to mature its Rust-based grammar engine (llguidance) with improved JSON schema support, better error reporting, and expanded model backend compatibility. The library has seen performance improvements in constraint processing and broader community adoption for structured output use cases.