Open-source CLI and library for testing, evaluating, and red-teaming LLM prompts, models, and RAG pipelines — runs locally on your machine or in CI.
Promptfoo is an open-source tool and a widely used CLI for evaluating LLM prompts and applications. You write a YAML config that lists prompts, providers, test cases, and assertions. Promptfoo then runs every combination locally, caches the results, and shows a diff between configurations in a web UI.
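To make the "matrix" idea concrete, here is a minimal Python sketch of the concept. It is not Promptfoo's API or implementation. The provider functions, the `run_matrix` helper, and the `contains` check are hypothetical stand-ins invented for illustration. It only shows how prompts, providers, and test cases combine into a grid of cells, each checked by an assertion, with a simple cache to avoid repeated calls.

```python
from itertools import product

# Hypothetical stand-ins for model providers: each maps a prompt string to an output.
def echo_provider(prompt: str) -> str:
    return prompt

def shout_provider(prompt: str) -> str:
    return prompt.upper()

# Illustrative inputs only; a real config would list actual prompts and models.
PROMPTS = ["Translate to French: {text}", "French for: {text}"]
PROVIDERS = {"echo": echo_provider, "shout": shout_provider}
TESTS = [
    {"vars": {"text": "hello"}, "contains": "hello"},
    {"vars": {"text": "bye"}, "contains": "bye"},
]

def run_matrix(prompts, providers, tests):
    """Evaluate every prompt x provider x test combination (hypothetical helper)."""
    cache = {}    # (provider name, rendered prompt) -> output
    results = []
    for template, (name, provider), test in product(prompts, providers.items(), tests):
        rendered = template.format(**test["vars"])
        key = (name, rendered)
        if key not in cache:              # call each provider once per distinct prompt
            cache[key] = provider(rendered)
        output = cache[key]
        results.append({
            "prompt": template,
            "provider": name,
            "vars": test["vars"],
            "passed": test["contains"] in output,   # case-sensitive substring assertion
        })
    return results

results = run_matrix(PROMPTS, PROVIDERS, TESTS)
passed = sum(r["passed"] for r in results)
print(f"{passed}/{len(results)} cells passed")
```

Grouping the results by provider or by prompt gives the kind of side-by-side comparison described above: here the `shout` provider fails every case-sensitive check while `echo` passes them.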
Feature information is available on the official website.
Free (MIT)
Custom / usage-based
Custom
We believe in transparent reviews. Here's what Promptfoo doesn't handle well:
LLM Observability
AI observability platform for evals, production tracing, prompt management, and regression detection.
AI Observability
LangSmith is LangChain's commercial observability, evaluation and prompt management platform for LLM apps and agents in production.
LLM evaluation and governance
An LLM development platform for prompt management, evaluations, logging, and trustworthy AI product iteration; the homepage announces that the team is joining Anthropic.
Testing & Quality
Open-source LLM evaluation framework with 50+ research-backed metrics including hallucination detection, tool use correctness, and conversational quality. Pytest-style testing for AI agents with CI/CD integration.