AI Evaluation🔴Developer

Promptfoo

Name: Promptfoo
Brand: Promptfoo
Availability: InStock

Open-source CLI and library for testing, evaluating, and red-teaming LLM prompts, models, and RAG pipelines — runs locally on your machine or in CI.

Starting atFree

Visit Promptfoo →

💡

In Plain English

Developer-focused open-source CLI and library for local or CI-based LLM evaluation, red-teaming, and RAG regression testing.

Overview

Promptfoo is best for engineering and security teams that need open-source, repeatable LLM evaluation, red-teaming, and regression testing in local development or CI, with a free Community tier and quote-based Enterprise or On-Premise options for teams that need shared dashboards, SSO, SLAs, managed cloud, or customer-controlled deployment. Public Promptfoo documentation describes the core product as an open-source CLI and library for evaluating and red-teaming LLM apps, and the website expands that positioning across evaluations, red teaming, guardrails, model security, MCP Proxy, and code scanning. That source context matters because Promptfoo is not just a hosted evaluation dashboard: it can run locally, fit into CI workflows, compare prompts and providers, test RAG behavior, apply assertions, and help teams catch regressions before release. The Community plan is listed as Free Forever at $0/month and includes all LLM evaluation features, all model providers and integrations, custom integration with your own app, local or self-hosted operation, vulnerability scanning, community support, and red teaming up to 10k probes per month. Enterprise and On-Premise are both public Custom plans rather than self-serve paid tiers. Promptfoo does not publish exact Enterprise or On-Premise monthly prices, annual prices, billing periods, conversion thresholds from Community, paid seat limits, standard usage caps, data-retention terms, or minimum contract lengths in the provided pricing content. The clearest public conversion path is therefore usage and procurement driven: teams can start on Community, then contact sales when they need Enterprise capabilities such as team sharing, continuous monitoring, centralized security and compliance dashboards, configurable attack profiles, SSO, granular permissions, Promptfoo API access, managed cloud deployment, professional services, priority support, or SLA guarantees. On-Premise is the higher-control custom option for deployment on customer infrastructure, complete data isolation, a dedicated runner, an assigned deployment engineer, and enterprise security and deployment support. For buyers, the practical pricing takeaway is simple: Promptfoo is transparent and generous at the free tier, but paid planning requires a sales conversation because final Enterprise and On-Premise cost, billing frequency, limits, and contractual obligations are not publicly specified. The tool is strongest when a team wants evaluation definitions and security tests to live close to engineering workflows, especially for prompt changes, model comparisons, RAG pipelines, agent behavior, MCP-related security boundaries, and release gates in CI/CD. It is less ideal as a purely nontechnical prompt management workspace or as a replacement for full production observability, because its public materials emphasize repeatable evaluation, red-team testing, vulnerability scanning, guardrails, MCP Proxy security, and development-time checks more than trace-first monitoring.

🎨

Vibe Coding Friendly?

▼

Difficulty:intermediate

Suitability for vibe coding depends on your experience level and the specific use case.

Learn about Vibe Coding →

Was this helpful?

Key Features

Evaluations+

Promptfoo evaluates prompts, models, and RAG pipelines so teams can compare behavior across changes. This is useful for regression testing, factuality checks, hallucination reduction, and validating whether a model or retrieval change improves real application outputs.

Red Teaming+

The Red Teaming product is designed to proactively identify and fix vulnerabilities in AI applications. Teams can use it to test jailbreak resistance, adversarial prompts, unsafe completions, and other security risks before users encounter them.

Guardrails+

Promptfoo’s Guardrails are positioned as real-time protection against jailbreaks and adversarial attacks. This makes the platform relevant not only for offline evaluation but also for teams considering runtime safety controls around LLM applications.

MCP Proxy+

The MCP Proxy is described as a secure proxy for Model Context Protocol communications. This is important for agentic systems that use MCP connections and need a security boundary around model-to-tool or model-to-context interactions.

Code Scanning+

Promptfoo’s Code Scanning product finds LLM vulnerabilities in IDE and CI/CD workflows. That lets engineering teams catch AI-specific security issues earlier in the software development process instead of relying only on manual review or production monitoring.

Pricing Plans

Community

$0/month

✓Free Forever
✓All LLM evaluation features
✓All model providers and integrations
✓Red teaming up to 10k probes per month
✓Custom integration with your own app
✓Local or self-hosted operation
✓Vulnerability scanning
✓Community support

Enterprise

Custom

✓Quote-based paid plan; no public fixed monthly or annual price
✓No public billing period, minimum contract term, paid seat limit, or standard usage cap listed
✓Conversion from Community requires contacting sales when Enterprise features or higher organizational needs apply
✓Team sharing
✓Continuous monitoring
✓Centralized security and compliance dashboards
✓Configurable attack profiles
✓SSO
✓Granular permissions
✓Promptfoo API access
✓Managed cloud deployment
✓Professional services
✓Priority support
✓SLA guarantees

On-Premise

Custom

✓Quote-based paid plan; no public fixed monthly or annual price
✓No public billing period, minimum contract term, paid seat limit, or standard usage cap listed
✓Conversion from Community or Enterprise requires contacting sales for customer-infrastructure deployment terms
✓Deployment on customer infrastructure
✓Complete data isolation
✓Dedicated runner
✓Assigned deployment engineer
✓Enterprise security and deployment support

See Full Pricing →Free vs Paid →Is it worth it? →

Ready to get started with Promptfoo?

View Pricing Options →

Best Use Cases

🎯

A platform engineering team adds Promptfoo evaluations to CI so every prompt, model, or RAG retrieval change is tested against known regression cases before it can be merged.

⚡

A security team runs Promptfoo Red Teaming against a customer-facing AI assistant to identify jailbreaks, adversarial prompts, and unsafe responses before launch.

🔧

A financial services company uses Promptfoo for FINRA-aligned security testing of internal or customer-facing AI workflows that handle regulated communications.

🚀

An insurance company evaluates whether an AI support assistant answers policyholder coverage questions accurately without exposing sensitive data or inventing policy terms.

💡

A telecommunications provider tests voice and text AI agents for unsafe escalation behavior, jailbreak resistance, and adversarial input handling.

🔄

A real estate business checks an AI assistant for fair housing compliance risks when it answers questions about listings, neighborhoods, eligibility, or applicant screening.

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Promptfoo doesn't handle well:

⚠Enterprise and On-Premise pricing are Custom, with no exact public monthly price, annual price, paid seat limit, or standard contract term visible in the provided website content.
⚠Promptfoo does not remove the need to design high-quality prompts, test cases, assertions, graders, representative datasets, and remediation processes.
⚠The tool is broader than simple prompt testing, so smaller teams may need time to understand the differences between evaluations, red teaming, guardrails, model security, MCP Proxy, and code scanning.
⚠It is not described primarily as a production trace viewer or full observability suite, so live debugging and monitoring may require another product.
⚠Security testing can surface vulnerabilities, but organizations still need human review, policy decisions, and engineering follow-through for high-risk AI deployments.

Pros & Cons

✓ Pros

✓Covers 6 product areas listed on the website: Red Teaming, Guardrails, Model Security, MCP Proxy, Code Scanning, and Evaluations.
✓Community plan is described as Free Forever and includes local or self-hosted operation, all LLM evaluation features, vulnerability scanning, and red teaming up to 10k probes per month.
✓Useful beyond prompt testing because it includes real-time guardrail positioning, model security monitoring, MCP Proxy protection, and IDE/CI/CD code scanning for LLM vulnerabilities.
✓Strong fit for regulated workflows because the website names 4 industry solution areas: Financial Services, Insurance, Telecommunications, and Real Estate.
✓Supports development workflows where evaluations and red-team checks can run before merge or release instead of relying only on post-deployment monitoring.
✓The site displays a public 20.6k metric alongside its open-source and community positioning, indicating substantial visible adoption or repository activity.

✗ Cons

✗Public paid pricing is quote-based: Enterprise and On-Premise are listed as Custom rather than fixed monthly or annual prices.
✗The product surface is broad, so teams that only need simple prompt regression tests may find the security, guardrails, MCP proxy, and model-security positioning more than they need.
✗Red-teaming and evaluation quality still depend on well-designed test cases, assertions, graders, and representative datasets.
✗The website emphasizes development-time and security testing more than production observability, so teams may still need a tracing or monitoring platform alongside Promptfoo.
✗Enterprise suitability is clear, but self-serve details such as exact paid seat limits, usage caps beyond Community red-team probes, hosted data retention, and final contract terms are not visible in the public pricing content.

Frequently Asked Questions

What is Promptfoo used for?+

Promptfoo is used to test and evaluate AI applications before they reach users. The public documentation describes it as an open-source CLI and library for evaluating and red-teaming LLM apps, and the website lists products for evaluations, red teaming, guardrails, model security, MCP proxy protection, and code scanning. In practice, this means teams can compare prompts and models, test RAG factuality, look for jailbreak risks, and scan LLM application code as part of development or CI/CD.

Is Promptfoo open source?+

Yes. Promptfoo’s documentation describes it as an open-source CLI and library, and the public pricing page lists a Community plan as Free Forever. The Community plan includes core evaluation and vulnerability-scanning workflows, local or self-hosted operation, all listed model providers and integrations, and red teaming up to 10k probes per month. The same pricing page also lists Enterprise and On-Premise paid options with custom pricing.

How is Promptfoo different from LangSmith or Braintrust?+

Promptfoo is more focused on systematic testing, red-teaming, and AI security checks during development, while tools such as LangSmith and Braintrust are often selected for tracing, observability, experiment tracking, or evaluation management. Promptfoo’s website lists Red Teaming, Guardrails, Model Security, MCP Proxy, Code Scanning, and Evaluations as separate product areas, which gives it a stronger security-testing orientation. Choose Promptfoo when you need adversarial testing and CI-friendly regression checks around LLM applications.

Can Promptfoo help with regulated AI applications?+

Yes, the website explicitly lists industry solutions for Financial Services, Insurance, Telecommunications, and Real Estate. It mentions examples such as FINRA-aligned security testing, policyholder data and coverage accuracy, voice and text AI agent security, and fair housing compliance testing. Those examples suggest Promptfoo is aimed at teams that need evidence-driven testing around compliance, safety, and business-specific failure modes. Teams should still validate whether the enterprise deployment, audit, and contract terms meet their own regulatory requirements.

Does Promptfoo provide real-time protection or only offline evaluation?+

The website presents both evaluation and protection-oriented products. Evaluations cover prompt, model, and RAG testing, while Guardrails are described as real-time protection against jailbreaks and adversarial attacks. The site also lists an MCP Proxy for securing Model Context Protocol communications and Code Scanning for finding LLM vulnerabilities in IDE and CI/CD. That combination means Promptfoo can support pre-deployment testing and some runtime protection use cases, although production observability may still require a separate tracing or monitoring tool.

How much does Promptfoo cost?+

Promptfoo’s public pricing page lists Community as Free Forever at $0/month, Enterprise as Custom, and On-Premise as Custom. Community includes all LLM evaluation features, all model providers and integrations, red teaming up to 10k probes per month, local or self-hosted operation, vulnerability scanning, and community support. Enterprise and On-Premise do not publish exact monthly or annual prices, billing periods, paid seat limits, minimum contract terms, standard usage caps, or automatic upgrade thresholds; teams must contact sales for a quote and final conversion terms.

🦞

New to AI tools?

Read practical guides for choosing and using AI tools

Read Guides →

Get updates on Promptfoo and 370+ other AI tools

Weekly insights on the latest AI tools, features, and trends delivered to your inbox.

What's New in 2026

The scraped website content states “Promptfoo is now part of OpenAI” and shows © 2026 Promptfoo, Inc. The provided content does not include a dated release note or detailed 2025-2026 changelog beyond that update.

Alternatives to Promptfoo

Braintrust

LLM Observability

Braintrust is an evals-first LLM observability platform combining production tracing, prompt playgrounds, autoevals, and Topics-based pattern discovery for teams shipping AI in production.

LangSmith

AI Observability

LangSmith is LangChain's commercial observability, evaluation and prompt management platform for LLM apps and agents in production.

Humanloop

LLM evaluation and governance

an LLM development platform for prompt management, evaluations, logging, and trustworthy AI product iteration; the homepage announces the team joining Anthropic.

DeepEval

Testing & Quality

Open-source LLM evaluation framework with 50+ research-backed metrics including hallucination detection, tool use correctness, and conversational quality. Pytest-style testing for AI agents with CI/CD integration.

View All Alternatives & Detailed Comparison →

User Reviews

No reviews yet. Be the first to share your experience!

Quick Info

Try Promptfoo Today

Get started with Promptfoo and see if it's the right fit for your needs.

Get Started →

Need help choosing the right AI stack?

Take our 60-second quiz to get personalized tool recommendations

Find Your Perfect AI Stack →

Want a faster launch?

Explore 20 ready-to-deploy AI agent templates for sales, support, dev, research, and operations.

Browse Agent Templates →

More about Promptfoo

Pricing Review Alternatives Free vs Paid Pros & Cons Worth It?Tutorial

Overview

Key Features

Evaluations+

Red Teaming+

Guardrails+

MCP Proxy+

Code Scanning+

Pricing Plans

Community

$0/month

✓Free Forever
✓All LLM evaluation features
✓All model providers and integrations
✓Red teaming up to 10k probes per month
✓Custom integration with your own app
✓Local or self-hosted operation
✓Vulnerability scanning
✓Community support

Enterprise

Custom

✓Quote-based paid plan; no public fixed monthly or annual price
✓No public billing period, minimum contract term, paid seat limit, or standard usage cap listed
✓Conversion from Community requires contacting sales when Enterprise features or higher organizational needs apply
✓Team sharing
✓Continuous monitoring
✓Centralized security and compliance dashboards
✓Configurable attack profiles
✓SSO
✓Granular permissions
✓Promptfoo API access
✓Managed cloud deployment
✓Professional services
✓Priority support
✓SLA guarantees

On-Premise

Custom

✓Quote-based paid plan; no public fixed monthly or annual price
✓No public billing period, minimum contract term, paid seat limit, or standard usage cap listed
✓Conversion from Community or Enterprise requires contacting sales for customer-infrastructure deployment terms
✓Deployment on customer infrastructure
✓Complete data isolation
✓Dedicated runner
✓Assigned deployment engineer
✓Enterprise security and deployment support

Ready to get started with Promptfoo?

View Pricing Options →

Best Use Cases

🎯

A platform engineering team adds Promptfoo evaluations to CI so every prompt, model, or RAG retrieval change is tested against known regression cases before it can be merged.

⚡

A security team runs Promptfoo Red Teaming against a customer-facing AI assistant to identify jailbreaks, adversarial prompts, and unsafe responses before launch.

🔧

A financial services company uses Promptfoo for FINRA-aligned security testing of internal or customer-facing AI workflows that handle regulated communications.

🚀

An insurance company evaluates whether an AI support assistant answers policyholder coverage questions accurately without exposing sensitive data or inventing policy terms.

💡

A telecommunications provider tests voice and text AI agents for unsafe escalation behavior, jailbreak resistance, and adversarial input handling.

🔄

A real estate business checks an AI assistant for fair housing compliance risks when it answers questions about listings, neighborhoods, eligibility, or applicant screening.

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Promptfoo doesn't handle well:

⚠Enterprise and On-Premise pricing are Custom, with no exact public monthly price, annual price, paid seat limit, or standard contract term visible in the provided website content.

⚠Promptfoo does not remove the need to design high-quality prompts, test cases, assertions, graders, representative datasets, and remediation processes.

⚠The tool is broader than simple prompt testing, so smaller teams may need time to understand the differences between evaluations, red teaming, guardrails, model security, MCP Proxy, and code scanning.

⚠It is not described primarily as a production trace viewer or full observability suite, so live debugging and monitoring may require another product.

⚠Security testing can surface vulnerabilities, but organizations still need human review, policy decisions, and engineering follow-through for high-risk AI deployments.

Pros & Cons

✓ Pros

✓Covers 6 product areas listed on the website: Red Teaming, Guardrails, Model Security, MCP Proxy, Code Scanning, and Evaluations.
✓Community plan is described as Free Forever and includes local or self-hosted operation, all LLM evaluation features, vulnerability scanning, and red teaming up to 10k probes per month.
✓Useful beyond prompt testing because it includes real-time guardrail positioning, model security monitoring, MCP Proxy protection, and IDE/CI/CD code scanning for LLM vulnerabilities.
✓Strong fit for regulated workflows because the website names 4 industry solution areas: Financial Services, Insurance, Telecommunications, and Real Estate.
✓Supports development workflows where evaluations and red-team checks can run before merge or release instead of relying only on post-deployment monitoring.
✓The site displays a public 20.6k metric alongside its open-source and community positioning, indicating substantial visible adoption or repository activity.

✗ Cons

✗Public paid pricing is quote-based: Enterprise and On-Premise are listed as Custom rather than fixed monthly or annual prices.
✗The product surface is broad, so teams that only need simple prompt regression tests may find the security, guardrails, MCP proxy, and model-security positioning more than they need.
✗Red-teaming and evaluation quality still depend on well-designed test cases, assertions, graders, and representative datasets.
✗The website emphasizes development-time and security testing more than production observability, so teams may still need a tracing or monitoring platform alongside Promptfoo.
✗Enterprise suitability is clear, but self-serve details such as exact paid seat limits, usage caps beyond Community red-team probes, hosted data retention, and final contract terms are not visible in the public pricing content.