Phoenix is Arize's open-source LLM observability project, and it has quietly become the default way tens of thousands of teams see what their agents are actually doing in production. The pitch is simple: `pip install arize-phoenix`, instrument with OpenInference (or any OpenTelemetry-compatible library), and every LLM call, tool invocation, retrieval, and embedding shows up as a spanned timeline you can filter, search, and replay. No vendor account required, no proprietary SDK lock-in. The Open
Phoenix is Arize's open-source LLM observability project, used by tens of thousands of teams as the default way to see what their agents are actually doing. Phoenix ingests OpenTelemetry-compatible traces and renders every LLM call, tool invocation, retrieval, and embedding as a spanned timeline. On top of tracing, Phoenix ships evaluations, prompt playgrounds, dataset management, and an annotation UI. The product runs locally as a Python package, in Docker, or in Kubernetes, with a hosted SaaS tier and an enterprise platform (Arize AX) for production monitoring.
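To make "spanned timeline" concrete, here is a minimal sketch in plain Python of how parent/child spans can be nested and ordered by start time. It is an illustration only, not Phoenix's API or data model: the `Span` class, the `render_timeline` helper, and the sample trace are all hypothetical, and the kind labels (`AGENT`, `RETRIEVER`, `EMBEDDING`, `LLM`) are assumed here to loosely mirror OpenInference-style span kinds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span record (not Phoenix's schema)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    kind: str                 # assumed labels, e.g. "LLM", "TOOL", "RETRIEVER"
    name: str
    start_ms: int
    end_ms: int


def render_timeline(spans):
    """Return indented lines: children nested under parents, sorted by start time."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            duration = span.end_ms - span.start_ms
            lines.append(f"{'  ' * depth}[{span.kind}] {span.name} ({duration} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Made-up trace for one agent request; spans arrive out of order on purpose.
trace = [
    Span("1", None, "AGENT", "answer_question", 0, 900),
    Span("3", "1", "LLM", "generate_answer", 300, 880),
    Span("2", "1", "RETRIEVER", "search_docs", 10, 250),
    Span("4", "2", "EMBEDDING", "embed_query", 15, 60),
]

timeline = render_timeline(trace)
print("\n".join(timeline))
```

The nesting is what lets a viewer answer questions like "which retrieval fed this LLM call, and how long did each step take"; a real tracing backend adds attributes, status, and token counts on top of this shape.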
Arize Phoenix is a leading open-source LLM observability platform that offers tracing, evaluation, and experimentation without vendor lock-in. It suits teams with DevOps capacity who need deep analytical insight into LLM application behavior, RAG pipeline quality, and multi-agent workflow debugging. Phoenix stands out for two things: its OpenTelemetry foundation, which keeps traces portable, and its evaluation framework, which supports both automated LLM-as-a-judge scoring and human annotation workflows. The self-hosted model carries no licensing costs, which makes it particularly attractive for regulated industries and cost-conscious teams. Weigh that against the operational overhead of managing infrastructure and a steeper learning curve than polished SaaS alternatives such as LangSmith. With over 18,000 GitHub stars and backing from Arize AI, the project shows sustained momentum and community adoption.
Pricing: $0; free and paid tiers; contact sales.
Through late 2025 and into 2026, Phoenix has expanded agent-focused tracing with deeper support for LangGraph, CrewAI, and AutoGen, including visualizations for multi-agent coordination and tool-call sequence inspection. The evaluation framework has been enhanced with new built-in evaluators for code generation quality, multi-turn conversation coherence, and structured output validation. Session and thread-based tracing now provides better visibility into conversational AI applications, grouping related interactions and tracking context evolution across turns. The prompt playground has been upgraded with multi-model comparison capabilities, allowing teams to test prompts against several providers simultaneously and feed results directly into experiments. Guardrails integration enables teams to define and monitor safety boundaries alongside performance metrics. The annotation workflow has been streamlined with bulk labeling tools, inter-annotator agreement metrics, and API-driven integration with external labeling platforms. Infrastructure improvements include faster trace ingestion, improved query performance for large datasets, and better support for high-cardinality span attributes in production environments.
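For readers unfamiliar with inter-annotator agreement, the sketch below computes Cohen's kappa, one common agreement statistic for two annotators. The source does not say which metric Phoenix reports, so treat the choice of kappa, the `cohens_kappa` helper, and the sample labels as assumptions made purely for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is
    the agreement expected by chance from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for four traces, judged by two hypothetical annotators.
annotator_1 = ["good", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "bad"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Raw percent agreement here is 75%, but kappa discounts the 50% agreement these label frequencies would produce by chance, which is why agreement-corrected metrics are usually preferred when auditing human labels.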
AI Observability
LangSmith is LangChain's commercial observability, evaluation and prompt management platform for LLM apps and agents in production.
LLM Observability
Langfuse is an open-source LLM observability and engineering platform providing tracing, prompt management, evaluations, and dataset management for production AI applications.
LLM Observability
AI observability platform for evals, production tracing, prompt management, and regression detection.
LLM Observability
Open-source LLM observability and AI gateway — logs every prompt, response, cost, and latency across 20+ providers with a one-line proxy or async SDK, plus caching, retries, and prompt experiments.
MLOps
End-to-end MLOps and AI developer platform — Models (experiment tracking, sweeps, model registry) plus Weave (LLM/agent observability and evals) — used by frontier labs and enterprise ML teams.