Master Phoenix by Arize with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
1. Start with self-hosted Phoenix for a free deployment, either local or on infrastructure you manage.
2. Instrument an LLM application using the Python or JavaScript SDK, OpenInference, or OpenTelemetry-compatible spans.
3. Send traces to Phoenix, review spans, add evaluators, and use datasets or experiments to improve prompts and workflows.
4. Compare Phoenix Cloud or Arize AX if the team needs hosted infrastructure, online evaluations, retention, support, or enterprise controls.
💡 Quick Start: Follow these 4 steps in order to get up and running with Phoenix by Arize quickly.
Explore the key features that make Phoenix by Arize powerful for analytics & monitoring workflows.
Trace collection from popular frameworks and SDKs such as LangChain, LlamaIndex, OpenAI, and Anthropic, with agent tracing graphs, multi-agent workflow visualization, and span-level detail on LLM calls, tool invocations, and retrieval steps.
Example: debugging a multi-agent customer service system by tracing exactly which agent handled a query, which retrieval documents were used, which tool calls were made, and where the response quality degraded.
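Trace views present a request as a tree of nested spans. As a minimal sketch, assuming spans arrive as flat records linked by parent IDs, the code below rebuilds that tree and answers "which agent handled this query?". The `SpanRecord` class, its fields, and the sample span names are hypothetical illustrations rather than Phoenix's data model; only the span kind names borrow from OpenInference conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    # Hypothetical, simplified span; real OpenTelemetry spans also carry trace IDs, timestamps, status, etc.
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    kind: str  # OpenInference-style span kind, e.g. "AGENT", "RETRIEVER", "TOOL", "LLM"
    attributes: Dict[str, object] = field(default_factory=dict)


def render_tree(spans: List[SpanRecord]) -> List[str]:
    """Rebuild the parent/child hierarchy and return one indented line per span."""
    children: Dict[Optional[str], List[SpanRecord]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f"{span.kind}: {span.name}")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def names_of_kind(spans: List[SpanRecord], kind: str) -> List[str]:
    """Answer questions such as 'which agent handled this query?'."""
    return [span.name for span in spans if span.kind == kind]


# Illustrative trace from a hypothetical multi-agent customer service system.
spans = [
    SpanRecord("1", None, "customer_service_router", "CHAIN"),
    SpanRecord("2", "1", "billing_agent", "AGENT"),
    SpanRecord("3", "2", "search_billing_docs", "RETRIEVER", {"documents": ["refund_policy.md"]}),
    SpanRecord("4", "2", "lookup_invoice", "TOOL"),
    SpanRecord("5", "2", "draft_reply", "LLM"),
]

for line in render_tree(spans):
    print(line)
```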
Score traces and spans using LLM-based evaluators, code-based checks such as regex matches or assertions, or human annotation labels. Phoenix supports offline batch evaluation; managed AX plans add online evaluation workflows.
Example: running hallucination detection on production responses while maintaining a human labeling queue for edge cases, creating a continuous quality improvement loop.
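Code-based checks are ordinary functions that map an output to a label and a score. A minimal sketch of one regex check and one assertion-style check applied as an offline batch, assuming outputs are plain strings; the function names, the `EvalResult` shape, and the 50-word limit are hypothetical and do not reproduce Phoenix's evaluator API.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    label: str
    score: float
    explanation: str


def contains_no_email(output: str) -> EvalResult:
    """Regex check: fail outputs that contain something shaped like an email address."""
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", output):
        return EvalResult("fail", 0.0, "output contains an email address")
    return EvalResult("pass", 1.0, "no email address found")


def within_length(output: str, max_words: int = 50) -> EvalResult:
    """Assertion-style check: responses should stay within a word budget."""
    n_words = len(output.split())
    ok = n_words <= max_words
    return EvalResult("pass" if ok else "fail", 1.0 if ok else 0.0, f"{n_words} words (limit {max_words})")


def run_batch(outputs, evaluators):
    """Offline batch evaluation: apply every evaluator to every recorded output."""
    return [{ev.__name__: ev(text) for ev in evaluators} for text in outputs]


# Hypothetical recorded outputs pulled from traces.
outputs = [
    "Your refund was issued today.",
    "Contact jane.doe@example.com for help.",
]
results = run_batch(outputs, [contains_no_email, within_length])
for row in results:
    print({name: result.label for name, result in row.items()})
```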
Replay traced LLM calls with different prompts, models, or parameters. Compare results side-by-side with evaluation scoring. Iterate rapidly on prompt engineering without deploying changes to production.
Example: taking a poorly performing production trace, replaying it with three different prompt variations, scoring each with relevance and accuracy evaluators, and deploying the winner.
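The replay-and-compare loop can be sketched as a small offline experiment: run each prompt variant over a dataset, score every output with the same evaluators, and compare averages side by side. This is an illustration under stated assumptions, not Phoenix's experiments API: `fake_model` is a deterministic stand-in for an LLM call, and the dataset, prompt variants, and 12-word limit are hypothetical.

```python
from statistics import mean

# Hypothetical dataset: each example names a fact a good answer must mention.
DATASET = [
    {"question": "How long do refunds take?", "must_mention": "5 business days"},
    {"question": "Can I change my plan?", "must_mention": "billing settings"},
]

# Hypothetical prompt variants to compare.
PROMPT_VARIANTS = {
    "v1_plain": "Answer the question: {question}",
    "v2_grounded": "Use the policy notes to answer: {question}",
    "v3_grounded_brief": "Use the policy notes. Reply in one sentence: {question}",
}

_POLICY_NOTES = {
    "How long do refunds take?": "Refunds take 5 business days.",
    "Can I change my plan?": "Yes, from the billing settings page.",
}


def fake_model(prompt: str, question: str) -> str:
    """Deterministic stand-in for an LLM call so the sketch runs offline."""
    if "policy notes" not in prompt:
        return "I'm not sure, please contact support."
    answer = _POLICY_NOTES[question]
    if "one sentence" in prompt:
        return answer
    return answer + " Let me know if there is anything else I can help you with today!"


def accuracy(output: str, example: dict) -> float:
    return 1.0 if example["must_mention"] in output else 0.0


def concise(output: str, example: dict, max_words: int = 12) -> float:
    return 1.0 if len(output.split()) <= max_words else 0.0


EVALUATORS = {"accuracy": accuracy, "concise": concise}


def run_experiment(template: str) -> dict:
    """Run one prompt variant over the dataset and average each evaluator's scores."""
    rows = []
    for example in DATASET:
        prompt = template.format(question=example["question"])
        output = fake_model(prompt, example["question"])
        rows.append({name: ev(output, example) for name, ev in EVALUATORS.items()})
    return {name: mean(row[name] for row in rows) for name in EVALUATORS}


results = {variant: run_experiment(t) for variant, t in PROMPT_VARIANTS.items()}
winner = max(results, key=lambda v: mean(list(results[v].values())))

for variant, scores in results.items():
    print(variant, scores)
print("winner:", winner)
```

Keeping the evaluators fixed across variants is what makes the side-by-side comparison meaningful: only the prompt changes between runs.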
Track token usage and costs across supported models and providers. Attribute costs to specific agents, workflows, and traces for financial visibility and optimization.
Example: identifying that a sales agent's summarization step consumes 60% of the total token budget, then testing a smaller model for that specific step to reduce costs while maintaining quality.
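Cost attribution comes down to arithmetic over span attributes: sum token counts per step, divide by the workflow total, and multiply by per-token prices. The sketch below uses made-up token counts chosen to mirror the 60% example and hypothetical prices for two placeholder models (`large-model`, `small-model`). It is not Phoenix's cost-tracking implementation, and real prices should come from the provider.

```python
from collections import defaultdict

# Hypothetical USD prices per 1K tokens for two placeholder models.
PRICE_PER_1K = {
    "large-model": {"prompt": 0.005, "completion": 0.015},
    "small-model": {"prompt": 0.0005, "completion": 0.0015},
}

# Made-up LLM spans from one sales-agent workflow, chosen so "summarize" uses 60% of tokens.
SPANS = [
    {"step": "research", "model": "large-model", "prompt_tokens": 1_500, "completion_tokens": 500},
    {"step": "summarize", "model": "large-model", "prompt_tokens": 5_000, "completion_tokens": 1_000},
    {"step": "draft_email", "model": "large-model", "prompt_tokens": 1_600, "completion_tokens": 400},
]


def span_cost(span, model=None):
    """Cost of one span; `model` optionally prices the same tokens on a different model."""
    price = PRICE_PER_1K[model or span["model"]]
    return (span["prompt_tokens"] * price["prompt"]
            + span["completion_tokens"] * price["completion"]) / 1000


def attribute_by_step(spans):
    """Aggregate tokens and cost per workflow step and compute each step's token share."""
    tokens = defaultdict(int)
    cost = defaultdict(float)
    for span in spans:
        tokens[span["step"]] += span["prompt_tokens"] + span["completion_tokens"]
        cost[span["step"]] += span_cost(span)
    total_tokens = sum(tokens.values())
    return {
        step: {"tokens": tokens[step], "token_share": tokens[step] / total_tokens, "cost_usd": cost[step]}
        for step in tokens
    }


report = attribute_by_step(SPANS)
for step, row in report.items():
    print(f"{step:12} {row['tokens']:>6} tokens  {row['token_share']:.0%}  ${row['cost_usd']:.4f}")

# What-if: price the summarize step's tokens on the smaller model.
# Assumes token counts stay the same; quality must still be re-checked with evaluators.
summarize = next(s for s in SPANS if s["step"] == "summarize")
savings = span_cost(summarize) - span_cost(summarize, model="small-model")
print(f"estimated savings per run: ${savings:.4f}")
```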
Evaluators can help detect when LLM responses contain unsupported information, are irrelevant to the query, or violate configured quality thresholds; the available flagging and alerting workflows depend on deployment and plan.
Example: monitoring a medical information chatbot for factual accuracy, flagging responses where the model generates unsupported claims, and routing flagged interactions for human review.
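The flag-and-route pattern can be sketched as: score each response, compare the score to a threshold, and push low scorers onto a review queue. The evaluators described above are LLM-based; to keep this sketch runnable offline it swaps in a crude word-overlap "support" heuristic, which is far too weak for real medical content. The `support_score` and `check` names, the 0.5 overlap cutoff, and the 0.8 threshold are all hypothetical.

```python
import re
from collections import deque


def content_words(text):
    """Lowercased alphanumeric words longer than 3 characters (a crude proxy for content)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def support_score(response, context, min_overlap=0.5):
    """Fraction of response sentences whose content words mostly appear in the context."""
    ctx = content_words(context)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    if not sentences:
        return 1.0
    supported = 0
    for sentence in sentences:
        words = content_words(sentence)
        if not words or len(words & ctx) / len(words) >= min_overlap:
            supported += 1
    return supported / len(sentences)


# Flagged interactions wait here for a human reviewer.
review_queue = deque()


def check(interaction_id, response, context, threshold=0.8):
    """Flag and enqueue responses whose support score falls below the threshold."""
    score = support_score(response, context)
    flagged = score < threshold
    if flagged:
        review_queue.append((interaction_id, score))
    return flagged


context = ("Ibuprofen is a nonsteroidal anti-inflammatory drug. "
           "Common side effects include stomach upset and heartburn.")
ok_flagged = check("chat-1", "Ibuprofen is an anti-inflammatory drug. Stomach upset is a common side effect.", context)
bad_flagged = check("chat-2", "Ibuprofen is an anti-inflammatory drug. It cures bacterial infections within hours.", context)
print(ok_flagged, bad_flagged, list(review_queue))
```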
Alyx is Arize's built-in AI agent for trace debugging and analysis. It can explain span context, debug traces, create dashboards and widgets, optimize prompts, and search traces using natural language.
Example: asking Alyx, "Why did response quality drop for support queries last Tuesday?" and getting an analysis of trace patterns, evaluation scores, and potential root causes.
Phoenix is purpose-built for LLM and agent workflows, with trace inspection, evaluations, prompt and retrieval analysis, and AI-specific metadata such as tokens, spans, embeddings, and evaluator scores. General monitoring tools can still be useful for infrastructure, application metrics, and broader production observability.
Phoenix is not limited to the frameworks it instruments automatically. It also supports custom instrumentation via the Python SDK, the JavaScript SDK, and OpenTelemetry-compatible spans, so custom agent implementations and other LLM applications can be monitored as well.
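Custom instrumentation boils down to wrapping each unit of work in a span that records its inputs, outputs, errors, duration, and parent. In practice this would use an OpenTelemetry tracer (for example, `start_as_current_span` in the OpenTelemetry Python API) exporting to Phoenix. The standard-library sketch below only mimics the idea; the `traced` decorator, `FINISHED_SPANS` list, and example functions are hypothetical.

```python
import functools
import time
import uuid
from contextvars import ContextVar

# Finished spans collected in memory; a real setup would export them to Phoenix instead.
FINISHED_SPANS = []
# Tracks the currently active span so nested calls can record their parent.
_current_span = ContextVar("current_span", default=None)


def traced(kind):
    """Hypothetical decorator that records each call of the wrapped function as a span."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = {
                "span_id": uuid.uuid4().hex[:16],
                "parent_id": parent["span_id"] if parent else None,
                "name": fn.__name__,
                "kind": kind,
                "input": repr((args, kwargs)),
                "status": "OK",
            }
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                span["output"] = repr(result)
                return result
            except Exception as exc:
                span["status"] = "ERROR"
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                _current_span.reset(token)
                FINISHED_SPANS.append(span)
        return wrapper
    return decorator


@traced("TOOL")
def lookup_order(order_id):
    return {"order_id": order_id, "status": "shipped"}


@traced("AGENT")
def support_agent(question):
    order = lookup_order("A-123")
    return f"Your order {order['order_id']} has {order['status']}."


answer = support_agent("Where is my order?")
for span in FINISHED_SPANS:
    print(span["kind"], span["name"], "parent:", span["parent_id"], span["status"])
```

Because the child span finishes first, it is appended before its parent; a trace viewer reassembles the hierarchy from the parent IDs rather than from arrival order.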
Phoenix is the open-source library with tracing, evaluation, and experimentation workflows that teams can self-host for free. Phoenix Cloud provides free hosted Phoenix instances with fixed storage, while Arize AX is the managed cloud platform that adds hosted production observability, online evaluations, the Alyx AI assistant, product monitoring, retention, support, and enterprise controls depending on plan and contract.
Phoenix supports both real-time trace collection and offline batch evaluation for deeper analysis. AX adds online evaluations that can score production traces continuously and support alerting workflows for quality or safety issues.
AX Free includes 25K spans/month and 1 GB ingestion. AX Pro is listed at $50/month with 50K spans/month, 10 GB ingestion, 30-day retention, higher rate limits, and email support. Enterprise pricing is custom, based on scale, retention, support, and contracted controls.
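To estimate which listed tier a workload fits, multiply traffic by spans per trace. The sketch assumes a 30-day month and uses the span quotas quoted above; `traces_per_day` and `spans_per_trace` are hypothetical inputs, ingestion volume, retention, and rate limits are not modeled, and current quotas should be confirmed on Arize's pricing page.

```python
# Monthly span quotas as listed above; ingestion volume, retention, and rate limits are not modeled.
PLAN_SPAN_QUOTAS = {"AX Free": 25_000, "AX Pro": 50_000}


def monthly_spans(traces_per_day: int, spans_per_trace: int, days: int = 30) -> int:
    """Spans per month = traces/day x spans/trace x days (30-day month assumed)."""
    return traces_per_day * spans_per_trace * days


def plans_that_fit(traces_per_day: int, spans_per_trace: int, days: int = 30):
    """Return the estimated monthly spans and the listed plans whose span quota covers them."""
    needed = monthly_spans(traces_per_day, spans_per_trace, days)
    return needed, [plan for plan, quota in PLAN_SPAN_QUOTAS.items() if needed <= quota]


# 100 traces/day, each an agent run producing about 12 spans (hypothetical workload).
needed, fits = plans_that_fit(traces_per_day=100, spans_per_trace=12)
print(needed, fits)  # 36000 ['AX Pro']
```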
Now that you know how to use Phoenix by Arize, it's time to put this knowledge into practice.
Tutorial updated March 2026