Arize Phoenix is an open-source AI observability and evaluation platform developed by Arize AI, designed to help teams monitor, debug, and evaluate large language model (LLM) applications and machine learning models. It provides tools for tracing LLM calls, evaluating model outputs, visualizing embeddings, and identifying issues such as hallucinations, retrieval failures, and prompt problems. Phoenix supports integration with popular frameworks like LangChain, LlamaIndex, and OpenAI, enabling developers to gain deep insights into their AI pipelines throughout the development and production lifecycle.
Founded: 2020
Company Size: 51-200 employees
Headquarters: San Francisco, USA
Funding: Series B
Provides Python and TypeScript SDKs (arize-phoenix-evals) for running evaluations programmatically and integrating into CI/CD.
Captures detailed traces of LLM runs, including model calls, retrieval, tool use, and custom logic, to visualize and debug workflows.
Enables evaluation of prompts using production examples to assess relevance, toxicity, and response quality.
Allows running experiments that compare different prompts, models, or configurations on the same inputs to measure impact.
Can be self‑hosted on your own machine or server, giving full control over data storage and processing.
Available as a cloud service for teams that prefer managed hosting.
Monitors latency, token usage, and other performance metrics across LLM workflows.
Helps identify failures, regressions, and poor responses by comparing outputs across runs and experiments.
Designed to be vendor‑ and language‑agnostic, supporting multiple LLM providers and frameworks.
The core Phoenix platform is open source, allowing inspection, modification, and community contributions.
Offers an extensible architecture and plugin system to customize functionality, integrate new data sources, or add custom evaluators.
Built on OpenTelemetry (OTLP) to ingest telemetry data from existing instrumentation and standardize LLM observability.
Uses OpenInference to standardize telemetry data for LLMs and surrounding application context, enabling consistent evaluation.
Provides auto‑instrumentation for popular frameworks such as LlamaIndex, LangChain, DSPy, Mastra, and Vercel AI SDK.
Supports major LLM providers including OpenAI, Bedrock, Anthropic, MistralAI, VertexAI, LiteLLM, and others.
Works with Arize’s production observability platform, using the same collector frameworks for consistent metrics and tooling.
Can be used directly in notebooks for ML observability and experimentation during development.
Supports Python, TypeScript, and Java for instrumentation and evaluation workflows.
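The latency and token-usage monitoring described in the feature list can be pictured as an aggregation over trace spans. The following is a minimal sketch, not Phoenix's implementation: the `Span` class and `summarize` function are hypothetical, and the attribute keys (`openinference.span.kind`, `llm.token_count.prompt`, `llm.token_count.completion`) are assumed to follow the OpenInference semantic conventions, so check them against the current specification.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Span:
    """Hypothetical, simplified stand-in for one exported trace span."""
    name: str
    start_s: float
    end_s: float
    attributes: dict = field(default_factory=dict)


def summarize(spans):
    """Return call count, mean latency, and total tokens for LLM spans only."""
    llm_spans = [
        s for s in spans
        if s.attributes.get("openinference.span.kind") == "LLM"
    ]
    latencies = [s.end_s - s.start_s for s in llm_spans]
    total_tokens = sum(
        s.attributes.get("llm.token_count.prompt", 0)
        + s.attributes.get("llm.token_count.completion", 0)
        for s in llm_spans
    )
    return {
        "llm_calls": len(llm_spans),
        "mean_latency_s": mean(latencies) if latencies else 0.0,
        "total_tokens": total_tokens,
    }


# Illustrative data: one retrieval span and two LLM spans.
spans = [
    Span("retrieve", 0.0, 0.25, {"openinference.span.kind": "RETRIEVER"}),
    Span("generate", 0.25, 1.25, {
        "openinference.span.kind": "LLM",
        "llm.token_count.prompt": 120,
        "llm.token_count.completion": 30,
    }),
    Span("generate", 2.0, 2.5, {
        "openinference.span.kind": "LLM",
        "llm.token_count.prompt": 80,
        "llm.token_count.completion": 20,
    }),
]
summary = summarize(spans)
```

In a real deployment these numbers would come from spans collected by Phoenix rather than from hand-built objects.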
Common questions about Arize Phoenix features, pricing, and capabilities
Phoenix provides specialized evaluation tools that use LLM-as-a-judge metrics to score outputs for groundedness and relevance. By tracing the execution path, developers can pinpoint exactly where a model deviated from the provided context, allowing for rapid debugging of hallucination issues in RAG pipelines.
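To make the LLM-as-a-judge idea concrete, here is a minimal sketch of a groundedness check. It does not use the Phoenix evaluation library: the template text, `judge_groundedness`, and `fake_llm` are all hypothetical, and `fake_llm` is a toy word-overlap stand-in so the example runs without a model provider.

```python
from typing import Callable

# Hypothetical judge prompt; real evaluation templates will differ.
JUDGE_TEMPLATE = (
    "You are checking whether an answer is supported by the reference text.\n"
    "Reference: {context}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: grounded or hallucinated."
)


def judge_groundedness(context: str, answer: str, llm: Callable[[str], str]) -> str:
    """Ask a judge model for a label and normalize its reply."""
    prompt = JUDGE_TEMPLATE.format(context=context, answer=answer)
    raw = llm(prompt).strip().lower()
    return raw if raw in {"grounded", "hallucinated"} else "unparseable"


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a judge model: every answer word must appear in the reference."""
    reference = prompt.split("Reference: ")[1].split("\nAnswer: ")[0].lower()
    answer = prompt.split("\nAnswer: ")[1].split("\n")[0].lower()
    words = [w.strip(".,") for w in answer.split()]
    return "grounded" if all(w in reference for w in words) else "hallucinated"


context = "The capital of France is Paris."
label_ok = judge_groundedness(context, "Paris is the capital of France.", fake_llm)
label_bad = judge_groundedness(context, "The capital of France is Lyon.", fake_llm)
```

Swapping `fake_llm` for a function that calls a real model keeps the rest of the flow unchanged, which is the main reason to pass the judge in as a callable.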
Phoenix includes an embedding visualizer that projects high-dimensional vector data into 2D or 3D space. This helps teams identify clusters of problematic queries, visualize data drift, and understand how their retrieval systems are performing in real time.
Migrating from self-hosted Phoenix to the managed cloud service is straightforward because both share the same core API and data structures. Users can typically transition by updating their endpoint configuration and API keys, so teams can start for free locally and move to the cloud as their production needs grow.
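The endpoint-and-key switch can be sketched as a small configuration helper. This is an illustration under assumptions: the environment variable names `PHOENIX_COLLECTOR_ENDPOINT` and `PHOENIX_API_KEY` and the local default port 6006 reflect the Phoenix documentation as best understood here, the `phoenix_env` helper is hypothetical, and the cloud URL in the example is a placeholder, not a real endpoint.

```python
import os


def phoenix_env(target: str, cloud_endpoint: str = "", api_key: str = "") -> dict:
    """Build the environment settings for a local or hosted Phoenix collector."""
    if target == "local":
        # Assumed default address of a locally running Phoenix server.
        return {"PHOENIX_COLLECTOR_ENDPOINT": "http://localhost:6006"}
    if target == "cloud":
        if not cloud_endpoint or not api_key:
            raise ValueError("cloud target needs both an endpoint and an API key")
        return {
            "PHOENIX_COLLECTOR_ENDPOINT": cloud_endpoint,
            "PHOENIX_API_KEY": api_key,
        }
    raise ValueError(f"unknown target: {target!r}")


# Development: point tracing at the local server.
local_cfg = phoenix_env("local")
os.environ.update(local_cfg)

# Production: same application code, different settings (placeholder values).
cloud_cfg = phoenix_env("cloud", "https://phoenix.example.invalid", "YOUR_API_KEY")
```

Keeping the difference in configuration rather than in application code is what makes the move between deployments a small change.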
To trace a LlamaIndex application, install the Phoenix library (published on PyPI as arize-phoenix) and initialize the tracer in your script. Once active, Phoenix captures spans and traces from your LlamaIndex queries and provides a web-based UI to inspect the retrieval and generation steps.
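A setup along these lines might look like the sketch below. It assumes the `arize-phoenix` and `openinference-instrumentation-llama-index` packages are installed and that `phoenix.otel.register` and `LlamaIndexInstrumentor` behave as in recent releases; signatures can change between versions, so treat this as a starting point and confirm against the current documentation. The wrapper function and project name are hypothetical, and the imports are guarded so the snippet degrades gracefully when the packages are missing.

```python
def enable_llamaindex_tracing(project_name: str = "my-llamaindex-app"):
    """Register a tracer with Phoenix and instrument LlamaIndex, if available."""
    try:
        # Third-party imports are kept inside the function so that the
        # sketch can be loaded even where the packages are not installed.
        from phoenix.otel import register
        from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
    except ImportError:
        return None

    # Create a tracer provider that exports spans to the configured collector.
    tracer_provider = register(project_name=project_name)
    # Patch LlamaIndex so queries emit spans through that provider.
    LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider


tracer_provider = enable_llamaindex_tracing()
```

After this call, LlamaIndex queries made in the same process should appear as traces in the Phoenix UI, provided a Phoenix server is reachable at the configured endpoint.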
Phoenix integrates with popular frameworks including LangChain, LlamaIndex, and the OpenAI SDK through auto-instrumentation. Because its tracing is based on OpenTelemetry, it can also be added to custom-built Python applications with minimal code changes.
Phoenix is designed to be used across the entire lifecycle, so you can run evaluation suites as part of your pre-deployment testing. You can programmatically trigger evaluations to ensure that new prompt versions or model updates meet your quality benchmarks before they reach production.
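A pre-deployment quality gate of this kind can be reduced to a pass-rate check over evaluation labels. The sketch below is generic rather than a Phoenix API: the `passes_gate` function, the `"correct"` label, and the 90% threshold are all illustrative assumptions, and the labels would in practice come from an evaluation run.

```python
def passes_gate(labels, passing_label="correct", min_pass_rate=0.9):
    """Return True when the share of passing labels meets the threshold."""
    if not labels:
        # An empty evaluation run should never let a release through.
        return False
    pass_rate = sum(label == passing_label for label in labels) / len(labels)
    return pass_rate >= min_pass_rate


# Illustrative results for two candidate prompt versions.
candidate_a = ["correct"] * 9 + ["incorrect"]      # 90% pass rate
candidate_b = ["correct"] * 8 + ["incorrect"] * 2  # 80% pass rate

a_ok = passes_gate(candidate_a)
b_ok = passes_gate(candidate_b)

# In a CI job, the result would typically set the process exit code, e.g.:
#   sys.exit(0 if passes_gate(labels) else 1)
```

Failing the build on a non-zero exit code lets the evaluation act like any other test step in the pipeline.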
The Self-Hosted version is an open-source tool designed for local development and full data control. The AX Pro plan, priced at $50/month, is a managed SaaS offering tailored for small teams who need hosted infrastructure, persistent storage, and collaborative features without managing their own servers.
The AX Enterprise plan offers both a managed SaaS option and a dedicated Self-Hosted deployment. This plan is designed for large organizations requiring advanced security, custom SLAs, and the ability to run the observability platform within their own VPC or private cloud environment.
Phoenix provides tools for data masking and PII redaction to ensure sensitive information is not stored or visualized. For organizations with the highest security needs, the Self-Hosted and Enterprise versions allow you to keep all trace data within your own infrastructure, ensuring it never leaves your perimeter.
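As an illustration of what masking can mean in practice, the following sketch redacts email addresses and US-style Social Security numbers from text before it would be attached to a trace. This is not Phoenix's built-in mechanism: the `redact` function, the patterns, and the placeholder tokens are hypothetical, and simple regular expressions like these will miss many kinds of PII.

```python
import re

# Illustrative patterns only; production redaction needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace matched emails and SSN-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    return text


prompt = "Contact jane.doe@example.com about account 123-45-6789 today"
safe_prompt = redact(prompt)
```

Applying a step like this before span attributes are recorded keeps raw values out of storage regardless of where the trace data is hosted.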
Arize AI maintains a strong commitment to security, and our managed AX platforms are built to meet enterprise standards. For specific compliance documentation such as SOC 2 Type II reports or GDPR data processing agreements, customers on the Enterprise plan can request our full security package.
AX Pro users receive standard email support and access to our community forums. Enterprise customers benefit from dedicated support channels, including Slack integration, prioritized ticket handling, and architectural reviews with our AI observability experts to optimize their monitoring setup.
Comprehensive documentation, API references, and step-by-step tutorials are available at phoenix.arize.com. We also provide a library of pre-built evaluation templates and Jupyter notebooks to help you implement best practices for monitoring RAG and agentic workflows.
Full control for devs, open source
Starting at $0.00/month
Free and open source self-hosted version

Individuals and startups
Starting at $0.00/month
25,000 spans per month included
1 GB per month included

Small teams and startups
Starting at $50.00/month (monthly base subscription)
50,000 spans per month included, then $10 per million additional spans
100 GB per month included, then $3 per additional GB

Enterprise SaaS or Self-Hosted
Contact for pricing
Custom volume, retention, and support
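For the $50.00/month tier, a monthly bill can be estimated from the listed figures: the base subscription, plus $10 per million spans beyond the included 50,000, plus $3 per GB beyond the included 100 GB. The sketch below assumes overage is pro-rated linearly; the listing does not say whether billing rounds up to whole millions or whole GB, so the actual invoice may differ. The function name is hypothetical.

```python
def estimate_monthly_cost(spans: int, gigabytes: float) -> float:
    """Estimate the monthly cost of the $50/month tier from usage."""
    base_usd = 50.0
    included_spans = 50_000
    included_gb = 100.0
    usd_per_million_spans = 10.0
    usd_per_gb = 3.0

    # Only usage above the included allowance is charged.
    extra_spans = max(0, spans - included_spans)
    extra_gb = max(0.0, gigabytes - included_gb)

    # Assumption: overage is charged proportionally, not in whole units.
    span_cost = extra_spans / 1_000_000 * usd_per_million_spans
    storage_cost = extra_gb * usd_per_gb
    return base_usd + span_cost + storage_cost


# 2,050,000 spans and 120 GB: 50 + 2 * 10 + 20 * 3
heavy_month = estimate_monthly_cost(2_050_000, 120)
# Usage inside the included allowance costs only the base fee.
light_month = estimate_monthly_cost(40_000, 10)
```

With these inputs the heavy month comes to $130 and the light month to the $50 base fee.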