LLM observability
Tools for engineers to trace, evaluate, and debug LLM application behavior in production, including prompts, latency, cost, and model outputs. Technical teams use these platforms to ship reliable AI features with measurable quality.
Showing 3 of 3 services
Arize Phoenix
Arize Phoenix is an open-source AI observability and evaluation platform developed by Arize AI, designed to help teams monitor, debug, and evaluate large language model (LLM) applications and machine learning models. It provides tools for tracing LLM calls, evaluating model outputs, visualizing embeddings, and identifying issues such as hallucinations, retrieval failures, and prompt problems. Phoenix supports integration with popular frameworks like LangChain, LlamaIndex, and OpenAI, enabling developers to gain deep insights into their AI pipelines throughout the development and production lifecycle.
Helicone
Helicone is an open-source LLM observability and monitoring platform that allows developers to log, monitor, and debug their large language model applications with a single line of code. It provides features such as request logging, cost tracking, prompt management, caching, and rate limiting for AI applications built on top of providers like OpenAI, Anthropic, and others. Helicone is designed to help teams gain visibility into their AI usage, optimize performance, and reduce costs in production environments.
Langfuse
Langfuse is an open-source LLM engineering platform that provides observability, analytics, and monitoring tools for AI applications built on large language models. It enables developers to trace LLM calls, evaluate model outputs, manage prompts, and debug production issues through a comprehensive dashboard and SDK integrations. The platform supports popular frameworks like LangChain, LlamaIndex, and OpenAI, and can be used as a cloud-hosted service or self-hosted deployment.