Langtrace: Open-source observability platform for LLM applications and AI agents with OpenTelemetry-based tracing, cost tracking, and performance analytics across 8+ model providers and 10+ frameworks.
Open-source monitoring for AI apps — see exactly what your AI is doing with detailed tracing and performance metrics.
Langtrace is an open-source observability and evaluation platform purpose-built for LLM applications, AI agents, and retrieval-augmented generation (RAG) pipelines. It provides detailed distributed tracing, cost analytics, and quality evaluation capabilities that help engineering teams understand exactly what their AI systems are doing in production, how much they cost, and how well they perform.
At its core, Langtrace is built natively on the OpenTelemetry standard: every trace and span it generates follows OpenTelemetry conventions and can be exported over OTLP to any compatible backend, such as Grafana, Datadog, SigNoz, or your own collector. This vendor-neutral approach sets it apart from observability tools that lock telemetry into proprietary formats. For platform teams already running OpenTelemetry infrastructure for microservices, Langtrace slots into the existing stack rather than creating a parallel silo.
The platform auto-instruments 8 major LLM providers (OpenAI, Anthropic, Google Gemini, Cohere, Groq, Mistral, Perplexity, and Ollama) and over 10 orchestration frameworks and vector databases including LangChain, LlamaIndex, LangGraph, CrewAI, DSPy, AutoGen, Pinecone, Chroma, Weaviate, and Qdrant. Instrumentation requires just two lines of code — import the SDK and call init — after which every LLM call, tool invocation, embedding query, and vector retrieval is captured automatically with full prompt and completion content, token counts, latency, and cost.
Cost tracking is a first-class feature. Dashboards aggregate spend by model, user, project, prompt template, and time window, making it straightforward to identify which features or tenants are driving the largest portion of an AI bill. Teams report using this data to set budget alerts, negotiate model pricing, and justify optimization investments to finance stakeholders.
For evaluation and quality management, Langtrace lets teams promote production traces into curated datasets, annotate them with human feedback, run prompt experiments across model versions, and score outputs using built-in evaluators for accuracy, faithfulness, toxicity, and custom metrics. This closes the loop between observability and iteration — instead of treating monitoring and evaluation as separate workflows, teams can move from a suspicious trace to a scored experiment in a few clicks.
The self-hosted deployment option is a significant differentiator for regulated industries. The server is AGPL-3.0 licensed while the SDKs are Apache-2.0, and a Docker Compose file launches the full stack (server, Postgres, ClickHouse) in minutes. Healthcare, finance, and government teams that cannot send raw prompts to third-party SaaS providers can run Langtrace entirely within their own VPC while maintaining standard OpenTelemetry compatibility.
The managed Cloud offering starts with a free tier (50,000 traces/month, 30-day retention), scales through a Pro plan at $59/month (up to 1 million spans included, ~$0.20 per 1,000 additional spans, 90-day retention), and offers custom Enterprise agreements with single-tenant deployment, SOC 2 documentation, and SLAs. This tiered approach makes Langtrace accessible to individual developers prototyping an agent as well as platform teams running production workloads at scale.
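To make the Pro pricing concrete, here is a rough estimate of a monthly bill from the figures quoted above. It assumes overage is billed proportionally per span at the approximate $0.20 per 1,000 rate; the function name and the proportional-billing rule are illustrative assumptions, not Langtrace's published billing logic.

```python
PRO_BASE_USD = 59.00        # Pro plan monthly price quoted above
INCLUDED_SPANS = 1_000_000  # spans included in the Pro plan
OVERAGE_USD_PER_1K = 0.20   # approximate overage rate quoted above


def estimate_pro_monthly_cost(spans: int) -> float:
    """Estimate a monthly Pro bill in USD for a given span volume.

    Assumption: overage is charged proportionally, not in whole 1,000-span blocks.
    """
    extra_spans = max(0, spans - INCLUDED_SPANS)
    overage_usd = extra_spans / 1000 * OVERAGE_USD_PER_1K
    return round(PRO_BASE_USD + overage_usd, 2)


if __name__ == "__main__":
    for volume in (800_000, 3_500_000):
        print(volume, estimate_pro_monthly_cost(volume))
```

Under these assumptions, a workload of 3.5 million spans adds 2,500 overage units at $0.20 each, or $500 on top of the base price.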
All Langtrace spans conform to the emerging OpenTelemetry GenAI semantic conventions, so prompts, completions, token counts, model parameters, tool calls, and retrieval results are stored in standardized attributes. This means traces can be exported via OTLP to Grafana Tempo, Datadog, SigNoz, Jaeger, or any compliant backend without transformation, giving teams full portability over their telemetry data and avoiding vendor lock-in.
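As an illustration of what "standardized attributes" means in practice, the sketch below builds the kind of attribute map a GenAI span might carry. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions as currently drafted; those conventions are still evolving, and the exact set Langtrace emits may differ. The helper name is hypothetical.

```python
def build_genai_span_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Return span attributes keyed by OpenTelemetry GenAI semantic-convention names.

    Assumption: attribute names reflect the draft conventions and may change.
    """
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


if __name__ == "__main__":
    attrs = build_genai_span_attributes("example-model", 1200, 350)
    for key, value in attrs.items():
        print(f"{key} = {value}")
```

Because the keys are shared across tools, any backend that understands the conventions can query token usage or model names without a vendor-specific mapping.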
Initialization takes two lines: import the SDK and call init with an API key. Every supported LLM, framework, and vector DB call is then traced automatically with full prompt content, completion text, token counts, latency, and cost — no manual span creation required. The Python SDK supports OpenAI, Anthropic, Gemini, Cohere, Groq, Mistral, Perplexity, Ollama, LangChain, LlamaIndex, CrewAI, DSPy, AutoGen, Pinecone, Chroma, Weaviate, and Qdrant. The TypeScript SDK covers a similar set of providers and frameworks.
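A minimal sketch of the setup described above, assuming the Python SDK is installed as `langtrace-python-sdk` and exposes `langtrace.init(api_key=...)`. The `init_tracing` wrapper and the `LANGTRACE_API_KEY` environment variable name are illustrative choices; check the SDK documentation for the current signature.

```python
import os


def init_tracing() -> bool:
    """Initialize Langtrace if an API key and the SDK are available.

    Returns True when init was called, False otherwise.
    """
    api_key = os.environ.get("LANGTRACE_API_KEY")  # assumed variable name
    if not api_key:
        return False
    try:
        # Assumed import path for the Python SDK (pip install langtrace-python-sdk).
        from langtrace_python_sdk import langtrace
    except ImportError:
        return False
    langtrace.init(api_key=api_key)
    return True


if __name__ == "__main__":
    print("tracing enabled:", init_tracing())
```

Call this once at process start, before importing or constructing LLM clients, so that instrumentation is in place when the first request is made.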
Aggregated dashboards display cost per model, user, project, prompt template, and time range, alongside p50/p95/p99 latency for individual operations and full traces. Cost is calculated automatically using each provider's published token pricing. Teams use these dashboards to set budget alerts, identify cost spikes from specific features or tenants, and present attribution data to finance stakeholders for AI infrastructure spend.
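The two calculations behind such a dashboard can be sketched as follows: per-request cost from token counts, and latency percentiles over a set of requests. The price table is hypothetical (real values come from each provider's price list), and the nearest-rank percentile method is an assumption; Langtrace may compute percentiles differently.

```python
import math

# Hypothetical prices in USD per 1M tokens, for illustration only.
PRICES_PER_1M = {"example-model": {"input": 2.00, "output": 8.00}}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, given token counts and the price table."""
    price = PRICES_PER_1M[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    print(request_cost("example-model", 1000, 500))
    latencies_ms = [120, 150, 180, 200, 250, 300, 400, 800, 1200, 2500]
    for pct in (50, 95, 99):
        print(f"p{pct}:", percentile(latencies_ms, pct))
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over larger windows.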
Saved prompts can be versioned, edited, and tested across multiple models in a side-by-side playground. Experiment results are persisted so teams can compare output quality, latency, and cost across model versions and prompt variations before deploying changes to production. This workflow supports systematic prompt engineering rather than ad-hoc testing in notebooks.
Any production trace can be added to a dataset, labeled by human annotators, and run through built-in or custom evaluators measuring accuracy, faithfulness, toxicity, JSON schema compliance, and other quality metrics. Custom evaluator functions can be defined in Python for domain-specific scoring. This creates a feedback loop where production issues are captured, annotated, evaluated, and used to validate fixes before redeployment.
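A custom evaluator for domain-specific scoring could look something like the sketch below, which checks a simplified form of JSON compliance: the output must parse as a JSON object and contain a set of required keys. The function name and signature are hypothetical and do not represent Langtrace's evaluator interface.

```python
import json


def json_compliance_score(output_text: str, required_keys: set) -> float:
    """Return 1.0 if output_text is a JSON object with all required keys, else 0.0."""
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if required_keys.issubset(parsed.keys()) else 0.0


if __name__ == "__main__":
    good = '{"answer": "42", "sources": []}'
    bad = "The answer is 42."
    print(json_compliance_score(good, {"answer", "sources"}))
    print(json_compliance_score(bad, {"answer", "sources"}))
```

Returning a float rather than a boolean keeps the evaluator compatible with averaging across a dataset, so a run can be summarized as a single compliance rate.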
A single Docker Compose file launches the server, Postgres for metadata, and ClickHouse for high-performance trace storage. Kubernetes Helm charts are available for production deployments that require horizontal scaling. Self-hosted instances receive all features available in the managed Cloud offering, with the only trade-off being that teams manage their own infrastructure, upgrades, and backups.
Workspaces, projects, role-based access control, and API key scoping let larger organizations separate staging from production traffic and limit which team members can access sensitive trace data. This is essential for enterprise deployments where multiple teams share a single Langtrace instance but need isolation between their observability data and configurations.
Tracing CrewAI, LangGraph, or AutoGen agents where understanding tool calls, retries, and intermediate reasoning across spans is essential to fix loops, hallucinations, or unexpected behavior. The waterfall trace visualization shows the full execution graph with timing, token counts, and cost for each step, making it straightforward to pinpoint where an agent goes off track.
Tracking token spend per user, tenant, or feature in B2B SaaS so finance and engineering can attribute OpenAI and Anthropic bills and enforce budget alerts. Per-request cost is calculated automatically using each provider's pricing, and dashboards aggregate spend by model, project, and time window to surface optimization opportunities and prevent cost overruns.
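The attribution step can be sketched as a simple aggregation: sum per-request cost by tenant and flag anyone over a monthly budget. The tenant names, budgets, and function name are made up for illustration; in practice the per-request costs would come from trace data.

```python
from collections import defaultdict


def tenants_over_budget(requests: list, budgets: dict) -> list:
    """Return the sorted tenants whose total spend exceeds their budget.

    requests: (tenant, cost_usd) pairs; budgets: tenant -> monthly limit in USD.
    Tenants with no configured budget are never flagged.
    """
    spend = defaultdict(float)
    for tenant, cost_usd in requests:
        spend[tenant] += cost_usd
    return sorted(
        tenant for tenant, total in spend.items()
        if total > budgets.get(tenant, float("inf"))
    )


if __name__ == "__main__":
    requests = [("acme", 4.0), ("globex", 1.5), ("acme", 7.0)]
    budgets = {"acme": 10.0, "globex": 5.0}
    print(tenants_over_budget(requests, budgets))
```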
Inspecting embedding queries, vector retrieval latency, reranker behavior, and final completion quality in a single trace to optimize chunking and retrieval strategies. The end-to-end trace shows exactly which documents were retrieved, how long each step took, and whether the final response was grounded in the retrieved context, enabling data-driven tuning of the entire RAG pipeline.
Healthcare, finance, and government teams that cannot send raw prompts to third-party SaaS can run Langtrace inside their own VPC while keeping standard OpenTelemetry compatibility. The Docker Compose deployment includes all components needed for production use, and the AGPL license allows free self-hosting without per-seat or per-trace fees.
Capturing production traces, promoting them into evaluation datasets, and running scored prompt experiments before shipping new model versions or prompt changes. Teams can integrate evaluations into their deployment pipeline to catch quality regressions before they reach users, using both automated evaluators and human annotation workflows.
Platform teams already using Grafana, Datadog, or SigNoz can route Langtrace OTLP data into the same dashboards used for microservices, avoiding a separate observability silo for AI features. This is especially valuable for organizations that have standardized on OpenTelemetry and want AI application telemetry to follow the same conventions and pipelines as the rest of their infrastructure.
Through 2025 and into 2026, Langtrace expanded coverage of agentic frameworks, adding deeper LangGraph, CrewAI, AutoGen, and DSPy instrumentation and aligning trace attributes with the evolving OpenTelemetry GenAI semantic conventions. The evaluation suite gained support for custom Python evaluator functions and side-by-side prompt experiment comparisons across models. Cost analytics dashboards were enhanced with per-tenant and per-feature attribution views. The self-hosted deployment experience improved with Kubernetes Helm charts alongside the existing Docker Compose setup. SDK coverage expanded to include additional vector databases and model providers, and the TypeScript SDK reached feature parity with the Python SDK for most supported integrations.
LLM Observability
Langfuse is an open-source LLM observability and engineering platform providing tracing, prompt management, evaluations, and dataset management for production AI applications.
LLM Observability
Open-source LLM observability and AI gateway — logs every prompt, response, cost, and latency across 20+ providers with a one-line proxy or async SDK, plus caching, retries, and prompt experiments.
AI Observability
Open-source LLM observability and evaluation platform — traces, evals, prompt experiments and datasets in a self-hostable package.
Enterprise Agents
Developer platform for AI agent observability, debugging, and cost tracking with two-line SDK integration.