Master Langfuse with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
Sign up for a free Hobby account at langfuse.com, or deploy self-hosted with Docker Compose: git clone https://github.com/langfuse/langfuse && cd langfuse && docker compose up
Install the latest SDK: pip install langfuse (v 0+) for Python or npm install langfuse for JavaScript/TypeScript
Add automatic tracing to your LLM calls with the @observe decorator (Python) or wrap function (JavaScript); works with OpenAI, Anthropic, and all major providers
Explore hierarchical traces in the Langfuse dashboard showing latency, token usage, costs, and complete conversation flows
Set up prompt versioning in the UI to iterate on prompts without code deployment, and configure LLM-as-judge evaluators for automated quality scoring
Create datasets from production traces for regression testing and run experiments comparing model configurations
Configure alerts and export data via the comprehensive REST API or direct database access for advanced analytics
💡 Quick Start: Follow these steps in order to get up and running with Langfuse quickly.
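As a rough illustration of the tracing step, the sketch below decorates two nested functions with @observe so that the inner call would be recorded as a child of the outer one. The import path is an assumption that depends on the installed SDK version, so two layouts are tried, and a no-op fallback keeps the example runnable where the SDK is missing. The function names retrieve_context and answer_question are hypothetical, and the model call is a placeholder.

```python
# Assumption: the decorator lives at one of these two import paths, depending
# on the SDK major version. The fallback is a no-op so the sketch still runs.
try:
    from langfuse import observe
except ImportError:
    try:
        from langfuse.decorators import observe
    except ImportError:
        def observe(func=None, **_kwargs):
            """No-op stand-in supporting both @observe and @observe()."""
            def wrap(f):
                return f
            return wrap(func) if callable(func) else wrap


@observe()
def retrieve_context(question: str) -> list[str]:
    # Hypothetical retrieval step; a real app would query a vector store here.
    return [f"doc about {question}"]


@observe()
def answer_question(question: str) -> str:
    # The nested call is what produces a parent-child relationship in a trace.
    context = retrieve_context(question)
    # Placeholder for a real LLM call (OpenAI, Anthropic, and so on).
    return f"Answer based on {len(context)} document(s)"


print(answer_question("refunds"))
```

Traces are only sent when the SDK can find credentials; these are normally supplied through the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables, so check the documentation for the SDK version you installed.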
Explore the key features that make Langfuse powerful for open-source LLM observability workflows.
Captures complete execution trees of complex AI workflows including multi-agent conversations, tool calling sequences, and RAG pipelines. Each trace shows parent-child relationships between all operations, enabling deep debugging of agent interactions and workflow bottlenecks with full context preservation.
Debug a customer support agent that gives incorrect answers by tracing the exact knowledge retrieval → context filtering → prompt construction → model generation → response formatting chain to identify the failure point.
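To make the idea of a hierarchical trace concrete, here is a minimal sketch of a trace modeled as a tree of spans, with a helper that walks the tree to find the step that recorded an error. This is plain Python and not the Langfuse data model; the Span class, its fields, and the sample latencies and error messages are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in for one operation inside a trace."""
    name: str
    latency_ms: float
    error: Optional[str] = None
    children: list["Span"] = field(default_factory=list)


def find_failure_path(span, path=()):
    """Return the names from the root down to the deepest failing span, or None."""
    current = path + (span.name,)
    for child in span.children:
        found = find_failure_path(child, current)
        if found:
            return found
    return current if span.error else None


# Illustrative trace for the support-agent chain described above.
trace = Span("support_agent", 1840.0, error="wrong answer", children=[
    Span("knowledge_retrieval", 320.0),
    Span("context_filtering", 45.0, error="dropped relevant passage"),
    Span("prompt_construction", 5.0),
    Span("model_generation", 1400.0),
    Span("response_formatting", 12.0),
])

print(" > ".join(find_failure_path(trace)))  # support_agent > context_filtering
```

Searching children before checking the parent means the helper reports the most specific failing step instead of stopping at the top-level symptom.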
Enterprise-grade prompt lifecycle management with version control, production trace linking, A/B testing capabilities, and protected deployment labels. Prompts are managed in the UI and linked to real production performance, enabling data-driven optimization without code deployment.
Test a new system prompt for a financial advisor agent by deploying two prompt versions simultaneously and comparing success rates, compliance scores, and customer satisfaction metrics in real-time dashboards.
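The comparison logic behind such a test can be sketched in a few lines. The example below assumes a simple hash-based split so each user always sees the same prompt version, then computes a success rate per variant. The function names and the outcome data are hypothetical, and in practice the prompt texts and scores would come from Langfuse, not from literals in code.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id: str) -> str:
    """Deterministically map a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def success_rates(outcomes):
    """Take (variant, succeeded) pairs and return {variant: success rate}."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [successes, trials]
    for variant, succeeded in outcomes:
        totals[variant][0] += int(succeeded)
        totals[variant][1] += 1
    return {variant: wins / n for variant, (wins, n) in totals.items()}


# Made-up outcomes, purely for illustration.
outcomes = [
    ("A", True), ("A", False), ("A", True), ("A", True),
    ("B", True), ("B", True), ("B", True), ("B", True),
]

print(assign_variant("user-42"))
print(success_rates(outcomes))  # {'A': 0.75, 'B': 1.0}
```

With samples this small the difference is not statistically meaningful; a real experiment would need far more traffic before declaring a winner.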
Comprehensive quality assurance combining automated LLM-as-judge evaluators, categorical scoring, human annotation queues with inline comments anchored to specific text, and experiment management. Build regression datasets from production data for continuous model validation.
Implement systematic quality control for a medical AI assistant by running automated safety evaluations on every response and routing concerning outputs to medical professionals for detailed review with inline annotation tools.
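One way to picture this routing is a threshold on an automated safety score: responses that score below the cut-off go to a human review queue and the rest pass through. The sketch below is an assumption about how such a pipeline could be wired, not Langfuse's implementation; the Evaluation class, the 0.8 threshold, and the score scale are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    response_id: str
    safety_score: float  # assumed scale: 0.0 (unsafe) to 1.0 (safe)


REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; tune it to your risk tolerance


def route(evaluations, threshold=REVIEW_THRESHOLD):
    """Split response ids into (auto_approved, needs_human_review)."""
    auto_approved, needs_review = [], []
    for ev in evaluations:
        target = needs_review if ev.safety_score < threshold else auto_approved
        target.append(ev.response_id)
    return auto_approved, needs_review


batch = [Evaluation("r1", 0.95), Evaluation("r2", 0.40), Evaluation("r3", 0.80)]
approved, review_queue = route(batch)
print(approved)      # ['r1', 'r3']
print(review_queue)  # ['r2']
```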
Complete security package including SOC2 Type II, ISO27001, HIPAA compliance with BAA, enterprise SSO (Okta, Azure AD), SCIM API, audit logs, RBAC, and data retention management. Self-hosted option provides air-gapped deployment with full feature parity.
Deploy LLM observability for a healthcare organization requiring HIPAA compliance by using self-hosted Langfuse with encrypted data storage, access controls, and complete audit trails for regulatory reporting.
Granular cost tracking across multiple LLM providers with support for tiered pricing models (context-dependent rates for Claude, Gemini). Provides per-model, per-user, per-feature cost analysis with trend monitoring and budget alerting.
Optimize a multi-model AI application by analyzing cost-per-quality metrics across OpenAI GPT-4, Claude Sonnet, and local models to determine the optimal model routing strategy for different types of user queries.
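A cost-per-quality comparison with tiered pricing can be sketched as follows. Every number here is invented for illustration (real provider rates differ and change over time), and TieredPrice and cheapest_per_quality are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class TieredPrice:
    """Input pricing that switches to a higher rate for long prompts."""
    threshold_tokens: int   # prompt size above which the long rate applies
    base_per_mtok: float    # USD per million tokens at or below the threshold
    long_per_mtok: float    # USD per million tokens above the threshold

    def input_cost(self, tokens: int) -> float:
        rate = self.long_per_mtok if tokens > self.threshold_tokens else self.base_per_mtok
        return tokens / 1_000_000 * rate


def cheapest_per_quality(candidates):
    """candidates: {model: (cost_usd, quality_score)} -> model with lowest cost per quality point."""
    return min(candidates, key=lambda m: candidates[m][0] / candidates[m][1])


# Illustrative rates only.
price = TieredPrice(threshold_tokens=200_000, base_per_mtok=3.0, long_per_mtok=6.0)
short_cost = price.input_cost(100_000)  # billed at the base rate
long_cost = price.input_cost(400_000)   # billed at the long-context rate
print(round(short_cost, 2), round(long_cost, 2))

candidates = {"model_large": (0.30, 0.9), "model_small": (0.05, 0.7)}
print(cheapest_per_quality(candidates))  # model_small
```

Dividing cost by a quality score is only one possible metric; a routing strategy would usually also enforce a minimum acceptable quality per query type.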
Complete on-premises deployment using the same infrastructure as Langfuse Cloud (PostgreSQL, ClickHouse, Redis, S3). Includes Docker Compose for development, Kubernetes Helm charts, and Terraform modules for AWS/Azure/GCP with unlimited traces and users.
Deploy enterprise observability for a financial services firm requiring complete data residency by self-hosting Langfuse on internal infrastructure while maintaining access to all prompt management, evaluation, and security features.
Compared with LangSmith, Langfuse offers several advantages: it is fully open-source with self-hosting at complete feature parity (LangSmith is closed-source and cloud-only), it includes unlimited users on all paid tiers (LangSmith charges $39 per seat, so its cost scales with team size), and it provides a more generous free tier (50K units per month). For teams that need data residency, want to avoid vendor lock-in, or need to control costs as they scale, Langfuse is the stronger choice.
ClickHouse's 2026 acquisition of Langfuse accelerates development while maintaining the project's open-source nature. Users benefit from better performance (drawing on ClickHouse's expertise in high-performance analytics), faster feature development, and stronger enterprise support. The self-hosted option remains fully open-source with feature parity, and existing cloud plans continue unchanged with improved infrastructure backing.
Yes, extensively. Langfuse is trusted by 19 of the Fortune 50 and by organizations including Khan Academy, Merck, Canva, and Adobe. It provides SOC2 Type II, ISO27001, and HIPAA compliance (with BAA), enterprise SSO, SCIM API, and audit logs, and it scales to millions of traces. The self-hosted option enables complete data residency and air-gapped deployments for the most sensitive applications.
Unlike competitors that charge per seat ($39+ per user), Langfuse includes unlimited users on all paid tiers ($29 Core, $199 Pro, $2,499 Enterprise). This means your costs stay predictable as your engineering team grows, making it ideal for scaling organizations. You pay only for usage (traces/evaluations) and features, not headcount.
A 'unit' is any billable event: traces (conversation threads), observations (individual LLM calls, tool executions), and scores (evaluation results). A simple chatbot conversation might use 2-3 units, while a complex multi-agent workflow could consume 10-20 units. At 50K units/month (Hobby), that supports roughly 17K to 25K simple interactions or 2.5K to 5K complex agent workflows.
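The capacity estimate is a straightforward division of the monthly allowance by the units each interaction consumes. The short sketch below works through the ranges quoted above; monthly_capacity is a hypothetical helper, and the per-interaction unit counts are the rough figures from this section, not measured values.

```python
def monthly_capacity(units_per_month: int, units_per_interaction: int) -> int:
    """Whole interactions that fit in a monthly unit allowance."""
    return units_per_month // units_per_interaction


HOBBY_UNITS = 50_000

# (low, high) units consumed per interaction, taken from the text above.
workloads = {"simple chatbot": (2, 3), "complex agent": (10, 20)}

for label, (low, high) in workloads.items():
    worst = monthly_capacity(HOBBY_UNITS, high)  # heaviest usage per interaction
    best = monthly_capacity(HOBBY_UNITS, low)    # lightest usage per interaction
    print(f"{label}: {worst} to {best} interactions/month")

# simple chatbot: 16666 to 25000 interactions/month
# complex agent: 2500 to 5000 interactions/month
```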
Self-hosted Langfuse provides battle-tested infrastructure used by Fortune 50 companies, comprehensive SDK integrations, continuous feature development, and community support, all without the massive engineering investment required for internal solutions. Most teams underestimate the complexity of building production-grade observability, evaluation frameworks, and prompt management systems from scratch.
Langfuse requires PostgreSQL (transactional data), ClickHouse (observability data), Redis/Valkey (cache/queue), and S3-compatible storage (events/attachments). For production: 4+ CPU cores, 8GB+ RAM, SSD storage. Deploy via Docker Compose (testing), Kubernetes with Helm charts, or Terraform modules for AWS/Azure/GCP. Scales from single-node to multi-region deployments.
Unlike tools that log individual LLM calls in isolation, Langfuse captures parent-child relationships between all operations in your AI workflow. You can trace a user query through retrieval → context filtering → prompt construction → LLM generation → tool calling → response formatting, seeing exactly where failures occur and how changes propagate through multi-step agent workflows.
Langfuse offers automated LLM-as-judge evaluators, human annotation queues with inline comments, dataset management, and experiment comparison. You can create regression test datasets from production data, run A/B tests on prompt variants, score outputs for quality/safety, and build continuous evaluation pipelines. The 2026 update includes categorical scoring and individual operation evaluation for more precise assessment.
Langfuse provides client-side data masking, supports air-gapped self-hosted deployments, offers EU/US data residency options, and maintains certifications for SOC2 Type II, ISO27001, GDPR, and HIPAA. Enterprise features include audit logs, RBAC, SSO enforcement, and dedicated security support. Self-hosting ensures complete data control for the most sensitive applications.
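Client-side masking means sensitive values are redacted before data leaves your application. A minimal sketch of such a masking function is shown below; it only handles email addresses, and the regular expression is deliberately simple. The name mask_pii is hypothetical, and how the function is registered with the SDK (for example as a mask argument when constructing the client) is an assumption to verify against the documentation for your SDK version.

```python
import re

# Deliberately simple pattern; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(data, **kwargs):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(data, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", data)
    if isinstance(data, dict):
        return {key: mask_pii(value) for key, value in data.items()}
    if isinstance(data, list):
        return [mask_pii(item) for item in data]
    return data  # numbers, None, and other types pass through unchanged


payload = {"question": "Contact jane.doe@example.com please", "attempt": 3}
print(mask_pii(payload))

# Assumption: the client accepts the function at construction time,
# e.g. Langfuse(mask=mask_pii). Check this against your SDK version.
```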
Now that you know how to use Langfuse, it's time to put this knowledge into practice.
Follow our tutorial and master this powerful open-source LLM observability tool in minutes.
Tutorial updated March 2026