Galileo review 2026: enterprise AI evals, observability, guardrails, and Luna evaluator models for RAG and agents — features, pricing, pros, cons.
Galileo (galileo.ai) is an enterprise-focused AI quality platform that covers the full lifecycle of LLM and agent development in one product: pre-launch evaluation, production observability, and runtime guardrails. The platform is built around Luna, Galileo's family of small evaluator models trained to score hallucinations, instruction adherence, context relevance, completeness, and chunk attribution in RAG systems, at lower latency and cost than calling a frontier LLM as a judge.

The product is split into four modules. Galileo Evaluate lets engineers run scored evals across datasets and surface specific failure modes. Galileo Observe streams production traces with span-level scoring and slicing by tag, user, and version. Galileo Protect provides real-time guardrails that can block or rewrite unsafe responses. Galileo Agentic Eval adds multi-step tracing and root-cause analysis for agents, including identifying which step in a tool-use chain produced the wrong answer.

Customers include Twilio, JPMorgan Chase, HP, and other large enterprises that want a single vendor for evaluation, monitoring, and safety on regulated workloads. Pricing is not publicly listed. Galileo offers a developer-tier free trial, paid Pro subscriptions for production workloads, and Enterprise contracts with VPC deployment, custom Luna fine-tuning, and dedicated success management.
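The guardrail pattern described above (score a response, then block or replace it when the score is too low) can be illustrated with a minimal sketch. This is not Galileo's API: `overlap_score`, `guard`, `GuardrailResult`, the 0.5 threshold, and the fallback message are all hypothetical names and values chosen for illustration, and the word-overlap scorer is only a toy stand-in for a trained evaluator model such as Luna.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardrailResult:
    action: str   # "allow" or "block"
    score: float  # scorer output in [0, 1]
    output: str   # text that would be returned to the user


def _words(text: str) -> list[str]:
    # Lowercase and strip basic punctuation so "days." matches "days".
    cleaned = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in cleaned if w]


def overlap_score(response: str, context: str) -> float:
    """Toy scorer (assumption): fraction of response words found in the context.

    A real deployment would call an evaluator model here instead.
    """
    resp_words = _words(response)
    if not resp_words:
        return 0.0
    ctx_words = set(_words(context))
    return sum(w in ctx_words for w in resp_words) / len(resp_words)


def guard(
    response: str,
    context: str,
    scorer: Callable[[str, str], float] = overlap_score,
    threshold: float = 0.5,  # hypothetical cutoff
    fallback: str = "I can't answer that from the provided context.",
) -> GuardrailResult:
    """Allow the response if it scores at or above the threshold, else block it."""
    score = scorer(response, context)
    if score >= threshold:
        return GuardrailResult("allow", score, response)
    return GuardrailResult("block", score, fallback)


context = "The refund window is 30 days from delivery."
grounded = guard("The refund window is 30 days.", context)
ungrounded = guard("Refunds are available for a full year.", context)
print(grounded.action, ungrounded.action)
```

The scorer is passed in as a plain callable so that the same block-or-allow logic could wrap any evaluator, whether a small model, an LLM judge, or a rule.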
LLM Observability
AI observability platform for evals, production tracing, prompt management, and regression detection.
LLM Observability
Langfuse is an open-source LLM observability and engineering platform providing tracing, prompt management, evaluations, and dataset management for production AI applications.
Testing & Quality
Open-source LLM evaluation framework with 50+ research-backed metrics including hallucination detection, tool use correctness, and conversational quality. Pytest-style testing for AI agents with CI/CD integration.
LLM Observability
Open-source LLM observability and AI gateway — logs every prompt, response, cost, and latency across 20+ providers with a one-line proxy or async SDK, plus caching, retries, and prompt experiments.