GLM-5.1 is a free open-source large language model from Z.ai (zai-org) hosted on Hugging Face, designed for complex systems engineering, long-horizon agentic tasks, reasoning, and tool-calling workflows. The model is distributed at no cost under an open-weights license, making it suitable for researchers, AI engineers, and enterprises seeking frontier-grade open models.
The GLM-5 architecture scales to 744B total parameters with 40B active parameters per forward pass (a Mixture-of-Experts design), up from GLM-4.5's 355B/32B configuration. Pre-training data was expanded from 23T to 28.5T tokens, and the model integrates DeepSeek Sparse Attention (DSA) to substantially reduce deployment cost while preserving long-context capacity. Z.ai's team also developed 'slime,' a novel asynchronous reinforcement learning infrastructure that improves RL training throughput, enabling fine-grained post-training iterations. On benchmarks, GLM-5 scores 30.5 on Humanity's Last Exam (50.4 with tools), 92.7 on AIME 2026 I, 96.9 on HMMT Nov. 2025, 86.0 on GPQA-Diamond, 77.8 on SWE-bench Verified, and 73.3 on SWE-bench Multilingual — closing the gap with frontier closed models like Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2.
Developer deployment is flexible: the model runs through Hugging Face Transformers, vLLM, SGLang, Docker Model Runner, llama.cpp, Ollama, and LM Studio. It exposes an OpenAI-compatible chat completions API when self-hosted, and is also offered as a managed service via the Z.ai API Platform. Based on our analysis of 870+ AI tools, GLM-5.1 stands out among open-weights LLMs by combining best-in-class open-source performance on reasoning, coding, and agentic benchmarks with multiple production-ready inference paths — giving teams an alternative to API-only frontier models when they need data sovereignty, custom fine-tuning, or on-prem control.
GLM-5 scales from GLM-4.5's 355B/32B-active to 744B total parameters with 40B activated per token. The Mixture-of-Experts routing means inference cost scales with the active set rather than the full parameter count, allowing frontier-class capacity at competitive throughput.
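The cost difference between routing and dense execution can be made concrete with a back-of-envelope calculation. The sketch below uses the parameter counts stated above and the standard ~2·params FLOPs-per-token approximation for transformer inference; it is a rough illustration, not a measured throughput figure.

```python
# Per-token compute for a Mixture-of-Experts model scales with the
# *activated* parameters, not the total. Parameter counts are the
# published GLM-5 sizes; 2 * params FLOPs/token is the usual
# dense-transformer rule of thumb.

TOTAL_PARAMS = 744e9   # all experts (must be resident in memory)
ACTIVE_PARAMS = 40e9   # parameters touched per token's forward pass

flops_per_token_moe = 2 * ACTIVE_PARAMS          # what you actually pay
flops_per_token_dense = 2 * TOTAL_PARAMS         # if the model were dense

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
speedup = flops_per_token_dense / flops_per_token_moe

print(f"Active fraction: {active_fraction:.1%}")          # ~5.4%
print(f"FLOPs saving vs. dense 744B: {speedup:.1f}x")     # ~18.6x
```

Memory still scales with the full 744B (every expert must be loaded), which is why MoE models trade cheap compute for a large deployment footprint.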
GLM-5 integrates DSA to reduce attention compute on long contexts while preserving the model's long-context capacity. This lowers deployment cost meaningfully for workloads like document QA, long agent traces, and large codebase reasoning, where dense attention dominates GPU spend.
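The savings come from the asymptotics: dense attention scores every query against every key (O(n²)), while a sparse scheme attends each query to a fixed budget of k selected keys (O(n·k)). The k value below is an illustrative assumption, not a published DSA parameter.

```python
# Rough cost model for attention-score FLOPs at long context.
# d is the per-head dimension; it cancels in the ratio.

def dense_score_flops(n: int, d: int = 128) -> float:
    # Full QK^T score matrix for one head: every query vs. every key.
    return 2.0 * n * n * d

def sparse_score_flops(n: int, k: int, d: int = 128) -> float:
    # Each query scores only k selected keys (assumed fixed budget).
    return 2.0 * n * min(k, n) * d

n = 128_000   # long-context request
k = 2_048     # assumed per-query key budget (illustrative)

ratio = dense_score_flops(n) / sparse_score_flops(n, k)
print(f"Dense/sparse attention-score FLOPs at n={n:,}: {ratio:.1f}x")
```

At short contexts (n ≤ k) the two are identical; the gap only opens, and grows linearly, as contexts get long, which is exactly the document-QA and agent-trace regime described above.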
The tokenizer ships with a Jinja chat template that handles a tools array and emits <tool_call> XML blocks containing the function name and arg_key/arg_value pairs. Reasoning content is wrapped in <think>...</think> tags, separating the model's chain of thought from final outputs for cleaner agent loops.
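An agent loop consuming this format has to split reasoning from tool calls before acting. The minimal parser below follows the tag layout described above; the exact structure inside `<tool_call>` (function name on the first line, then `arg_key`/`arg_value` pairs) is an assumption modeled on the GLM chat-template convention, so verify it against the Jinja template that ships with the tokenizer.

```python
import re

# Example model output in the described format (hand-written sample,
# not captured from the model).
SAMPLE = """<think>User wants weather; call the tool.</think>
<tool_call>get_weather
<arg_key>city</arg_key><arg_value>Berlin</arg_value>
<arg_key>unit</arg_key><arg_value>celsius</arg_value>
</tool_call>"""

def parse_glm_output(text: str):
    """Return (reasoning, tool_calls) from a raw completion string."""
    think = re.search(r"<think>(.*?)</think>", text, re.S)
    calls = []
    for block in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.S):
        # First line of the block is the function name.
        name = block.strip().splitlines()[0].strip()
        # Remaining lines carry arg_key/arg_value pairs.
        args = dict(re.findall(
            r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>",
            block, re.S))
        calls.append({"name": name, "arguments": args})
    return (think.group(1).strip() if think else None), calls

reasoning, tool_calls = parse_glm_output(SAMPLE)
print(tool_calls)
```

Keeping the `<think>` span out of the conversation history you feed back to the model is what makes the agent loop "cleaner": only the final answer and tool results need to round-trip.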
GLM-5 is supported out of the box by Hugging Face Transformers, vLLM, SGLang, Docker Model Runner, Ollama, LM Studio, and llama.cpp via quantized variants. Each path exposes an OpenAI-compatible /v1/chat/completions API, making the model a near-drop-in replacement for OpenAI clients.
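Because every serving path speaks the OpenAI chat-completions shape, existing clients only need a base-URL change. The stdlib-only sketch below builds such a request; the `localhost:8000` address is a local vLLM/SGLang deployment assumption, and the network call is left commented out since it requires a running server.

```python
import json
import urllib.request

# Assumed self-hosted endpoint (e.g. `vllm serve` default port).
BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "zai-org/GLM-5.1",
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one line."}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once a server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Any OpenAI-style SDK works the same way by pointing its `base_url` at the self-hosted server, which is what makes the model a near-drop-in replacement.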
Z.ai built 'slime,' a novel asynchronous reinforcement-learning infrastructure that materially improves RL training throughput and enables more fine-grained post-training iterations. This is what powers GLM-5's gains on agentic and coding benchmarks over GLM-4.7, and it underpins ongoing model updates.
Pricing: Free (open weights, self-hosted); usage-based via the Z.ai API Platform.
GLM-5 launched as a successor to GLM-4.7 and GLM-4.5, scaling to 744B parameters (40B active) and 28.5T pre-training tokens. The release integrates DeepSeek Sparse Attention (DSA) for cheaper long-context inference and is post-trained with Z.ai's new asynchronous RL infrastructure, 'slime.' Reported benchmarks include AIME 2026 I (92.7), HMMT Nov. 2025 (96.9), SWE-bench Verified (77.8), and Terminal-Bench 2.0 / Terminus 2 (56.2 / 60.7).