GLM-4.5 Pricing & Plans 2026

Name: GLM-4.5
Brand: GLM-4.5
Availability: InStock

Complete pricing guide for GLM-4.5. Compare all plans, analyze costs, and find the perfect tier for your needs.

Not sure if free is enough? See our Free vs Paid comparison →
Still deciding? Read our full verdict on whether GLM-4.5 is worth it →

🆓Free Tier Available

💎4 Paid Plans

⚡No Setup Fees

Choose Your Plan

Open Weights

$0 model license fee under MIT license; self-hosting infrastructure not included

✓Downloadable GLM-4.5 model weights
✓Commercial use permitted under MIT license
✓Self-hosting and modification allowed
✓Available BF16 and FP8 variants
✓Base and hybrid reasoning model variants

Start Free Trial →

GLM-4.5 API Access

$0.60 per 1M input tokens, $0.11 per 1M cached input tokens, and $2.20 per 1M output tokens on Z.AI; provider pricing may vary

✓Hosted API access through Z.AI or supported providers
✓Text input and text output
✓Function calling and tool invocation support
✓Streaming output support
✓Structured output support

Start Free Trial →

GLM-4.5-Air API Access

$0.20 per 1M input tokens, $0.03 per 1M cached input tokens, and $1.10 per 1M output tokens on Z.AI; provider pricing may vary

✓Lower-cost hosted API access for the 106B total / 12B active GLM-4.5-Air model
✓Text input and text output
✓Function calling and tool invocation support
✓Streaming output support
✓Structured output support

Start Free Trial →

Self-Hosted Production

$0 model license fee; infrastructure cost depends on GPU deployment, from H100 x 8 or H200 x 4 for GLM-4.5 FP8 standard inference up to H100 x 32 or H200 x 16 for GLM-4.5 BF16 full 128K context

✓Private deployment in customer-controlled infrastructure
✓Supports vLLM and SGLang serving paths
✓Can use BF16 or FP8 model variants
✓Official requirements list server memory above 1T for normal model loading and operation
✓Suitable for regulated or data-sensitive workloads

Start Free Trial →

Pricing sourced from GLM-4.5 · Last verified March 2026

Feature Comparison

Features	Open Weights	GLM-4.5 API Access	GLM-4.5-Air API Access	Self-Hosted Production
Downloadable GLM-4.5 model weights	✓	✓	✓	✓
Commercial use permitted under MIT license	✓	✓	✓	✓
Self-hosting and modification allowed	✓	✓	✓	✓
Available BF16 and FP8 variants	✓	✓	✓	✓
Base and hybrid reasoning model variants	✓	✓	✓	✓
Hosted API access through Z.AI or supported providers	—	✓	✓	✓
Text input and text output	—	✓	✓	✓
Function calling and tool invocation support	—	✓	✓	✓
Streaming output support	—	✓	✓	✓
Structured output support	—	✓	✓	✓
Lower-cost hosted API access for the 106B total / 12B active GLM-4.5-Air model	—	—	✓	✓
Private deployment in customer-controlled infrastructure	—	—	—	✓
Supports vLLM and SGLang serving paths	—	—	—	✓
Can use BF16 or FP8 model variants	—	—	—	✓
Official requirements list server memory above 1T for normal model loading and operation	—	—	—	✓
Suitable for regulated or data-sensitive workloads	—	—	—	✓

Is GLM-4.5 Worth It?

✅ Why Choose GLM-4.5

• MIT licensing allows commercial deployment, modification, self-hosting, and derivative work without the contractual limits common in closed frontier models.
• The 355B total / 32B active MoE design gives teams a frontier-scale model while activating a much smaller subset of parameters per inference.
• A 128K context window and 96K maximum output make it practical for long documents, large codebases, lengthy transcripts, and multi-step agent traces.
• Hybrid reasoning lets developers choose deeper Thinking Mode for complex tool use or Non-Thinking Mode for faster direct responses.
• Official documentation highlights function calling, structured output, streaming, context caching, and integration with code-agent environments such as Claude Code and Roo Code.
• The GLM-4.5-Air variant provides a smaller 106B total / 12B active option for teams that need a lower-cost deployment path.

⚠️ Consider This

• It is not a turnkey voice-agent product; teams still need speech-to-text, text-to-speech, telephony, orchestration, monitoring, and safety layers for production voice workflows.
• Full self-hosting is hardware intensive: official full-context GLM-4.5 configurations list up to H100 x 32 or H200 x 16 for 128K-context BF16 inference.
• Hosted API pricing is token-based rather than a simple monthly SaaS plan, with Z.AI listing GLM-4.5 at $0.60 per 1M input tokens and $2.20 per 1M output tokens and GLM-4.5-Air at $0.20 per 1M input tokens and $1.10 per 1M output tokens.
• Although Z.AI reports strong open-model benchmark results, closed models such as Claude and GPT may still be easier to operate and may perform better in some enterprise support workflows.
• Some website setup examples reference older or adjacent GLM model names, so developers should rely on the current Z.AI docs or Hugging Face model card when deploying.

What Users Say About GLM-4.5

👍 What Users Love

✓MIT licensing allows commercial deployment, modification, self-hosting, and derivative work without the contractual limits common in closed frontier models.
✓The 355B total / 32B active MoE design gives teams a frontier-scale model while activating a much smaller subset of parameters per inference.
✓A 128K context window and 96K maximum output make it practical for long documents, large codebases, lengthy transcripts, and multi-step agent traces.
✓Hybrid reasoning lets developers choose deeper Thinking Mode for complex tool use or Non-Thinking Mode for faster direct responses.
✓Official documentation highlights function calling, structured output, streaming, context caching, and integration with code-agent environments such as Claude Code and Roo Code.
✓The GLM-4.5-Air variant provides a smaller 106B total / 12B active option for teams that need a lower-cost deployment path.

👎 Common Concerns

⚠It is not a turnkey voice-agent product; teams still need speech-to-text, text-to-speech, telephony, orchestration, monitoring, and safety layers for production voice workflows.
⚠Full self-hosting is hardware intensive: official full-context GLM-4.5 configurations list up to H100 x 32 or H200 x 16 for 128K-context BF16 inference.
⚠Hosted API pricing is token-based rather than a simple monthly SaaS plan, with Z.AI listing GLM-4.5 at $0.60 per 1M input tokens and $2.20 per 1M output tokens and GLM-4.5-Air at $0.20 per 1M input tokens and $1.10 per 1M output tokens.
⚠Although Z.AI reports strong open-model benchmark results, closed models such as Claude and GPT may still be easier to operate and may perform better in some enterprise support workflows.
⚠Some website setup examples reference older or adjacent GLM model names, so developers should rely on the current Z.AI docs or Hugging Face model card when deploying.

Pricing FAQ

Is GLM-4.5 actually a voice agent platform?

No. GLM-4.5 is a large language model for agentic reasoning, coding, tool use, and text generation; it is listed here as an AI model rather than a turnkey voice-agent platform. To build a complete voice agent, you would still need speech recognition, text-to-speech, a call or realtime transport layer, state management, and production observability. GLM-4.5 is better suited to engineering teams building their own agent infrastructure than teams looking for a ready-made call center product.

What are the most important technical specs of GLM-4.5?

The main GLM-4.5 model uses a Mixture-of-Experts architecture with 355 billion total parameters and 32 billion active parameters per forward pass. Z.AI documentation lists a 128K-token context length and up to 96K maximum output tokens. The series was pretrained on 15 trillion tokens and includes GLM-4.5-Air, a smaller 106B total / 12B active model for more cost-sensitive deployments. These numbers make it a large, infrastructure-heavy model rather than a lightweight local assistant.

Can GLM-4.5 be used commercially for free?

Yes, the official materials state that GLM-4.5 and GLM-4.5-Air are released under the MIT open-source license. That allows commercial use, modification, self-hosting, and secondary development without paying a model license fee. However, free licensing does not mean free operation: self-hosting a 355B-parameter MoE model requires substantial GPU infrastructure, and hosted API providers charge usage-based token fees. Z.AI documentation lists GLM-4.5 at $0.60 per million input tokens, $0.11 per million cached input tokens, and $2.20 per million output tokens, with GLM-4.5-Air listed at $0.20 per million input tokens, $0.03 per million cached input tokens, and $1.10 per million output tokens.

How does GLM-4.5 compare with closed models like GPT or Claude?

GLM-4.5's main advantage over closed models is control: teams can download weights, self-host, fine-tune, inspect deployment behavior, and avoid sending sensitive data to a third-party model API. Z.AI reports a 63.2 aggregate score across 12 benchmarks and positions GLM-4.5 as one of the strongest open-source models for reasoning, coding, and agent tasks. Closed models may still offer easier operations, stronger managed safety tooling, broader enterprise support, and simpler procurement. For teams with GPU capacity and model-serving expertise, GLM-4.5 is a serious open alternative; for teams without that infrastructure, a managed API may be more practical.

What hardware does GLM-4.5 require for self-hosting?

GLM-4.5 is not designed for casual laptop deployment. The Hugging Face model card lists GLM-4.5 BF16 inference on H100 x 16 or H200 x 8, and full 128K-context BF16 inference on H100 x 32 or H200 x 16. FP8 reduces the requirement, with GLM-4.5 FP8 listed at H100 x 8 or H200 x 4 for standard inference and H100 x 16 or H200 x 8 for full 128K context. The same official requirements also state that server memory should exceed 1T for normal model loading and operation. Smaller teams should evaluate GLM-4.5-Air, quantized builds, or hosted APIs before committing to self-hosting.

Ready to Get Started?

AI builders and operators use GLM-4.5 to streamline their workflow.

Try GLM-4.5 Now →

More about GLM-4.5

Review Alternatives Free vs Paid Pros & Cons Worth It?Tutorial

Compare GLM-4.5 Pricing with Alternatives

Claude Sonnet 4 Pricing

An advanced AI language model that delivers superior coding and reasoning capabilities with more precise instruction following. Offers both near-instant responses and extended thinking modes for deeper reasoning tasks.

Compare Pricing →

GLM-4.5 Pricing & Plans 2026

Complete pricing guide for GLM-4.5. Compare all plans, analyze costs, and find the perfect tier for your needs.

🆓Free Tier Available

💎4 Paid Plans

⚡No Setup Fees

Choose Your Plan

Open Weights

$0 model license fee under MIT license; self-hosting infrastructure not included

✓Downloadable GLM-4.5 model weights
✓Commercial use permitted under MIT license
✓Self-hosting and modification allowed
✓Available BF16 and FP8 variants
✓Base and hybrid reasoning model variants

Start Free Trial →

GLM-4.5 API Access

$0.60 per 1M input tokens, $0.11 per 1M cached input tokens, and $2.20 per 1M output tokens on Z.AI; provider pricing may vary

✓Hosted API access through Z.AI or supported providers
✓Text input and text output
✓Function calling and tool invocation support
✓Streaming output support
✓Structured output support

Start Free Trial →

GLM-4.5-Air API Access

$0.20 per 1M input tokens, $0.03 per 1M cached input tokens, and $1.10 per 1M output tokens on Z.AI; provider pricing may vary

✓Lower-cost hosted API access for the 106B total / 12B active GLM-4.5-Air model
✓Text input and text output
✓Function calling and tool invocation support
✓Streaming output support
✓Structured output support

Start Free Trial →

Self-Hosted Production

$0 model license fee; infrastructure cost depends on GPU deployment, from H100 x 8 or H200 x 4 for GLM-4.5 FP8 standard inference up to H100 x 32 or H200 x 16 for GLM-4.5 BF16 full 128K context

✓Private deployment in customer-controlled infrastructure
✓Supports vLLM and SGLang serving paths
✓Can use BF16 or FP8 model variants
✓Official requirements list server memory above 1T for normal model loading and operation
✓Suitable for regulated or data-sensitive workloads

Start Free Trial →

Pricing sourced from GLM-4.5 · Last verified March 2026

Feature Comparison

Features	Open Weights	GLM-4.5 API Access	GLM-4.5-Air API Access	Self-Hosted Production
Downloadable GLM-4.5 model weights	✓	✓	✓	✓
Commercial use permitted under MIT license	✓	✓	✓	✓
Self-hosting and modification allowed	✓	✓	✓	✓
Available BF16 and FP8 variants	✓	✓	✓	✓
Base and hybrid reasoning model variants	✓	✓	✓	✓
Hosted API access through Z.AI or supported providers	—	✓	✓	✓
Text input and text output	—	✓	✓	✓
Function calling and tool invocation support	—	✓	✓	✓
Streaming output support	—	✓	✓	✓
Structured output support	—	✓	✓	✓
Lower-cost hosted API access for the 106B total / 12B active GLM-4.5-Air model	—	—	✓	✓
Private deployment in customer-controlled infrastructure	—	—	—	✓
Supports vLLM and SGLang serving paths	—	—	—	✓
Can use BF16 or FP8 model variants	—	—	—	✓
Official requirements list server memory above 1T for normal model loading and operation	—	—	—	✓
Suitable for regulated or data-sensitive workloads	—	—	—	✓

Is GLM-4.5 Worth It?

✅ Why Choose GLM-4.5

• MIT licensing allows commercial deployment, modification, self-hosting, and derivative work without the contractual limits common in closed frontier models.
• The 355B total / 32B active MoE design gives teams a frontier-scale model while activating a much smaller subset of parameters per inference.
• A 128K context window and 96K maximum output make it practical for long documents, large codebases, lengthy transcripts, and multi-step agent traces.
• Hybrid reasoning lets developers choose deeper Thinking Mode for complex tool use or Non-Thinking Mode for faster direct responses.
• Official documentation highlights function calling, structured output, streaming, context caching, and integration with code-agent environments such as Claude Code and Roo Code.
• The GLM-4.5-Air variant provides a smaller 106B total / 12B active option for teams that need a lower-cost deployment path.

⚠️ Consider This

• It is not a turnkey voice-agent product; teams still need speech-to-text, text-to-speech, telephony, orchestration, monitoring, and safety layers for production voice workflows.
• Full self-hosting is hardware intensive: official full-context GLM-4.5 configurations list up to H100 x 32 or H200 x 16 for 128K-context BF16 inference.
• Hosted API pricing is token-based rather than a simple monthly SaaS plan, with Z.AI listing GLM-4.5 at $0.60 per 1M input tokens and $2.20 per 1M output tokens and GLM-4.5-Air at $0.20 per 1M input tokens and $1.10 per 1M output tokens.
• Although Z.AI reports strong open-model benchmark results, closed models such as Claude and GPT may still be easier to operate and may perform better in some enterprise support workflows.
• Some website setup examples reference older or adjacent GLM model names, so developers should rely on the current Z.AI docs or Hugging Face model card when deploying.

What Users Say About GLM-4.5

👍 What Users Love

✓MIT licensing allows commercial deployment, modification, self-hosting, and derivative work without the contractual limits common in closed frontier models.
✓The 355B total / 32B active MoE design gives teams a frontier-scale model while activating a much smaller subset of parameters per inference.
✓A 128K context window and 96K maximum output make it practical for long documents, large codebases, lengthy transcripts, and multi-step agent traces.
✓Hybrid reasoning lets developers choose deeper Thinking Mode for complex tool use or Non-Thinking Mode for faster direct responses.
✓Official documentation highlights function calling, structured output, streaming, context caching, and integration with code-agent environments such as Claude Code and Roo Code.
✓The GLM-4.5-Air variant provides a smaller 106B total / 12B active option for teams that need a lower-cost deployment path.

👎 Common Concerns

⚠It is not a turnkey voice-agent product; teams still need speech-to-text, text-to-speech, telephony, orchestration, monitoring, and safety layers for production voice workflows.
⚠Full self-hosting is hardware intensive: official full-context GLM-4.5 configurations list up to H100 x 32 or H200 x 16 for 128K-context BF16 inference.
⚠Hosted API pricing is token-based rather than a simple monthly SaaS plan, with Z.AI listing GLM-4.5 at $0.60 per 1M input tokens and $2.20 per 1M output tokens and GLM-4.5-Air at $0.20 per 1M input tokens and $1.10 per 1M output tokens.
⚠Although Z.AI reports strong open-model benchmark results, closed models such as Claude and GPT may still be easier to operate and may perform better in some enterprise support workflows.
⚠Some website setup examples reference older or adjacent GLM model names, so developers should rely on the current Z.AI docs or Hugging Face model card when deploying.