DeepSeek V3.2-Exp is an experimental large language model published by deepseek-ai on Hugging Face for text generation and conversational AI tasks.
DeepSeek V3.2-Exp is an experimental open-source large language model that introduces DeepSeek Sparse Attention (DSA) for substantially improved long-context training and inference efficiency, released free under the MIT License. It targets ML researchers, infrastructure engineers, and developers building self-hosted AI applications who need a frontier-grade model with permissive licensing.
Released in 2025 by DeepSeek-AI as an intermediate step toward the company's next-generation architecture, V3.2-Exp builds on V3.1-Terminus by replacing dense attention with a fine-grained sparse attention mechanism. The model uses a 671B-parameter Mixture-of-Experts design with 256 experts and is available for direct download from Hugging Face, where it has accumulated 213,035 downloads in the last month alone. Across public benchmarks, performance remains effectively on par with V3.1-Terminus: MMLU-Pro scores 85.0 (matching the prior version), AIME 2025 reaches 89.3 (up from 88.4), Codeforces hits 2121 (up from 2046), and SimpleQA scores 97.1, while delivering meaningful efficiency gains on extended-context workloads.
The model can be served locally using HuggingFace Transformers, vLLM (which provides day-0 support), SGLang (with Docker images for H200, MI350, and Ascend NPUs), or Docker Model Runner with the OpenAI-compatible chat completions API. DeepSeek also open-sources the supporting infrastructure: TileLang kernels for research, DeepGEMM for high-performance CUDA indexer logit kernels (including paged variants), and FlashMLA for sparse attention kernels. Compared to closed-weight frontier models like GPT-4 and Claude that charge per-token API fees, V3.2-Exp can be run for the cost of GPU compute alone. Based on our analysis of 870+ AI tools, this is one of the few experimental research releases that ships production-grade serving recipes alongside the weights, making it especially attractive for teams running inference at scale where long-context efficiency translates directly into infrastructure cost savings.
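Because every supported serving path exposes the same OpenAI-compatible chat completions endpoint, a standard client can talk to a local deployment without modification. The sketch below is a minimal example assuming a server is already running on localhost:8000; the base URL, port, API key placeholder, and sampling parameters are assumptions to adjust to your own launch configuration.

```python
# Minimal sketch: querying a locally served DeepSeek V3.2-Exp instance through the
# OpenAI-compatible chat completions endpoint that vLLM, SGLang, and Docker Model
# Runner all expose. The base_url, port, and api_key placeholder are assumptions
# that depend on how the server was launched.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # adjust host/port to your deployment
    api_key="local-serving-no-key",        # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain sparse attention in two sentences."},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```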
A fine-grained sparse attention mechanism that replaces dense attention in V3.1-Terminus, delivering substantial improvements in long-context training and inference efficiency. DeepSeek reports this is the first time fine-grained sparse attention has been achieved at this scale while maintaining virtually identical model output quality. Sparse attention kernels are open-sourced via FlashMLA.
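For intuition, the toy PyTorch sketch below shows the general shape of fine-grained sparse attention: a lightweight indexer scores previously seen tokens for each query, and attention is then computed only over the top-k selected positions. It is an illustrative simplification (single head, no causal mask, random indexer scores), not DeepSeek's FlashMLA implementation.

```python
# Illustrative toy sketch of fine-grained sparse attention: an indexer scores past
# tokens per query, and full attention is computed only over the top-k highest-scoring
# positions. This is NOT DeepSeek's kernel; the production code lives in FlashMLA.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, index_scores, top_k=64):
    # q, k, v: [seq_len, d]; index_scores: [seq_len, seq_len] indexer logits
    seq_len, d = q.shape
    top_k = min(top_k, seq_len)
    # Select the top-k key positions per query according to the indexer.
    topk_idx = index_scores.topk(top_k, dim=-1).indices   # [seq_len, top_k]
    k_sel = k[topk_idx]                                    # [seq_len, top_k, d]
    v_sel = v[topk_idx]                                    # [seq_len, top_k, d]
    # Attend only over the selected tokens instead of the full sequence.
    attn = torch.einsum("qd,qkd->qk", q, k_sel) / d ** 0.5
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage with random tensors, for illustration only.
seq_len, d = 256, 32
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
scores = torch.randn(seq_len, seq_len)  # stand-in for indexer logits
out = topk_sparse_attention(q, k, v, scores, top_k=64)
print(out.shape)  # torch.Size([256, 32])
```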
Built on a Mixture-of-Experts design with 256 experts, configured via the config_671B_v3.2.json runtime config. This sparse activation pattern keeps per-token compute costs manageable relative to a dense 671B model, though aggregate memory footprint still requires multi-GPU deployment. Training configuration was deliberately aligned with V3.1-Terminus to isolate DSA's contribution.
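A quick way to confirm the MoE layout on your own machine is to inspect the runtime config shipped alongside the weights. The snippet below is a generic inspection sketch that assumes config_671B_v3.2.json has been downloaded locally; it makes no assumptions about specific field names and simply prints whatever the file defines.

```python
# Minimal sketch: inspecting the runtime config that defines the 671B MoE layout.
# Assumes config_671B_v3.2.json is available locally; the path is deployment-specific.
import json

with open("config_671B_v3.2.json") as f:
    cfg = json.load(f)

# Dump every top-level field so the expert count and routing settings can be located.
for key, value in sorted(cfg.items()):
    print(f"{key}: {value}")
```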
Day-0 compatibility with vLLM, SGLang (including dedicated dsv32 Docker images for H200, MI350, and NPUs), HuggingFace Transformers, and Docker Model Runner. All paths expose the OpenAI-compatible /v1/chat/completions API, so existing clients work without modification. Recommended SGLang launch uses --tp 8 --dp 8 --enable-dp-attention.
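As a deployment sketch, the script below wraps an SGLang launch in Python using the parallelism flags recommended above. The model path, port, and the assumption of an 8-GPU node are illustrative and should be tuned to your hardware; the server then serves the OpenAI-compatible API on the chosen port.

```python
# Minimal sketch: launching an SGLang server for DeepSeek V3.2-Exp with the
# recommended tensor/data-parallel flags. Model path, port, and GPU count are
# assumptions; adjust them to your deployment. The server runs in the foreground.
import subprocess

subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "deepseek-ai/DeepSeek-V3.2-Exp",
    "--tp", "8",
    "--dp", "8",
    "--enable-dp-attention",
    "--port", "30000",
])
```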
DeepSeek released TileLang kernels for readable, research-oriented implementations and DeepGEMM for high-performance CUDA indexer logit kernels including paged variants. FlashMLA hosts the production sparse attention kernels. This lets researchers reproduce, audit, and extend the architecture rather than treating it as a black box.
Both the repository and model weights ship under the MIT License — one of the most permissive licenses available, with no usage restrictions, no acceptable-use policy clauses, and no commercial-use carve-outs. This is more permissive than Llama's community license or Mistral's research license, making V3.2-Exp particularly attractive for commercial fine-tuning and redistribution.
V3.2-Exp was released in 2025 as an intermediate step toward DeepSeek's next-generation architecture, introducing DeepSeek Sparse Attention (DSA) for efficient long-context processing. On 2025-11-17, DeepSeek patched a Rotary Position Embedding (RoPE) implementation discrepancy in the indexer module that could degrade model performance: the input tensor to RoPE in the indexer requires a non-interleaved layout, while RoPE in the MLA module expects an interleaved layout.
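To make the layout distinction concrete, the toy sketch below applies RoPE under both conventions: the interleaved form rotates adjacent dimension pairs (x0, x1), (x2, x3), ..., while the non-interleaved form pairs each dimension with its counterpart half the head dimension away. This is a generic illustration of the two conventions, not the patched indexer code.

```python
# Toy illustration of interleaved vs. non-interleaved RoPE layouts (not DeepSeek's code).
import torch

def rope(x, pos, interleaved: bool):
    # x: [seq_len, d] with d even; pos: [seq_len] integer positions
    d = x.shape[-1]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = pos[:, None].float() * inv_freq[None, :]      # [seq_len, d/2]
    cos, sin = angles.cos(), angles.sin()
    if interleaved:
        x1, x2 = x[..., 0::2], x[..., 1::2]                # adjacent pairs
    else:
        x1, x2 = x[..., : d // 2], x[..., d // 2 :]        # half-split pairs
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos
    if interleaved:
        return torch.stack([r1, r2], dim=-1).flatten(-2)
    return torch.cat([r1, r2], dim=-1)

x, pos = torch.randn(4, 8), torch.arange(4)
# Feeding a tensor laid out for one convention into the other rotates the wrong
# dimension pairs, which is the kind of discrepancy the patch addressed.
print(torch.allclose(rope(x, pos, True), rope(x, pos, False)))  # False in general
```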