DeepSeek V3.2-Exp is an experimental large language model published by deepseek-ai on Hugging Face for text generation and conversational AI tasks.
DeepSeek V3.2-Exp is an experimental open-source large language model that introduces DeepSeek Sparse Attention (DSA) for substantially improved long-context training and inference efficiency, released free under the MIT License. It targets ML researchers, infrastructure engineers, and developers building self-hosted AI applications who need a frontier-grade model with permissive licensing.
Released in 2025 by DeepSeek-AI as an intermediate step toward the company's next-generation architecture, V3.2-Exp builds on V3.1-Terminus by replacing dense attention with a fine-grained sparse attention mechanism. The model uses a 671B-parameter Mixture-of-Experts design with 256 experts and is available for direct download from Hugging Face, where it has accumulated 213,035 downloads in the last month alone. Across public benchmarks, performance remains effectively on par with V3.1-Terminus: MMLU-Pro scores 85.0 (matching the prior version), AIME 2025 reaches 89.3 (up from 88.4), Codeforces hits 2121 (up from 2046), and SimpleQA scores 97.1, while delivering meaningful efficiency gains on extended-context workloads.
The model can be served locally using HuggingFace Transformers, vLLM (which provides day-0 support), SGLang (with Docker images for H200, MI350, and Ascend NPUs), or Docker Model Runner with the OpenAI-compatible chat completions API. DeepSeek also open-sources the supporting infrastructure: TileLang kernels for research, DeepGEMM for high-performance CUDA indexer logit kernels (including paged variants), and FlashMLA for sparse attention kernels. Compared to closed-weight frontier models like GPT-4 and Claude that charge per-token API fees, V3.2-Exp can be run for the cost of GPU compute alone. Based on our analysis of 870+ AI tools, this is one of the few experimental research releases that ships production-grade serving recipes alongside the weights, making it especially attractive for teams running inference at scale where long-context efficiency translates directly into infrastructure cost savings.
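Because every supported serving path exposes the same OpenAI-compatible chat completions endpoint, a standard client can talk to a local deployment without modification. The sketch below is a minimal example assuming a server is already running on localhost:8000; the base URL, port, API key placeholder, and sampling parameters are assumptions to adjust to your own launch configuration.

```python
# Minimal sketch: querying a locally served DeepSeek V3.2-Exp instance through the
# OpenAI-compatible chat completions endpoint that vLLM, SGLang, and Docker Model
# Runner all expose. The base_url, port, and api_key placeholder are assumptions
# that depend on how the server was launched.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # adjust host/port to your deployment
    api_key="local-serving-no-key",        # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain sparse attention in two sentences."},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```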
A fine-grained sparse attention mechanism that replaces dense attention in V3.1-Terminus, delivering substantial improvements in long-context training and inference efficiency. DeepSeek reports this is the first time fine-grained sparse attention has been achieved at this scale while maintaining virtually identical model output quality. Sparse attention kernels are open-sourced via FlashMLA.
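For intuition, the toy PyTorch sketch below shows the general shape of fine-grained sparse attention: a lightweight indexer scores previously seen tokens for each query, and attention is then computed only over the top-k selected positions. It is an illustrative simplification (single head, no causal mask, random indexer scores), not DeepSeek's FlashMLA implementation.

```python
# Illustrative toy sketch of fine-grained sparse attention: an indexer scores past
# tokens per query, and full attention is computed only over the top-k highest-scoring
# positions. This is NOT DeepSeek's kernel; the production code lives in FlashMLA.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, index_scores, top_k=64):
    # q, k, v: [seq_len, d]; index_scores: [seq_len, seq_len] indexer logits
    seq_len, d = q.shape
    top_k = min(top_k, seq_len)
    # Select the top-k key positions per query according to the indexer.
    topk_idx = index_scores.topk(top_k, dim=-1).indices   # [seq_len, top_k]
    k_sel = k[topk_idx]                                    # [seq_len, top_k, d]
    v_sel = v[topk_idx]                                    # [seq_len, top_k, d]
    # Attend only over the selected tokens instead of the full sequence.
    attn = torch.einsum("qd,qkd->qk", q, k_sel) / d ** 0.5
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage with random tensors, for illustration only.
seq_len, d = 256, 32
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
scores = torch.randn(seq_len, seq_len)  # stand-in for indexer logits
out = topk_sparse_attention(q, k, v, scores, top_k=64)
print(out.shape)  # torch.Size([256, 32])
```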
Built on a Mixture-of-Experts design with 256 experts, configured via the config_671B_v3.2.json runtime config. This sparse activation pattern keeps per-token compute costs manageable relative to a dense 671B model, though aggregate memory footprint still requires multi-GPU deployment. Training configuration was deliberately aligned with V3.1-Terminus to isolate DSA's contribution.
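A quick way to confirm the MoE layout on your own machine is to inspect the runtime config shipped alongside the weights. The snippet below is a generic inspection sketch that assumes config_671B_v3.2.json has been downloaded locally; it makes no assumptions about specific field names and simply prints whatever the file defines.

```python
# Minimal sketch: inspecting the runtime config that defines the 671B MoE layout.
# Assumes config_671B_v3.2.json is available locally; the path is deployment-specific.
import json

with open("config_671B_v3.2.json") as f:
    cfg = json.load(f)

# Dump every top-level field so the expert count and routing settings can be located.
for key, value in sorted(cfg.items()):
    print(f"{key}: {value}")
```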
Day-0 compatibility with vLLM, SGLang (including dedicated dsv32 Docker images for H200, MI350, and NPUs), HuggingFace Transformers, and Docker Model Runner. All paths expose the OpenAI-compatible /v1/chat/completions API, so existing clients work without modification. Recommended SGLang launch uses --tp 8 --dp 8 --enable-dp-attention.
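As a deployment sketch, the script below wraps an SGLang launch in Python using the parallelism flags recommended above. The model path, port, and the assumption of an 8-GPU node are illustrative and should be tuned to your hardware; the server then serves the OpenAI-compatible API on the chosen port.

```python
# Minimal sketch: launching an SGLang server for DeepSeek V3.2-Exp with the
# recommended tensor/data-parallel flags. Model path, port, and GPU count are
# assumptions; adjust them to your deployment. The server runs in the foreground.
import subprocess

subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "deepseek-ai/DeepSeek-V3.2-Exp",
    "--tp", "8",
    "--dp", "8",
    "--enable-dp-attention",
    "--port", "30000",
])
```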
DeepSeek released TileLang kernels for readable, research-oriented implementations and DeepGEMM for high-performance CUDA indexer logit kernels including paged variants. FlashMLA hosts the production sparse attention kernels. This lets researchers reproduce, audit, and extend the architecture rather than treating it as a black box.
Both the repository and model weights ship under the MIT License — one of the most permissive licenses available, with no usage restrictions, no acceptable-use policy clauses, and no commercial-use carve-outs. This is more permissive than Llama's community license or Mistral's research license, making V3.2-Exp particularly attractive for commercial fine-tuning and redistribution.
V3.2-Exp was released in 2025 as an intermediate step toward DeepSeek's next-generation architecture, introducing DeepSeek Sparse Attention (DSA) for efficient long-context processing. On 2025-11-17, DeepSeek patched a Rotary Position Embedding (RoPE) implementation discrepancy in the indexer module that could degrade model performance: the input tensor to RoPE in the indexer requires a non-interleaved layout, while RoPE in the MLA module expects an interleaved layout.
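To make the layout distinction concrete, the toy sketch below applies RoPE under both conventions: the interleaved form rotates adjacent dimension pairs (x0, x1), (x2, x3), ..., while the non-interleaved form pairs each dimension with its counterpart half the head dimension away. This is a generic illustration of the two conventions, not the patched indexer code.

```python
# Toy illustration of interleaved vs. non-interleaved RoPE layouts (not DeepSeek's code).
import torch

def rope(x, pos, interleaved: bool):
    # x: [seq_len, d] with d even; pos: [seq_len] integer positions
    d = x.shape[-1]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = pos[:, None].float() * inv_freq[None, :]      # [seq_len, d/2]
    cos, sin = angles.cos(), angles.sin()
    if interleaved:
        x1, x2 = x[..., 0::2], x[..., 1::2]                # adjacent pairs
    else:
        x1, x2 = x[..., : d // 2], x[..., d // 2 :]        # half-split pairs
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos
    if interleaved:
        return torch.stack([r1, r2], dim=-1).flatten(-2)
    return torch.cat([r1, r2], dim=-1)

x, pos = torch.randn(4, 8), torch.arange(4)
# Feeding a tensor laid out for one convention into the other rotates the wrong
# dimension pairs, which is the kind of discrepancy the patch addressed.
print(torch.allclose(rope(x, pos, True), rope(x, pos, False)))  # False in general
```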