🛠️AI Tools
DeepSeek-V3.2-Exp
A brand new model from DeepSeek
The model was trained on top of the fresh V3.1-Terminus, but with a slightly modified attention mechanism, DeepSeek Sparse Attention (DSA). In short, each token now attends to 2048 selected tokens instead of all previous ones, with the selection based on a slightly differently computed product of Q and K. Swapping in the new mechanism does not require training from scratch: V3.2 is the same V3.1, further trained on roughly a trillion tokens.
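The core idea, stripped of the details, is just top-k attention: score all previous keys, keep only the 2048 best, and run softmax attention over that subset. The sketch below is a toy simplification under that assumption; the real DSA reportedly uses a separate lightweight indexer head to pick the tokens, and the function and parameter names here are hypothetical:

```python
import numpy as np

def sparse_topk_attention(Q, K, V, k=2048):
    """Toy sketch of top-k sparse attention.

    Each query attends only to its k highest-scoring previous tokens
    instead of all of them. This is a simplified illustration, not
    DeepSeek's actual DSA implementation.
    """
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        # Causal scoring: only keys up to position t are candidates.
        scores = Q[t] @ K[: t + 1].T / np.sqrt(d)
        if t + 1 > k:
            # Keep only the k best-scoring keys (unordered is fine for softmax).
            top = np.argpartition(scores, -k)[-k:]
        else:
            top = np.arange(t + 1)
        # Standard numerically-stable softmax over the selected subset.
        w = np.exp(scores[top] - scores[top].max())
        w /= w.sum()
        out[t] = w @ V[top]
    return out
```

When the context is shorter than k, this reduces to ordinary dense causal attention; the savings kick in only once the sequence exceeds the 2048-token budget, which is exactly the long-context regime the post describes.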
This significantly reduces the cost of maintaining a long context, which matters a lot in the era of reasoning models; I think the main reason for moving in this direction is longer reasoning chains for tasks that require hundreds of tool calls.
For a million generated tokens, the new model will cost $0.42 (instead of $1.68 on V3.1).
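Using only the two prices quoted above, the saving works out to a flat 4x:

```python
old_price = 1.68  # $ per 1M generated tokens on V3.1 (from the post)
new_price = 0.42  # $ per 1M generated tokens on V3.2 (from the post)

ratio = old_price / new_price        # how many times cheaper
savings = 1 - new_price / old_price  # fraction of the bill saved

print(ratio, savings)  # 4.0 0.75
```

In other words, generation is 4 times cheaper, a 75% cost reduction.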
Metrics show that quality does not suffer.
An article with the technical details of how the new attention mechanism works is here: https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/mai