Residual Context Diffusion Language Models

July 4, 2026

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a “remasking” mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens retain contextual information useful for subsequent decoding iterations. In light of this, we propose Residual Context Diffusion (RCD), a module that converts these discarded token representations into contextual residuals and injects them back for the next denoising step. RCD uses a decoupled two-stage training pipeline to bypass the memory bottlenecks associated with backpropagation. We validate our method on both long CoT reasoning (SDAR) and short CoT instruction following (LLaDA) models. We demonstrate that a standard dLLM can be efficiently converted to the RCD paradigm with merely ∼1 billion tokens. RCD consistently improves frontier dLLMs by 5–10 points in accuracy with minimal extra computation overhead across a wide range of benchmarks. Notably, on the most challenging AIME tasks, RCD nearly doubles baseline accuracy and attains up to 4–5x fewer denoising steps at equivalent accuracy levels.
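The remasking-and-recycling idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `MASK`, `denoise_step`, and `inject` are hypothetical, the residual is assumed here to be a probability-weighted average of token embeddings, and the injection is assumed to be a simple addition to the mask embedding. The abstract does not specify either choice.

```python
import math

MASK = -1  # hypothetical id marking a position that is still masked


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def denoise_step(tokens, logits, embeddings, k):
    """One toy remasking step that keeps a residual for discarded positions.

    tokens:     list of token ids, MASK where nothing is decoded yet
    logits:     per-position logits over the vocabulary
    embeddings: vocabulary embedding table (list of equal-length vectors)
    k:          number of masked positions to commit in this step
    Returns (new_tokens, residuals); residuals[i] is a soft embedding for
    positions that stay masked and None everywhere else.
    """
    probs = {i: softmax(logits[i]) for i, t in enumerate(tokens) if t == MASK}
    # Rank masked positions by confidence (highest predicted probability).
    ranked = sorted(probs, key=lambda i: max(probs[i]), reverse=True)
    commit = set(ranked[:k])

    new_tokens = list(tokens)
    residuals = [None] * len(tokens)
    dim = len(embeddings[0])
    for i, p in probs.items():
        if i in commit:
            # Confident position: commit the argmax token.
            new_tokens[i] = max(range(len(p)), key=p.__getitem__)
        else:
            # Plain remasking would drop p here. Assumption: recycle it as
            # a probability-weighted average of the token embeddings.
            residuals[i] = [
                sum(p[v] * embeddings[v][d] for v in range(len(p)))
                for d in range(dim)
            ]
    return new_tokens, residuals


def inject(mask_embedding, residual):
    """Assumed injection rule: add the residual to the mask embedding."""
    return [m + r for m, r in zip(mask_embedding, residual)]


# Tiny example: 3-token vocabulary, 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [0, MASK, MASK]
logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_tokens, residuals = denoise_step(tokens, logits, embeddings, k=1)
next_input = inject([0.0, 0.0], residuals[2])
```

In this example position 1 is committed because its prediction is confident, while position 2 stays masked but carries a soft embedding into the next step instead of starting again from a bare mask.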

† University of California, Berkeley
* Equal contribution
‡ Equal advising
