Monday, March 30, 2026

Thinking into the Future: Latent Lookahead Training for Transformers


This paper was accepted at the Workshop on Latent & Implicit Thinking – Going Beyond CoT Reasoning 2026 at ICLR.

Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although very scalable, this objective forces the model to commit at every step, preventing it from exploring or reflecting on multiple plausible continuations. Moreover, compute is allocated uniformly across tokens; every token is produced by a single forward pass, potentially limiting the model's expressiveness in cases where difficult tokens inherently require more compute. Toward addressing these limitations, we introduce latent lookahead, a training strategy that enables models to "think" before generating: at selected positions in the sequence, before committing to the next token, the model performs a multi-step lookahead in latent space. More precisely, instead of sampling future tokens, we leverage the network's latent space by recursively feeding its hidden states back into the context for τ steps, investing additional compute in predicting that token. This produces τ latent predictions that are supervised against the next τ ground-truth tokens, encouraging the model to "look ahead" and refine its prediction. We show that latent lookahead significantly outperforms both autoregressive and non-autoregressive baselines on planning tasks such as maze solving, Sudoku, and ProsQA, where foresight is essential.
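The core training loop described above can be sketched in miniature. The snippet below is a minimal toy illustration, not the paper's implementation: a random linear-plus-tanh map stands in for the transformer forward pass, `tau` is the number of latent lookahead steps, and all weights, names, and sizes are assumptions made for illustration. At a chosen position, the hidden state is fed back as the next input for τ steps, yielding τ latent predictions, each scored with cross-entropy against the corresponding ground-truth token.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, tau = 10, 8, 3  # toy vocabulary size, hidden width, lookahead depth

E = rng.normal(size=(vocab, d)) * 0.1   # token embedding table (hypothetical)
W = rng.normal(size=(d, d)) * 0.1       # stand-in "transformer" weights
U = rng.normal(size=(d, vocab)) * 0.1   # output projection: hidden -> logits

def forward(x):
    """Stand-in for one forward pass of the network on its latest input."""
    return np.tanh(x @ W)

def latent_lookahead_loss(context_token, targets):
    """Recurse in latent space for tau steps at one position, supervising
    each latent prediction against the next tau ground-truth tokens."""
    h = forward(E[context_token])        # forward pass on the real context token
    loss = 0.0
    for t in range(tau):
        logits = h @ U                   # latent prediction t (no token sampled)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        loss += -np.log(p[targets[t]])   # cross-entropy vs ground-truth token t
        h = forward(h)                   # feed the hidden state back as input
    return loss / tau

loss = latent_lookahead_loss(context_token=2, targets=[5, 1, 7])
print(loss)
```

In the actual method, gradients through this loss would update the full transformer so that repeated latent steps refine the prediction; here the loop only shows where the τ latent states and their τ supervision signals come from.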
