Monday, April 20, 2026

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining


This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026.

Principled domain reweighting can significantly improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remains underexplored. Existing multimodal training recipes tune mixtures along only a single axis, such as data format or task type. We introduce MixAtlas, a principled framework for compute-efficient multimodal mixture optimization via systematic domain decomposition and small proxy models. MixAtlas factorizes the training data along two interpretable axes – image concepts and task supervision – enabling interpretable mixture control and fine-grained attribution of downstream performance to specific domains within each axis. Using small proxy models and a Gaussian-process surrogate, we explore the mixture space at 1/100th the cost of full-scale training. The resulting mixtures yield substantial improvements: up to 3× faster convergence and consistent gains of 2-5% across diverse benchmarks over existing approaches, with especially strong boosts on text-rich benchmarks like ChartQA (+10%) and TextVQA (+13%). Importantly, we show that mixtures obtained via small proxy models transfer to larger-scale model training, preserving both efficiency and accuracy gains. Overall, MixAtlas makes multimodal mixture optimization practical and interpretable, providing concrete, compute-efficient recipes for training next-generation MLLMs.
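To make the surrogate-based search concrete, here is a minimal sketch of the general idea: fit a Gaussian-process surrogate to a handful of proxy-model evaluations at different mixture weights, then pick the next mixture to try via an acquisition function over the simplex. This is an illustration of surrogate-driven mixture search under stated assumptions, not the paper's actual algorithm; the `proxy_score` objective, the 3-domain setup, and the UCB acquisition are all hypothetical stand-ins for training a small proxy model and evaluating it downstream.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def sample_mixtures(n, d):
    # Dirichlet draws: candidate mixture weights over d data domains,
    # each row summing to 1 (a point on the probability simplex).
    return rng.dirichlet(np.ones(d), size=n)

def proxy_score(w):
    # Hypothetical stand-in for "train a small proxy model on mixture w
    # and measure downstream benchmark accuracy". Here: a smooth
    # objective peaked at an (assumed) ideal mixture.
    target = np.array([0.5, 0.3, 0.2])
    return -np.sum((w - target) ** 2)

d = 3                                     # toy: 3 data domains
X = sample_mixtures(8, d)                 # initial proxy runs
y = np.array([proxy_score(w) for w in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)                          # refit surrogate to all runs so far
    cand = sample_mixtures(256, d)        # random candidate mixtures
    mu, sigma = gp.predict(cand, return_std=True)
    ucb = mu + 1.0 * sigma                # upper-confidence-bound acquisition
    w_next = cand[np.argmax(ucb)]         # most promising mixture to "train"
    X = np.vstack([X, w_next])
    y = np.append(y, proxy_score(w_next))

best = X[np.argmax(y)]                    # best mixture found by the search
```

In the actual setting, each `proxy_score` call is a (cheap) proxy-model training run, so the surrogate's job is to minimize how many such runs are needed before committing to a mixture for full-scale training.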
