Monday, April 20, 2026

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining


This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026.

Principled domain reweighting can significantly improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remains underexplored. Existing multimodal training recipes tune mixtures along only a single axis, such as data format or task type. We introduce MixAtlas, a principled framework for compute-efficient multimodal mixture optimization via systematic domain decomposition and small proxy models. MixAtlas factorizes the training data along two interpretable axes – image concepts and task supervision – enabling interpretable mixture control and fine-grained attribution of downstream performance to specific domains within each axis. Using small proxy models and a Gaussian-process surrogate, we explore the mixture space at 1/100th the cost of full-scale training. The resulting mixtures yield substantial improvements: up to 3× faster convergence and consistent gains of 2-5% across diverse benchmarks over existing approaches, with especially strong boosts on text-rich benchmarks like ChartQA (+10%) and TextVQA (+13%). Importantly, we show that mixtures obtained via small proxy models transfer to larger-scale model training, preserving both efficiency and accuracy gains. Overall, MixAtlas makes multimodal mixture optimization practical and interpretable, providing concrete, compute-efficient recipes for training next-generation MLLMs.
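To make the surrogate-based search concrete, here is a minimal sketch of the general idea: fit a Gaussian-process surrogate to a handful of proxy-model evaluations at different mixture weights, then pick the next mixture to try via an acquisition function over the simplex. This is an illustration of surrogate-driven mixture search under stated assumptions, not the paper's actual algorithm; the `proxy_score` objective, the 3-domain setup, and the UCB acquisition are all hypothetical stand-ins for training a small proxy model and evaluating it downstream.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def sample_mixtures(n, d):
    # Dirichlet draws: candidate mixture weights over d data domains,
    # each row summing to 1 (a point on the probability simplex).
    return rng.dirichlet(np.ones(d), size=n)

def proxy_score(w):
    # Hypothetical stand-in for "train a small proxy model on mixture w
    # and measure downstream benchmark accuracy". Here: a smooth
    # objective peaked at an (assumed) ideal mixture.
    target = np.array([0.5, 0.3, 0.2])
    return -np.sum((w - target) ** 2)

d = 3                                     # toy: 3 data domains
X = sample_mixtures(8, d)                 # initial proxy runs
y = np.array([proxy_score(w) for w in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)                          # refit surrogate to all runs so far
    cand = sample_mixtures(256, d)        # random candidate mixtures
    mu, sigma = gp.predict(cand, return_std=True)
    ucb = mu + 1.0 * sigma                # upper-confidence-bound acquisition
    w_next = cand[np.argmax(ucb)]         # most promising mixture to "train"
    X = np.vstack([X, w_next])
    y = np.append(y, proxy_score(w_next))

best = X[np.argmax(y)]                    # best mixture found by the search
```

In the actual setting, each `proxy_score` call is a (cheap) proxy-model training run, so the surrogate's job is to minimize how many such runs are needed before committing to a mixture for full-scale training.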
