Friday, October 24, 2025

EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts


Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet we find that this progress does not fully capture the breadth of visual reasoning capabilities essential for interpreting charts. We introduce EncQA, a novel benchmark informed by the visualization literature, designed to provide systematic coverage of visual encodings and analytic tasks that are crucial for chart understanding. EncQA provides 2,076 synthetic question-answer pairs, enabling balanced coverage of six visual encoding channels (position, length, area, color quantitative, color nominal, and shape) and eight tasks (find extrema, retrieve value, find anomaly, filter values, compute derived value exact, compute derived value relative, correlate values, and correlate values relative). Our evaluation of 9 state-of-the-art VLMs reveals that performance varies significantly across encodings within the same task, as well as across tasks. Contrary to expectations, we observe that performance does not improve with model size for many task-encoding pairs. Our results suggest that advancing chart understanding requires targeted strategies addressing specific visual reasoning gaps, rather than solely scaling up model or dataset size.
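To make the benchmark's design concrete, the six encoding channels crossed with the eight tasks yield a 48-cell grid that the question-answer pairs are balanced across. The sketch below illustrates that grid in Python; the identifiers and the record layout are assumptions for illustration, not EncQA's actual schema.

```python
from dataclasses import dataclass
from itertools import product

# The six visual encoding channels and eight analytic tasks named in the
# abstract (these string identifiers are illustrative, not the dataset's own).
ENCODINGS = [
    "position", "length", "area",
    "color_quantitative", "color_nominal", "shape",
]
TASKS = [
    "find_extrema", "retrieve_value", "find_anomaly", "filter_values",
    "compute_derived_value_exact", "compute_derived_value_relative",
    "correlate_values", "correlate_values_relative",
]

@dataclass
class EncQAItem:
    """Hypothetical record for one synthetic question-answer pair."""
    encoding: str
    task: str
    question: str
    answer: str

# Every encoding-task combination the benchmark is designed to cover:
# 6 encodings x 8 tasks = 48 cells.
grid = list(product(ENCODINGS, TASKS))
print(len(grid))
```

Balanced coverage then means distributing the 2,076 question-answer pairs across these 48 task-encoding cells, so per-cell accuracy can be compared directly.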
