Wednesday, January 14, 2026

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing


Editing images using natural language instructions has become a natural and expressive way to modify visual content; yet evaluating the performance of such models remains challenging. Existing evaluation approaches often rely on image-text similarity metrics like CLIP, which lack precision. In this work, we introduce a new benchmark designed to evaluate text-guided image editing models in a more grounded manner, along two critical dimensions: (i) functional correctness, assessed via automatically generated multiple-choice questions that verify whether the intended change was successfully applied; and (ii) image content preservation, which ensures that non-targeted regions of the image remain visually consistent, using an object-aware masking technique and preservation scoring. The benchmark includes over 1000 high-quality editing examples across 20 diverse content categories, each annotated with detailed editing instructions, evaluation questions, and spatial object masks. We conduct a large-scale study comparing GPT-Image-1, the latest flagship in the text-guided image editing space, against several state-of-the-art editing models, and validate our automatic metrics against human ratings. Results show that GPT-Image-1 leads in instruction-following accuracy but often over-modifies irrelevant image regions, highlighting a key trade-off in current model behavior. GIE-Bench provides a scalable, reproducible framework for advancing more accurate evaluation of text-guided image editing.
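
As a rough illustration of the two evaluation dimensions described above, the following sketch computes a multiple-choice accuracy and a masked preservation score. It is not the benchmark's implementation: the function names, the grayscale list-of-lists image representation, and the use of mean absolute pixel difference as the preservation measure are all assumptions made for this example, since the abstract does not specify the actual scoring formulas.

```python
from typing import Sequence


def functional_correctness(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of multiple-choice questions answered as expected.

    Hypothetical helper: `predicted` holds the options chosen when inspecting
    the edited image, `expected` holds the options implied by the instruction.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one question is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def preservation_score(
    original: Sequence[Sequence[float]],
    edited: Sequence[Sequence[float]],
    edit_mask: Sequence[Sequence[bool]],
) -> float:
    """Similarity of the two images outside the edited object, in [0, 1].

    Assumptions for this sketch: images are grayscale grids with values in
    [0, 1], `edit_mask` is True where the edit is allowed to change pixels,
    and similarity is 1 minus the mean absolute difference over the rest.
    """
    total_diff = 0.0
    count = 0
    for row_o, row_e, row_m in zip(original, edited, edit_mask):
        for o, e, m in zip(row_o, row_e, row_m):
            if not m:  # only non-targeted pixels are scored
                total_diff += abs(o - e)
                count += 1
    if count == 0:
        return 1.0  # nothing outside the mask, so nothing to preserve
    return 1.0 - total_diff / count


if __name__ == "__main__":
    original = [[0.0, 0.5], [1.0, 0.2]]
    mask = [[False, True], [False, False]]  # only the top-right pixel is targeted

    faithful = [[0.0, 0.9], [1.0, 0.2]]      # changes only the targeted pixel
    over_edited = [[0.5, 0.9], [1.0, 0.2]]   # also alters a non-targeted pixel

    print(functional_correctness(["A", "B", "C"], ["A", "B", "D"]))
    print(preservation_score(original, faithful, mask))
    print(preservation_score(original, over_edited, mask))
```

Reporting the two numbers separately, rather than folding them into one score, keeps the trade-off noted in the abstract visible: an edit can answer every question correctly while still scoring poorly on preservation.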
