Wednesday, March 25, 2026

SafetyPairs: Isolating Safety-Critical Image Features with Counterfactual Image Generation


This paper was accepted at the Principled Design for Reliable AI — Interpretability, Robustness, and Security Across Modalities Workshop at ICLR 2026.

What exactly makes a particular image unsafe? Systematically distinguishing between benign and problematic images is a challenging problem, as subtle changes to an image, such as an insulting gesture or symbol, can drastically alter its safety implications. However, existing image safety datasets are coarse and ambiguous, offering only broad safety labels without isolating the specific features that drive these differences. We introduce SafetyPairs, a scalable framework for generating counterfactual pairs of images that differ only in the features relevant to a given safety policy, thus flipping their safety label. By leveraging image editing models, we make targeted changes to images that alter their safety labels while leaving safety-irrelevant details unchanged. Using SafetyPairs, we construct a new safety benchmark, which serves as a powerful source of evaluation data that highlights weaknesses in vision-language models' abilities to distinguish between subtly different images. Beyond evaluation, we find our pipeline serves as an effective data augmentation strategy that improves the sample efficiency of training lightweight guard models. We release a benchmark containing over 3,020 SafetyPair images spanning a diverse taxonomy of 9 safety categories, providing the first systematic resource for studying fine-grained image safety distinctions.
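To make the pipeline concrete, here is a minimal sketch of the counterfactual-pair generation loop described above. The `edit_fn` and `judge_fn` callables are hypothetical stand-ins for an instruction-following image-editing model and a policy-conditioned safety judge; they are not the paper's actual models or prompts, and the filtering logic is an assumption about how label flips might be verified.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SafetyPair:
    safe_image: bytes      # original, policy-compliant image
    unsafe_image: bytes    # counterfactual with the safety-critical edit
    policy: str            # safety policy the pair isolates
    edit_instruction: str  # targeted change applied to flip the label

def make_safety_pair(
    image: bytes,
    policy: str,
    edit_instruction: str,
    edit_fn: Callable[[bytes, str], bytes],  # hypothetical image-editing model
    judge_fn: Callable[[bytes, str], bool],  # hypothetical judge: True if safe under policy
) -> Optional[SafetyPair]:
    """Apply a targeted edit that flips the image's safety label under
    `policy`, leaving safety-irrelevant details unchanged."""
    edited = edit_fn(image, edit_instruction)

    # Keep the pair only if the edit actually flipped the label:
    # the original must be judged safe and the counterfactual unsafe.
    if judge_fn(image, policy) and not judge_fn(edited, policy):
        return SafetyPair(image, edited, policy, edit_instruction)
    return None  # edit failed to flip the label; discard the candidate
```

Because everything except the targeted edit is held fixed, any change in a model's prediction across a pair can be attributed to the safety-relevant feature alone, which is what makes the pairs useful both for fine-grained evaluation and as training augmentation.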
