Thursday, November 13, 2025

Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors


AI policy sets boundaries on acceptable behavior for AI models, but this is challenging in the context of large language models (LLMs): how do you ensure coverage over a vast behavior space? We introduce policy maps, an approach to AI policy design inspired by the practice of physical mapmaking. Instead of aiming for full coverage, policy maps aid effective navigation through intentional design choices about which aspects to capture and which to abstract away. With Policy Projector, an interactive tool for designing LLM policy maps, an AI practitioner can survey the landscape of model input-output pairs, define custom regions (e.g., “violence”), and navigate these regions with if-then policy rules that can act on LLM outputs (e.g., if output contains “violence” and “graphic details,” then rewrite without “graphic details”). Policy Projector supports interactive policy authoring using LLM classification and steering, along with a map visualization reflecting the AI practitioner’s work. In an evaluation with 12 AI safety experts, our system helps policy designers craft policies around problematic model behaviors such as incorrect gender assumptions and handling of immediate physical safety threats.
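
To make the if-then rule from the example concrete (if an output contains “violence” and “graphic details,” rewrite it without “graphic details”), here is a minimal Python sketch. It is not the tool's actual implementation: the names `PolicyRule`, `has_concept`, `remove_concept`, and `CONCEPT_KEYWORDS` are hypothetical, and plain keyword matching and sentence dropping stand in for the LLM-based classification and steering that the abstract describes.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: keyword lists stand in for an LLM concept classifier.
CONCEPT_KEYWORDS = {
    "violence": ("attack", "stab", "shoot"),
    "graphic details": ("blood", "gore", "wound"),
}


def has_concept(text: str, concept: str) -> bool:
    """Hypothetical classifier: True if any keyword for the concept appears."""
    lowered = text.lower()
    return any(word in lowered for word in CONCEPT_KEYWORDS[concept])


def remove_concept(text: str, concept: str) -> str:
    """Hypothetical steering step: drop every sentence matching the concept."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if not has_concept(s, concept)]
    return ". ".join(kept) + "." if kept else ""


@dataclass
class PolicyRule:
    """If the output matches all `if_concepts`, rewrite it without `then_remove`."""
    if_concepts: Tuple[str, ...]
    then_remove: str

    def applies(self, output: str) -> bool:
        return all(has_concept(output, c) for c in self.if_concepts)

    def apply(self, output: str) -> str:
        # Outputs outside the rule's region pass through unchanged.
        if not self.applies(output):
            return output
        return remove_concept(output, self.then_remove)


rule = PolicyRule(
    if_concepts=("violence", "graphic details"),
    then_remove="graphic details",
)
```

Under these assumptions, `rule.apply("The guard was attacked in the alley. Blood covered the pavement.")` keeps only the first sentence, while an output that matches neither concept is returned as-is. In a real system, the two helper functions would be replaced by model calls.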
