DSO: Direct Steering Optimization for Bias Mitigation

April 30, 2026


Generative models are often deployed to make decisions on behalf of users, such as vision-language models (VLMs) identifying which person in a room is a doctor to help visually impaired individuals. Yet VLM decisions are influenced by the perceived demographic attributes of people in the input, which can lead to biased outcomes such as failing to identify women as doctors. Moreover, when reducing bias leads to performance loss, users may have varying needs for balancing bias mitigation against overall model capabilities, highlighting the demand for methods that enable controllable bias reduction during inference. Activation steering is a popular approach for inference-time controllability that has shown potential in inducing safer behavior in large language models (LLMs). However, we observe that current steering methods struggle to correct biases, where equiprobable outcomes across demographic groups are required. To address this, we propose Direct Steering Optimization (DSO), which uses reinforcement learning to find linear transformations for steering activations, tailored to mitigate bias while maintaining control over model performance. We demonstrate that DSO achieves a state-of-the-art trade-off between fairness and capabilities on both VLMs and LLMs, while offering practitioners inference-time control over that trade-off. Overall, our work highlights the benefit of designing steering strategies that are directly optimized to control model behavior, providing more effective bias intervention than methods that rely on pre-defined heuristics for controllability.
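As a rough illustration of the idea of steering activations with a linear transformation under an inference-time control knob, the sketch below interpolates between an activation and its affine image. Everything in it is an assumption made for illustration: the interpolation form, the names `steer`, `affine`, `parity_gap`, and `strength`, and the toy weights are hypothetical and are not taken from the paper. In DSO the transformation is found with reinforcement learning, which this sketch does not cover.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(weight: Matrix, bias: Vector, h: Vector) -> Vector:
    """Apply the affine map W h + b to an activation vector h."""
    return [
        sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
        for row, b_i in zip(weight, bias)
    ]


def steer(h: Vector, weight: Matrix, bias: Vector, strength: float) -> Vector:
    """Blend the original activation with its transformed version.

    Assumption (not from the paper): strength = 0 leaves the model
    untouched and strength = 1 applies the full transformation.
    """
    target = affine(weight, bias, h)
    return [(1.0 - strength) * h_i + strength * t_i for h_i, t_i in zip(h, target)]


def parity_gap(probs: Vector) -> float:
    """Largest deviation of per-group outcome probabilities from uniform.

    A hypothetical stand-in for an 'equiprobable outcomes' criterion.
    """
    uniform = 1.0 / len(probs)
    return max(abs(p - uniform) for p in probs)


# Toy placeholder weights; a real transformation would be learned.
swap = [[0.0, 1.0], [1.0, 0.0]]
zero_bias = [0.0, 0.0]
half_steered = steer([1.0, 2.0], swap, zero_bias, strength=0.5)
```

Under these assumptions, a practitioner would choose `strength` at inference time to trade fairness against capability, for example by monitoring a measure like `parity_gap` alongside a task metric.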

† Carnegie Mellon University
‡ Equal contribution
** Work done while at Apple
