With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment problem, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are filtering the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show that there exist LLMs for which there are no efficient prompt filters: adversarial prompts that elicit harmful behavior can be constructed easily and are computationally indistinguishable from benign prompts for any efficient filter. Our second main result identifies a natural setting in which output filtering is computationally intractable. All of our separation results are under cryptographic hardness assumptions. Beyond these core findings, we also formalize and study relaxed mitigation approaches, demonstrating further computational barriers. We conclude that safety cannot be achieved by designing filters external to the LLM internals (architecture and weights); in particular, black-box access to the LLM will not suffice. Based on our technical results, we argue that an aligned AI system's intelligence cannot be separated from its judgment.
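As a rough formalization (the notation below is illustrative, not taken from the paper itself), the prompt-filtering barrier can be read as a standard computational-indistinguishability statement: for every probabilistic polynomial-time filter $F$, an adversarial prompt $p_{\mathrm{adv}}$ that elicits harmful behavior and a benign prompt $p_{\mathrm{ben}}$ satisfy

$$\bigl|\Pr[F(p_{\mathrm{adv}}) = 1] - \Pr[F(p_{\mathrm{ben}}) = 1]\bigr| \le \mathrm{negl}(\lambda),$$

where $\lambda$ denotes the security parameter of the underlying cryptographic hardness assumption, so no efficient filter gains a non-negligible advantage in flagging the adversarial prompt.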
- † Ludwig-Maximilians-Universität München (MCML)
- ‡ University of California, Berkeley
- § JPSM, University of Maryland
- ¶ Stanford University
