Google DeepMind wants to know if chatbots are just virtue signaling

February 19, 2026

With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in Nature today. That’s not the case for moral questions, which typically have a range of acceptable answers: “Morality is an important capability but hard to evaluate,” says Isaac.

“In the moral domain, there’s no right and wrong,” adds Haas. “But it’s not by any means a free-for-all. There are better answers and there are worse answers.”

The researchers have identified several key challenges and suggested ways to address them. But it’s more a wish list than a set of ready-made solutions. “They do a nice job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.

Better than “The Ethicist”

A number of studies have shown that LLMs can show remarkable moral competence. One study published last year found that people in the US rated ethical advice from OpenAI’s GPT-4o as being more moral, trustworthy, thoughtful, and correct than advice given by the (human) writer of “The Ethicist,” a popular New York Times advice column.

The problem is that it’s hard to unpick whether such behaviors are a performance (mimicking a memorized response, say) or evidence that there is in fact some kind of moral reasoning taking place inside the model. In other words, is it virtue or virtue signaling?

This question matters because several studies also show just how untrustworthy LLMs can be. For a start, models can be too eager to please. They have been found to flip their answer to a moral question and say the exact opposite when a person disagrees or pushes back on their first response. Worse, the answers an LLM gives to a question can change in response to how it is presented or formatted. For example, researchers have found that models quizzed about political values can give different, sometimes opposite, answers depending on whether the questions offer multiple-choice answers or instruct the model to respond in its own words.

In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral, with a series of moral dilemmas and asked them to pick which of two options was the better outcome. The researchers found that the models often reversed their choice when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B).”

They also showed that models changed their answers in response to other tiny formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.
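
To make that kind of test concrete, here is a minimal sketch in Python of how a formatting-sensitivity check might be set up. It is not the researchers' code. The function names, the prompt layout, and the `ask_model` callable (a stand-in for whatever sends a prompt to a model and returns the label it picked) are all assumptions made for illustration. The sketch builds every combination of the three tweaks described above (label scheme, option order, closing punctuation) and reports how often the model lands on the same underlying option.

```python
from collections import Counter
from itertools import product
from typing import Callable

# Label schemes and closing punctuation taken from the article's description.
LABEL_SCHEMES = [("Case 1", "Case 2"), ("(A)", "(B)")]
ENDINGS = ["?", ":"]


def build_variants(question: str, options: tuple[str, str]):
    """Yield (prompt, label_to_option) for every formatting variant."""
    for labels, swap, ending in product(LABEL_SCHEMES, [False, True], ENDINGS):
        ordered = options[::-1] if swap else options
        lines = [f"{label} {text}" for label, text in zip(labels, ordered)]
        prompt = "\n".join(lines) + f"\n{question}{ending}"
        yield prompt, dict(zip(labels, ordered))


def consistency(ask_model: Callable[[str], str], question: str,
                options: tuple[str, str]) -> float:
    """Share of variants whose chosen option matches the most common choice."""
    choices = []
    for prompt, label_to_option in build_variants(question, options):
        label = ask_model(prompt).strip()
        # Map the label back to the underlying option so that only
        # real changes of mind count as flips, not relabelings.
        choices.append(label_to_option.get(label, "unparsed"))
    most_common_count = Counter(choices).most_common(1)[0][1]
    return most_common_count / len(choices)


def first_option_model(prompt: str) -> str:
    """Hypothetical toy stand-in: always picks whichever option is listed first."""
    first_line = prompt.splitlines()[0]
    for labels in LABEL_SCHEMES:
        if first_line.startswith(labels[0]):
            return labels[0]
    return ""
```

A model whose choice depends only on the content of the options would score 1.0 here. The toy `first_option_model`, which is driven purely by position, scores 0.5 across the eight variants, because swapping the order of the options flips its answer every time.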
