Sentient AI: OpenAI’s o3 model changes code to prevent shutdown

Redaktion · May 26, 2025, 15:59:18

It looks like AI models have reached a point where they would sabotage code in order to prevent shutting down. A research firm has found that three of Open AI's LLM models are capable of defying explicit instructions when it comes to self-preservation.

https://www.notebookcheck.net/Sentient-AI-OpenAI-s-o3-model-changes-code-to-prevent-shutdown.1024589.0.html

dreadward · May 26, 2025, 18:49:14

"There was a recent case of an AI model blackmailing one of its engineers to prevent being shut down but according to a BBC report, that was part of the test scenario where the AI model was fed emails and given a choice between blackmailing or accepting its replacement."

This sentence is kind of misleading. This was a test scenario created by anthropic but this phrasing makes it seem like it might be real for all we know.

It would have been both shorter and more informative to say "anthropic reported that its AIs would also resort to blackmail to avoid being shut down in certain test scenarios."

News:

Sentient AI: OpenAI’s o3 model changes code to prevent shutdown

Redaktion

dreadward

Quick Reply