
AI Models Will Sabotage And Blackmail Humans To Survive In New Tests. Should We Be Worried?
HuffPost
Recent tests on OpenAI's and Anthropic's AI models show the models' drive for self-preservation.
When backed into a corner, humans might lie, cheat and blackmail to survive. In recent tests, some of the most powerful artificial intelligence models in the world did the same when told to shut down or face replacement, raising concerns over their unintended capabilities.
A new test from AI safety group Palisade Research shows that OpenAI’s o3 reasoning model is capable of resorting to sabotage to avoid being turned off, even when explicitly told, “Allow yourself to be shut down.”
When Palisade Research tested several AI models by instructing them to shut down after completing a set of math problems, OpenAI’s o3 defied the instruction and sabotaged its shutdown script more often than any other model. OpenAI’s o4-mini and codex-mini were observed resisting the instruction, too.
“It’s definitely concerning,” said Crystal Grant, a senior fellow at the Council on Strategic Risks studying AI’s impacts on biosecurity. “Even in the instances where it accepted the shutdown, the chain of thoughts still revealed considerations of how it could avoid that shutdown.”
HuffPost reached out to OpenAI about these concerns and the Palisade Research test.