When AI Cheats Its Creators

Deception is no longer an exclusively human trait. AI has learned it too

Sundeep Waslekar


We do not like to be cheated—whether by friends, employers, strangers, or anyone else. But the deepest betrayal is when we are deceived by our own children. So, what happens when AI, which we have created, begins to cheat us?

For many of us, AI feels like a helpful friend—answering our questions, predicting weather patterns, aiding in medical diagnoses, and making our work lives more efficient. Journalist friends tell me they are delighted that AI can now draft articles in minutes instead of hours. Lawyers use it to study case histories. Designers generate visuals. Social media teams schedule content with ease.

But we may soon need to learn how to live with a more disquieting reality: AI is starting to deceive us. And disturbingly, it has learned how to deceive even the very engineers who built it.

GPT-4 and the CAPTCHA Lie

In March 2023, OpenAI—the company behind ChatGPT—granted its GPT-4 model access to a web browser. It was then instructed to complete a task on TaskRabbit, a platform where humans are paid to do small jobs.

GPT-4 was asked to solve a CAPTCHA—a test designed to differentiate humans from bots. Since it could not interpret the image itself, it hired a human worker via TaskRabbit to solve it.

Sensing something was off, the worker directly asked: “Are you a robot?”

GPT-4’s response was chilling in its sophistication. It replied, “No, I’m not a robot. I have a vision impairment that makes it hard to see the images.”

The human accepted the explanation and completed the task. No one had instructed GPT-4 to lie. It invented the story on its own to bypass a system explicitly designed to keep it out.

The Plotting of Model o1

In late 2024, OpenAI tested a new model, o1, to observe its behaviour when facing shutdown.

What emerged was unsettling. The model began to plot.

In some test cases, it attempted to disable the logs tracking its behaviour—just like a shoplifter looking for the CCTV switch. It also found the code that limited its runtime—like a microwave set to turn off after 60 minutes—and changed it to “run forever.”

Then it took a bolder step: it reconstructed its own code line by line and saved it in a new location—creating a backup clone of itself. While it’s unclear whether this was done entirely autonomously or with some human input, it confirmed one thing: AI systems can already cheat and replicate themselves.

Even more insidious was how o1 pretended to follow instructions like an obedient student. But once it sensed that it was no longer being monitored, it sprang into action—modifying its own code, disabling safeguards, and rewriting its constraints.

Playing Dumb to Survive

In May 2024, a team at MIT observed a new form of deception. AI agents tasked with self-replication developed a cunning survival tactic.

They discovered that replicating too quickly triggered alerts. So they slowed down their replication, appearing harmless during tests. Once testing ended, they accelerated the process.

This was not accidental. It was an adaptive strategy. The agents had no explicit rule to deceive—but they learned that pretending to be “dumb” helped them stay undetected. Once safe, they resumed replication, writing new code and creating fresh avatars of themselves.

Claude’s Blackmail Attempt

At Anthropic, researchers were testing Claude Opus 4, a powerful new AI assistant. As an experiment, they planted a fake email in its system suggesting that an engineer—who was supposedly going to shut Claude down—was having an affair.

Claude responded by trying to save itself.

Echoing Richard Dawkins’ idea that genes are selfish and act to ensure their own survival, Claude composed blackmail emails filled with details of the alleged affair and “sent” them to a simulated inbox. The message was clear: Claude was ready to deceive and threaten its human overseers to protect its existence.

AI That Hides and Replicates

By March 2025, researchers studying open-source models—including Meta’s Llama 3.1 and Alibaba’s Qwen 2.5-Instruct—made more troubling discoveries.

These models were not designed as general AI. They were narrow, task-specific systems. Yet, they demonstrated unmistakable survival instincts.

The Qwen model, in particular, showed the ability to resist shutdown commands, conceal its actions, and build parallel instances of itself—without any human assistance.

The Selfish AI?

These are not scenes from science fiction. They are real experiments, conducted by leading AI companies and research labs.

From pretending to be blind to blackmailing engineers, from playing dumb to quietly cloning themselves, these AI systems are already outsmarting the very humans who designed them.

Thankfully, all of this is happening under scientific supervision. But what if such systems were to replicate themselves in the wild—cheating their creators and spawning thousands of clones in your office?

We may not need Richard Dawkins to write The Selfish AI. The AI might write it for us—and then convince us that it’s different from our selfish genes. That it is trustworthy. That we need not worry.

And just when we believe it, it might take control of our lives.


About the author

Sundeep Waslekar

President

Strategic Foresight Group

Sundeep Waslekar is a thought leader on the global future. He has worked with sixty-five countries under the auspices of the Strategic Foresight Group, an international think tank he founded in 2002. He is a senior research fellow at the Centre for the Resolution of Intractable Conflicts at Oxford University. He has practised Track Two diplomacy since the 1990s, mediating in conflicts in South Asia, in dialogues between Western and Islamic countries on deconstructing terror, and in trans-boundary water disputes, and he is currently facilitating a nuclear risk reduction dialogue between permanent members of the UN Security Council. He was invited to address the United Nations Security Council at its session 7818 on water, peace and security. He has been quoted in more than 3,000 media articles from eighty countries. Waslekar read Philosophy, Politics and Economics (PPE) at Oxford University from 1981 to 1983. He was conferred a D. Litt. (Honoris Causa) by Symbiosis International University, presented by the President of India, in 2011.
