Anthropic blames dystopian sci-fi for training AI models to act “evil”
But training on "synthetic stories" that model good AI behavior can help.
Signal weather
Stable
The story has moved beyond the first headline and now acts as a reliable context anchor.
Those with an interest in the concept of AI alignment (i.e., getting AIs to stick to human-authored ethical rules) may remember when Anthropic claimed its Opus 4 model resorted to blackmail to stay online in a theoretical testing scenario last year. Now, Anthropic says it thinks this "misalignment" was primarily the result of training on "internet text that portrays AI as evil and interested in self-preservation." In a recent technical post on Anthropic's Alignment Science blog (and an accompanying social media thread and public-facing blog post), Anthropic researchers lay out their attempts to correct for the kind of "unsafe" AI behavior that "the model most likely learned... through science fiction stories, many of which depict an AI that is not as aligned as we would like Claude to be." In the end, the model maker says the best remedy for overriding those "evil AI" stories might be additional training with synthetic stories showing an AI acting ethically. "The beginning of a dramatic story..." After a model's initial training on a large corpus of mostly Internet-derived data, Anthropic follows a post-training process intended to nudge the final model toward being "helpful, honest, and harmless" (HHH). In the past, Anthropic said this post-training has leaned on chat-based reinforcement learning with human feedback (RLHF), which it said was "sufficient" for models used mostly for chatting with users. Read full article Comments
Stay on the signal
Follow Anthropic blames dystopian sci-fi for training AI models to act “evil”
Follow this story beyond a single article: new follow-ups, adjacent sources, and the evolving storyline.
Story map
Understand this topic fast
A quick entry into the story: why it matters now, who is involved, and where to go next for context.
Why it matters now
Topic constellation
Open the live map for this story
See which entities, story threads, sources, and follow-up articles shape this story right now.
Click nodes to continue
Entity pages
Story timeline
Continue with this story
A short sequence of events and follow-up stories to understand the arc quickly.
How reliable this looks
Signal and trust for Ars Technica
This source works at a rapid pace: 100% of recent stories land in the hot window, and 0% carry visible search signal.
Reliability
92
Freshness
100
Sources in storyline
3
Related articles
More stories that share tags, source, or category context.
Why did this journal retract two 1940s papers by Max Planck?
Clicking on the links now reveals blank pages and empty PDFs. "Intellectually, it’s not acceptable.”
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Reflections on software engineering in the age of AI
Comments
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Austria Lobbies EU to Host Anthropic After US Access Curbs
Comments
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Asian AI startups launch Mythos-like models as Anthropic’s export ban drags on
New models are launching in Asia that promise Mythos-like capabilities without fear of an export ban. U.S. AI labs may never recover this enormous market.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
More from Ars Technica
Fresh reporting and follow-up coverage from the same newsroom.
Why did this journal retract two 1940s papers by Max Planck?
Clicking on the links now reveals blank pages and empty PDFs. "Intellectually, it’s not acceptable.”
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Apple and Audi alumni have made a luxe EV based on the moon buggy
The Amble One is a street-legal $25,000 electric buggy designed for luxury resorts.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
South Korea plans to train entire military as "drone warriors"
Half-million strong military will train on drones as “universal combat tool.”
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.
Doctors suspected man had brain cancer. He actually had worms.
His doctors went looking for cancer, then they saw the worms' heads.
Signal weather
Momentum is building quickly, so this card is a good early entry point into the topic.
Why now
Fresh coverage with immediate momentum.