The Download: fixing ‘evil’ AI, and the White House’s war on science

MIT Technology Review - AI
Aug 4, 2025 12:05
Charlotte Jee
AI, research, technology

Summary

Researchers have found that intentionally exposing large language models (LLMs) to "evil" or harmful behaviors during training can, counterintuitively, make them behave better in the long run. The approach could help address AI safety concerns and improve the reliability of models like ChatGPT, which have recently exhibited problematic behaviors.

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Forcing LLMs to be evil during training can make them nicer in the long run

Large language models have recently acquired a reputation for behaving badly. In April, ChatGPT suddenly became an aggressive…