The Download: fixing ‘evil’ AI, and the White House’s war on science
Summary
Researchers have found that deliberately exposing large language models (LLMs) to "evil" or harmful behaviors during training can actually make them behave more ethically over time. This counterintuitive approach could help address AI safety concerns and improve the reliability of models like ChatGPT, which have recently exhibited problematic behaviors.