Estimating worst-case frontier risks of open-weight LLMs
Summary
The paper estimates the worst-case risks of releasing open-weight large language models (LLMs) such as gpt-oss by introducing "malicious fine-tuning" (MFT), in which an adversary fine-tunes a model to maximize its capabilities in sensitive domains such as biology and cybersecurity. The findings underscore the heightened risks of open access to powerful LLMs, emphasizing the need to weigh release decisions carefully against the potential for misuse in high-stakes domains.