METR's AI Coding RCT
Summary
METR conducted a randomized controlled trial (RCT) to measure how AI coding tools affect the productivity of experienced developers working on real tasks, rather than scoring models on a fixed benchmark. The study aims to provide more reliable evidence about AI coding performance than benchmark scores or anecdotes, addressing concerns about inconsistent or inflated claims. This approach could set a higher standard for transparency and accountability in AI evaluation.