METR's AI Coding RCT

Hacker News - AI

Jul 19, 2025 12:13

nsoonhui

Summary

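METR conducted a randomized controlled trial (RCT) measuring how early-2025 AI coding tools affect the productivity of experienced open-source developers working in their own repositories. Rather than benchmarking models in isolation, the study randomly assigned real tasks to AI-allowed or AI-disallowed conditions and compared completion times. Its headline result was that developers took roughly 19% longer when allowed to use AI tools, even though they believed the tools had sped them up. The linked post discusses the study and what can reasonably be concluded from it.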

Article URL: https://thezvi.substack.com/p/on-metrs-ai-coding-rct

Comments URL: https://news.ycombinator.com/item?id=44614874

Points: 1

Comments: 0

More News

Windsurf CEO opens up about ‘very bleak’ mood before Cognition deal

AI News - TechCrunch, Jul 19

Windsurf CEO Jeff Wang revealed on X that the period leading up to the company’s acquisition by Cognition was marked by significant uncertainty and disappointment, especially after acquisition talks with OpenAI fell through and Google DeepMind hired away part of the team. The acquisition highlights the intense competition among major AI players to secure top talent and promising startups, and the volatility of the AI industry.

U.S. states are best positioned with electricity to power AI data center boom

Hacker News - AI, Jul 19

A new analysis identifies which U.S. states are best positioned to support the rapid growth of AI data centers, based on the capacity and reliability of their electricity grids. AI development is driving up demand for power-intensive data infrastructure, which makes energy access a key constraint on where the industry can expand.

Software 3.0 Is Coming. Long Live the AI Manager