Show HN: I Made a Hot or Not Benchmark for AI Design

Hacker News - AI
Jul 5, 2025 16:08
grxxxce

Summary

A team built a "Hot or Not"-style voting game to rank AI-generated frontend designs across categories such as websites, game development, and 3D models. Output quality varies widely by model and by category: the authors were especially impressed by DeepSeek and Grok, while in their view OpenAI's models did very well in game development but poorly elsewhere. The leaderboard is crowdsourced from head-to-head votes.

We noticed that most AI-generated frontend looks and feels vibe-coded, but we couldn't put our finger on why. So we built a voting game to work out a ranking internally. It was surprisingly fun (and useful), so we refined it and wanted to share it here!

State-of-the-art models go head-to-head in design across websites, game dev, 3D models, and more. The things they generate are at times very impressive, and at times make AGI feel far, far away. We were especially impressed with the quality of DeepSeek and Grok, and struck by the variance between categories (OpenAI is very good for game dev, but seems to suck everywhere else).

Leaderboard: https://www.designarena.ai/leaderboard
Voting: https://www.designarena.ai/vote

Give us your thoughts (and if you make something cool, we want to see it :)!

Comments URL: https://news.ycombinator.com/item?id=44473673
Points: 5
# Comments: 1
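
For readers wondering how head-to-head votes can become a leaderboard: the post does not say which ranking method Design Arena uses, so the following is only a minimal sketch of one common approach, Elo ratings. The constants `K` and `BASE`, the function names, and the placeholder model names are all assumptions made for illustration, not details from the site.

```python
from collections import defaultdict

K = 32          # update step size (assumed; a common Elo default)
BASE = 1000.0   # starting rating for every model (assumed)


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_from_votes(votes):
    """Turn (winner, loser) vote pairs into a list sorted by rating, best first."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in votes:
        exp_win = expected_score(ratings[winner], ratings[loser])
        # An upset (low expected win probability) moves ratings further.
        delta = K * (1.0 - exp_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes; the model names are placeholders, not real results.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
leaderboard = rank_from_votes(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes arrive, so a real leaderboard might instead fit all votes at once (for example with a Bradley-Terry model) and keep a separate rating per category, given how much the results reportedly differ between categories.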