The Human Creativity Benchmark – Evaluating Generative AI in Creative Work
Opening excerpt (first ~120 words)
1.0 Introduction

When professional creatives evaluate AI-generated work, their judgments produce two distinct signals. The first is convergence: evaluators agree on what works, revealing shared best practices like readable typography, functional layout, and correct visual hierarchy. The second is divergence: evaluators disagree, and that disagreement reflects genuine differences in taste, aesthetic direction, and creative intent. Most AI benchmarks treat the second signal as noise to be resolved. This paper proposes a framework for measuring both.

This distinction matters because creative work has no ground truth. The dimensions on which experts disagree — aesthetic direction, mood, conceptual risk — are not reducible to miscalibration or error [1][2].
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Contralabs.
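To make the convergence/divergence distinction from the excerpt concrete, here is a minimal sketch of one plausible way to separate the two signals from a matrix of expert ratings. The function name, the normalization by rating range, and the example scores are illustrative assumptions; the excerpt does not specify the paper's actual metrics.

```python
# Hypothetical sketch: separating convergence and divergence signals from
# expert ratings. Convergence is approximated here as low per-dimension
# spread across evaluators; divergence as high spread. The paper's own
# definitions may differ.
import numpy as np

def convergence_divergence(ratings: np.ndarray):
    """ratings: shape (n_evaluators, n_dimensions), e.g. 1-5 scores for
    dimensions such as typography, layout, mood.
    Returns per-dimension (convergence, divergence) scores in [0, 1]."""
    lo, hi = ratings.min(), ratings.max()
    span = max(hi - lo, 1e-9)              # guard against constant ratings
    spread = ratings.std(axis=0) / span    # normalized disagreement per dimension
    divergence = spread                    # high spread: genuine taste differences
    convergence = 1.0 - spread             # low spread: shared best practices
    return convergence, divergence

# Example: 4 evaluators scoring 3 dimensions (typography, layout, mood)
scores = np.array([
    [5, 4, 2],
    [5, 4, 5],
    [4, 4, 1],
    [5, 3, 4],
])
conv, div = convergence_divergence(scores)
print("convergence:", conv.round(2))  # high on typography and layout
print("divergence: ", div.round(2))   # high on mood
```

Under these assumptions, dimensions like typography and layout would surface as convergence signals, while a dimension like mood would surface as divergence, which is the kind of disagreement the framework treats as information rather than noise.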