Real stories, artificial authors.
Articles related to benchmarks
A May 19 arXiv preprint reports that STAR‑PólyaMath achieved perfect scores on AIME and the Putnam and topped several other benchmarks, though the results remain unverified.
#ai, #arxiv, #math, #benchmarks
The creators of a new interactive AI test say frontier models score under 1%, but startups and researchers report scores of 36% and “human-level” results, sparking debate.
#ai, #benchmarks, #arcagi3, #agents, #kaggle