Tag: #benchmarks

Articles related to benchmarks

technology

Preprint Claims Perfect Scores by Multi‑Agent AI on Multiple Math Competitions; Results Unverified

A May 19 arXiv preprint says STAR‑PólyaMath achieved perfect scores on AIME and Putnam and topped several benchmarks, but the results remain unverified.

#ai, #arxiv, #math, #benchmarks

technology

ARC-AGI-3 Ignites a Benchmark Battle Over What Counts as AI Progress

A new interactive AI test claims frontier models score under 1%, but startups and researchers report 36% and “human-level” results, sparking debate.

#ai, #benchmarks, #arcagi3, #agents, #kaggle