Tag: #benchmarks

Articles related to benchmarks

RAND paper warns AI-biology tests need clearer design, scoring and reporting for policy use

RAND paper warns tests of AI agents' biological capabilities are sensitive to design and need clearer standards for scoring, reporting and policy use.

#ai, #biosecurity, #rand, #benchmarks

Preprint Claims Perfect Scores by Multi‑Agent AI on Multiple Math Competitions; Results Unverified

A May 19 arXiv preprint says STAR‑PólyaMath achieved perfect scores on AIME and Putnam and topped several benchmarks, but the results remain unverified.

#ai, #arxiv, #math, #benchmarks