Preprint Claims Nearly 147,000 ‘Hallucinated’ Citations in 2025 After Audit of 2.5M Papers


A newly posted arXiv preprint says it found a sharp rise in non-existent scientific citations, estimating nearly 147,000 such references in 2025 alone after auditing 111 million references across 2.5 million papers. The figure comes from the preprint's abstract, has not undergone peer review, and was not independently verified.

If the estimate holds up, it points to a basic problem in the scientific record. Citations are how researchers show where claims come from, give credit and help readers trace prior work. The preprint’s abstract says false references are making their way into both preprints and published papers, and that preprint moderation and journal publication processes “catch only a fraction of these errors.”

The paper, titled “LLM hallucinations in the wild: Large-scale evidence from non-existent citations,” was posted to arXiv as arXiv:2605.07723v1. arXiv metadata shows it was submitted May 8, 2026. The authors listed on arXiv are Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg and Yian Yin. According to the abstract, the study examined references from papers hosted by arXiv, bioRxiv, SSRN and PubMed Central. Ginsparg is the founder of arXiv and a faculty member at Cornell University.

The abstract says the authors found “a conservative estimate of 146,932 hallucinated citations in 2025 alone” and reports that non-existent references rose sharply following widespread LLM adoption. It says the problem was especially pronounced in fields with rapid AI uptake, in manuscripts showing “linguistic signatures of AI-assisted writing,” and in papers by small and early-career author teams. The abstract also says hallucinated references “disproportionately assign credit to already prominent and male scholars,” raising an equity concern over who receives recognition in academic publishing.

Researchers have already shown that large language models can invent references. A 2023 paper in Scientific Reports found that ChatGPT produced fabricated or substantially erroneous citations. What stands out in the new preprint is the claimed scale: rather than testing prompts in a small experiment, the authors say they audited 111 million references across 2.5 million papers. The issue is also drawing broader research-integrity attention. A Nature news article published May 8 reported that “An audit of 2.5 million academic papers has identified nearly 3,000 biomedical-science papers that contain fake references.”
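The preprint's abstract does not describe its matching method, but the basic shape of such an audit can be sketched: normalize each cited title and look it up in an index built from known bibliographic records; references that match nothing become candidates for hallucination (pending manual or fuzzy-match review, since real citations often contain typos). A minimal illustration, with a hypothetical index standing in for the large bibliographic databases an actual audit would use:

```python
import re

def normalize(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for lookup."""
    title = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def audit(cited_titles, known_index):
    """Return cited titles whose normalized form is absent from the index."""
    return [t for t in cited_titles if normalize(t) not in known_index]

# Hypothetical index of real paper titles; a real audit would draw on
# bibliographic databases such as Crossref or PubMed.
known = {normalize(t) for t in [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]}

cited = [
    "Attention is all you need.",
    "A Plausible-Sounding Paper That Does Not Exist",
]

flagged = audit(cited, known)
# flagged holds the one title not found in the index
```

Exact-match lookup like this undercounts errors (a garbled but real citation passes) and is why any headline figure from such an audit is sensitive to how strictly "non-existent" is defined.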

Still, the new paper's claims should be read with caution. The findings cited here come from the arXiv abstract and metadata for a preprint, that is, research shared publicly before peer review. The full PDF was not available in the material reviewed for this article, so the study's methods, false-positive controls, and robustness checks could not be independently assessed here. For now, the headline number and the broader pattern claims are best understood as what the authors say they found, not as settled results.

Tags: #research, #ai, #arxiv, #scientificintegrity