Open-source system QED claims machine-generated proofs for three open math problems; formal checks pending
An arXiv preprint and companion GitHub repository claim that an open-source multi-agent AI system called QED generated expert-verified proofs for three open mathematics problems. So far, however, the claim, while notable, is supported only by the authors' materials and the mathematicians who supplied the problems, not by independent peer review or formal verification in a proof assistant such as Lean or Coq.
The preprint, "QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems," was posted to arXiv as arXiv:2604.24021 on April 27, 2026. Its authors are Chenyang An, Qihao Ye, Minghao Pan and Jiayun Zhang. The team also released code and supporting files on GitHub under proofQED/QED. According to the paper and repository, QED was tested on five open research problems in applied analysis and partial differential equations, or PDEs, supplied by domain experts, and produced proofs for three of them that those experts judged correct.
Those three claimed successes break down into one Carleman weight function construction, a task arising in inverse problems and Carleman estimates, and two problems from a set of four open analysis and PDE questions supplied by Xiaoqian Xu, an assistant professor at Duke Kunshan University. The repository says QED proved Xu's problems labeled P3 and P4, while P1 and P2 were rejected by QED's own verifier. For the Carleman problem, the repository lists the experts who formulated the problem and verified the proof as Qiao Zhuang of the University of Missouri-Kansas City, Qihao Ye of the University of California, San Diego, and Zhongqiang Zhang of Worcester Polytechnic Institute.
QED is described in the paper and repository as a multi-agent proof system that coordinates multiple large language models with a verifier, rather than relying on a single model to produce a proof from start to finish. The repository says the system can call models including Claude, Codex and Gemini. The authors say they designed the system around seven failure modes that make research-level proof generation unreliable, including citation hallucination, unstable proof plans, hand-waving over key steps and the tendency of a single model to become a bottleneck.
The distinction between the project’s claims and a fully machine-checked mathematical result is important. The proofs presented are natural-language proofs, not formal proofs certified in proof assistants such as Lean or Coq, which can check every logical step mechanically. Instead, QED as presented uses a structured verifier plus human review by the contributing experts. In the Carleman case, one of the listed verifiers, Ye, is also a co-author of the paper. Project materials also say experts updated the Carleman problem constraints once during the workflow, so that result should not be described as fully hands-off.
By contrast, the repository says the runs on Xu's problems were fully automated and that "No human intervention occurred at any point" before the proofs were returned to Xu for verification. The GitHub materials include full machine-generated proof files for those two problems, along with verification code and other artifacts from the runs. The project materials say the contributing experts judged the successful proofs to be correct, original and nontrivial.
That is what makes the claim stand out. This is not a report about an AI system performing well on a benchmark or contest-style exercise, but about a system that the authors say produced new proofs for open research questions — a substantially higher bar. At the same time, as of May 1, 2026, there was no evidence of independent journal peer review or third-party verification beyond the named contributors and the project’s own documentation.
Xu, in repository documentation, said one of the proofs was surprising because “usually people would expect some harder tools to deal with it, but AI just gives a very straightforward proof.” Even so, the broader claim now rests on a preprint, a public code release and expert checks by the people closest to the project. External validation is still pending.