AI Model Reads One Night of Sleep to Forecast Risk for 130 Diseases, Stanford Study Finds

The patient comes into the sleep lab worried about snoring and daytime fatigue. By midnight, electrodes dot their scalp, belts wrap their chest, a clip pinches their finger. For decades, those wires have mainly answered one question: Do they have sleep apnea?

Now, researchers say the same overnight test can do something far more sweeping. A new artificial intelligence model trained on nearly 600,000 hours of sleep recordings can estimate, from a single night's sleep, a person's future risk of 130 different conditions and outcomes, including Parkinson's disease, dementia, heart attack, several cancers and even death.

In a study published Jan. 6 in the journal Nature Medicine, a Stanford University-led team describes SleepFM, a "multimodal sleep foundation model" that analyzes raw signals from overnight polysomnography, the standard hospital sleep study. The model, they report, can accurately stratify long-term disease risk, in many cases years before a diagnosis appears in the medical record.

“We record an amazing number of signals when we study sleep,” said Dr. Emmanuel Mignot, a senior author of the study and the Craig Reynolds Professor in Sleep Medicine at Stanford. “It’s a kind of general physiology. It’s very data rich.”

For the most part, he said, medicine has not used that richness.

Teaching an algorithm the “language of sleep”

A polysomnogram captures minute-by-minute changes in brain waves, eye movements, heart rhythm, breathing, muscle tone and blood oxygen levels while a person sleeps. Clinicians typically distill those streams into a handful of summary numbers, such as the apnea-hypopnea index (the number of complete and partial breathing interruptions per hour of sleep) and the proportion of time spent in different sleep stages.
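For a sense of how simple those summary numbers are, here is the apnea-hypopnea index worked out in a few lines of Python. The patient values are invented for illustration:

```python
# Illustration only: the apnea-hypopnea index (AHI) is breathing events
# per hour of sleep. The numbers below are made up for this example.
apnea_events = 12      # complete pauses in breathing
hypopnea_events = 28   # partial reductions in airflow
hours_asleep = 6.5     # total sleep time measured by the polysomnogram

ahi = (apnea_events + hypopnea_events) / hours_asleep
print(f"AHI = {ahi:.1f} events per hour")  # AHI = 6.2 events per hour
```

By common clinical convention, an AHI below 5 is considered normal, 5 to 15 mild, 15 to 30 moderate and above 30 severe sleep apnea.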

“From an AI perspective, sleep is relatively understudied,” said James Zou, an associate professor of biomedical data science at Stanford and co-senior author on the paper. “SleepFM is essentially learning the language of sleep.”

To build that model, the team assembled one of the largest sleep datasets to date: more than 585,000 hours of overnight recordings from over 65,000 people, collected at Stanford’s Sleep Medicine Center and in large research cohorts in the United States and Europe. Some of the Stanford records date back to 1999 and are linked to up to 25 years of follow-up in electronic health records.

The model first underwent self-supervised “pretraining” on hundreds of thousands of hours of unlabeled sleep data. In that phase, it was not told who went on to develop disease. Instead, it learned to represent the raw signals themselves—short five-second segments of brain waves, breathing and other channels—and how they relate to one another over time.

“One of the technical advances here is to figure out how to harmonize all these different data modalities so they can come together to learn the same language,” Zou said.
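One common recipe for that kind of multimodal alignment is a contrastive objective: embeddings of different signals recorded at the same moment are pulled together, while mismatched moments are pushed apart. Below is a minimal sketch in PyTorch; the names, dimensions and loss details are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(brain_emb, cardiac_emb, temperature=0.1):
    """InfoNCE-style loss: segments of brain and heart signals recorded
    at the same moment should embed near each other, and away from
    segments taken at other moments or from other patients in the batch."""
    brain_emb = F.normalize(brain_emb, dim=-1)      # (batch, dim)
    cardiac_emb = F.normalize(cardiac_emb, dim=-1)  # (batch, dim)
    logits = brain_emb @ cardiac_emb.T / temperature
    targets = torch.arange(logits.shape[0])         # i-th brain <-> i-th cardiac
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Toy usage: a batch of 32 five-second segments, each already encoded
# to a 128-dimensional vector by per-modality encoders (hypothetical).
loss = contrastive_alignment_loss(torch.randn(32, 128), torch.randn(32, 128))
```

Whatever the exact objective, the key point is the same: the model learns shared structure across signals without ever seeing a diagnosis label.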

The team then fine-tuned SleepFM on labeled data, asking it to perform standard sleep tasks such as staging (identifying wake, REM and non-REM phases), diagnosing sleep apnea, and estimating age and sex. The model matched or approached specialized algorithms on those benchmarks, suggesting it had learned meaningful features from the raw signals.
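In practice, fine-tuning on tasks like these usually means attaching a small task-specific layer to the pretrained encoder. A hypothetical sketch for five-class sleep staging, where the encoder, its embedding size and the training choices are assumptions rather than the authors' published configuration:

```python
import torch
import torch.nn as nn

# Hypothetical fine-tuning head: classify each segment's embedding into
# one of five sleep stages (wake, N1, N2, N3, REM).
class StagingHead(nn.Module):
    def __init__(self, embedding_dim=512, n_stages=5):
        super().__init__()
        self.classifier = nn.Linear(embedding_dim, n_stages)

    def forward(self, segment_embedding):
        return self.classifier(segment_embedding)  # logits per sleep stage

head = StagingHead()
logits = head(torch.randn(32, 512))  # 32 segments -> (32, 5) stage scores
```

The pretrained encoder can stay frozen while only this small head trains, which is part of why foundation models are label-efficient: most of the learning already happened without labels.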

The central test, however, was more ambitious: Could the model predict who would go on to develop other diseases?

A disease forecast from one night

Using diagnosis codes in the Stanford health system’s electronic records, the researchers mapped each patient’s subsequent medical history into 1,041 disease categories, spanning circulatory, neurological, metabolic, mental health and cancer diagnoses, among others. SleepFM was trained to perform survival analysis—a standard statistical approach in medicine—to estimate the likelihood and timing of each diagnosis after the sleep study.
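Survival analysis differs from ordinary classification in that it models when an event happens and handles patients whose records end before any diagnosis, known as censoring. A toy version using the classical Cox proportional-hazards model from the lifelines library; the data are invented, and the study builds the survival objective into the neural network itself rather than using this classical model:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Toy survival dataset (invented numbers): each row is one patient, with
# a feature derived from the sleep recording, follow-up time in years,
# and whether the diagnosis occurred (1) or the record was censored (0).
df = pd.DataFrame({
    "sleep_feature":  [0.9, 0.2, 0.7, 0.1, 0.5, 0.8],
    "years_followed": [2.0, 9.5, 4.0, 12.0, 7.5, 3.0],
    "diagnosed":      [1,   0,   1,   0,   0,   1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_followed", event_col="diagnosed")
cph.print_summary()  # hazard ratio for the sleep-derived feature
```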

The model’s performance was evaluated with a measure called the concordance index, or C-index, which reflects how well it ranks patients by risk over time. “A C-index of 0.8 means that 80% of the time, the model’s prediction is concordant with what actually happened,” Zou said.
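Concretely, the C-index looks at every pair of patients where it is known who got sick first and asks whether the model assigned that patient the higher risk. A simplified pure-Python version, glossing over ties and some censoring subtleties:

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs the model ranks correctly.
    A pair is comparable when the earlier of the two had an observed
    event, so we know who truly got sick first. Ties are ignored."""
    concordant, comparable = 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i was diagnosed before patient j's follow-up ended.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
    return concordant / comparable

# Invented example: 4 patients; the two who got sick also got higher risk.
times  = [3.0, 10.0, 5.0, 8.0]   # years to diagnosis or end of follow-up
events = [1, 0, 1, 0]            # 1 = diagnosed, 0 = censored
risks  = [0.9, 0.1, 0.7, 0.3]    # model's predicted risk
print(concordance_index(times, events, risks))  # -> 1.0
```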

SleepFM achieved a C-index of at least 0.75 for 130 different conditions on a held-out test set, the authors report. For some high-profile diagnoses, the scores were higher. The model predicted Parkinson’s disease with a C-index of 0.89, prostate cancer at 0.89, breast cancer at 0.87 and dementia at 0.85. All-cause mortality—whether a patient would die during the follow-up period—was predicted with a C-index of 0.84 and a six-year area under the receiver-operating-characteristic curve, or AUROC, of about 0.85.

According to a summary from Euronews that drew on the Stanford findings, the model was right at least 80% of the time when predicting Parkinson’s, Alzheimer’s disease, dementia, hypertensive heart disease, heart attack, and prostate and breast cancer.

“The model’s predictions were particularly strong for cancers, pregnancy complications, circulatory conditions, and mental disorders, achieving a C-index higher than 0.8,” Stanford said in its own description of the work.

To test how far ahead these forecasts can reach, the researchers calculated AUROC—a common metric for binary classification—at fixed time horizons from one to six years after the sleep study. For many major outcomes, such as death, heart failure and dementia, performance remained high at six years. Because the C-index incorporates all available follow-up, including cases occurring more than a decade later, the model’s risk rankings often apply over much longer spans.
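A simplified version of that horizon-based check, using scikit-learn and invented data. The study handles censoring more rigorously than this sketch, which simply drops patients whose follow-up ends before the horizon:

```python
from sklearn.metrics import roc_auc_score

def auroc_at_horizon(times, events, risks, horizon_years):
    """Binary AUROC at a fixed horizon: did the diagnosis occur within
    `horizon_years`? Patients censored before the horizon are dropped,
    since their true label is unknown (a simplification)."""
    labels, scores = [], []
    for t, e, r in zip(times, events, risks):
        if e and t <= horizon_years:
            labels.append(1)       # diagnosed within the horizon
        elif t > horizon_years:
            labels.append(0)       # known disease-free at the horizon
        else:
            continue               # censored too early: skip
        scores.append(r)
    return roc_auc_score(labels, scores)

# Invented example at a six-year horizon.
print(auroc_at_horizon([3, 10, 5, 8], [1, 0, 1, 0], [0.9, 0.1, 0.7, 0.3], 6))
```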

In most disease categories, SleepFM also outperformed two simpler models: one based only on demographic factors such as age, sex, race and body mass index, and a neural network trained from scratch on labeled sleep data without the foundation-model pretraining. For all-cause mortality, the AUROC improved from 0.78 with the baseline models to about 0.85 with SleepFM.
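The demographics-only baseline is conceptually the kind of model shown below, a plain logistic regression on a handful of patient attributes. The columns and data here are invented to illustrate the comparison, not taken from the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the demographics-only baseline: predict an
# outcome from age, sex and body mass index alone, with no sleep signals.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(55, 12, 1000),    # age in years
    rng.integers(0, 2, 1000),    # sex coded 0/1
    rng.normal(28, 5, 1000),     # body mass index
])
y = rng.integers(0, 2, 1000)     # outcome labels (random here, real in a study)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1]))
# Random labels give roughly 0.5; the study reports its real demographic
# baseline at 0.78 AUROC for mortality, versus about 0.85 for SleepFM.
```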

From sleep labs to early warning systems

The findings suggest that sleep recordings contain a broad “physiologic fingerprint” of future health, beyond obvious markers such as the number of apneas or oxygen drops.

For example, subtle changes in brain wave patterns, heart rate variability, arousals from sleep or breathing rhythms might signal early neurodegenerative processes, cardiovascular strain or metabolic dysfunction long before overt symptoms appear in the clinic. Those links are not fully understood, and the authors say they are now working on methods to interpret what the model is seeing.

“This work shows that foundation models can learn the language of sleep from multimodal sleep recordings, enabling scalable, label-efficient analysis and disease prediction,” the authors wrote.

If validated in more diverse populations and settings, such a model could change how sleep studies are used. Today, a patient who spends a night in a lab typically leaves with a report focused on sleep apnea and perhaps insomnia or periodic limb movements. In a future scenario, that report could also include a risk panel for dozens of systemic diseases, flagging patients who might benefit from closer follow-up with cardiologists, neurologists, oncologists or primary care physicians.

SleepFM also proved relatively data-efficient. In some experiments, it matched or surpassed baseline models trained on up to five times more labeled data, hinting that similar foundation models might be feasible even at smaller centers that lack Stanford’s decades-long archive.

Cautions over bias, consent and what to tell patients

Researchers and clinicians caution that the system is not ready for routine clinical use.

The data used to train and test SleepFM come mainly from U.S. and European cohorts—including Stanford patients, the Multi-Ethnic Study of Atherosclerosis, an older men’s sleep study and the Sleep Heart Health Study—and from a commercial French provider, BioSerenity. Those groups do not fully represent global or even national diversity by race, ethnicity, socioeconomic status or comorbidities.

If the model is less accurate for underrepresented populations, deploying it without careful auditing could deepen existing gaps in who is identified as high-risk and who receives preventive care.

The work also raises questions about how to handle probabilistic, long-term risk information. There are no clear guidelines yet on when, or how, to tell a patient that their sleep patterns suggest an elevated chance of developing Parkinson’s disease or breast cancer years down the line—especially when no guaranteed preventive therapy exists.

Overdiagnosis is another concern. A high-risk flag on an AI-generated sleep report could prompt cascades of imaging, blood work or biopsies that ultimately find nothing, exposing patients to stress, cost and potential harm.

There are privacy and consent issues as well. Much of the Stanford Sleep Medicine Center’s archive dates back decades, to a time when patients signed consent for clinical evaluation and, in some cases, for specific research studies—not for broad, AI-based disease prediction spanning hundreds of conditions. Future deployments of similar models may invite closer scrutiny of how historical clinical data are reused and how patients are informed.

Any move to integrate such a tool into care would likely require regulatory review. In the United States, the Food and Drug Administration regulates some algorithmic tools as medical devices, particularly when they are intended to guide diagnosis or treatment decisions. Prospective trials and external validation across multiple health systems would be needed to establish how well sleep-based risk scores perform in real-world practice and whether acting on them improves outcomes.

A rich signal, and an open question

For Mignot, whose field traces back to Stanford sleep pioneer William Dement and analog paper charts from the 1970s, the study shows what can happen when decades of routine clinical measurements are revisited with new computational tools.

“Sleep was always recognized as important, but the data were underused,” he said. “Now we’re starting to see just how much information is actually there.”

Whether that information will translate into earlier diagnoses, tailored interventions and better outcomes—or simply a longer list of things to worry about after a night in the lab—will depend on how quickly medicine, regulators and patients decide what to do with what the machines are learning.

Tags: #sleep, #ai, #stanford, #diseaseprediction, #healthtech