Stanford Researchers Turn Protein Fragments Into DNA for Sequencer-Based Proteomics

The machines in modern genomics labs were built to read DNA. Now, a team at Stanford University says it has found a way to feed those same sequencers information about proteins, potentially turning a decades-old genomics infrastructure into a new engine for proteomics.

In a paper published March 18 in Nature Biotechnology, researchers describe a method for “reverse translation” of proteins: converting pieces of proteins into uniquely barcoded DNA and then decoding those barcodes on standard next-generation DNA sequencers. The team reports that the approach can read peptides—short fragments of proteins—one molecule at a time, with single–amino acid resolution.

“We created a technology that can convert protein sequences back into DNA sequences,” said senior author Hyongsok “Tom” Soh, a professor of electrical engineering, bioengineering and radiology at Stanford. “It’s kind of like running the natural process—in reverse—so that we can leverage powerful DNA sequencing technology that is already available.”

The work, led by Stanford radiology researcher Liwei Zheng, aims to tackle one of biology’s central measurement problems. DNA and RNA can be read cheaply and at massive scale, but proteins—the molecules that actually do most of the work in cells—have remained largely dependent on mass spectrometry, a sensitive but complex technology that struggles to reach true single-cell depth.

By effectively turning protein information into DNA, the Stanford group is betting that the next phase of proteomics can ride on the back of the genomics revolution.

Turning peptides into DNA

In the new method, proteins from a sample are first cut into peptides and immobilized on a surface. Each peptide is chemically linked to a piece of DNA that acts as a unique ID tag—a barcode that says, in effect, “this DNA belongs to that peptide.”

The researchers then adapt a classic chemistry called Edman degradation, which has been used since the mid-20th century to determine protein sequences by cleaving off one amino acid at a time from the amino (N-terminal) end of a peptide. Instead of discarding those amino acids or analyzing them in a mass spectrometer, the Stanford team captures each one and uses it to trigger the creation of a new DNA fragment.

Each cycle of chemistry removes a single amino acid from the peptide’s exposed end. The freed amino acid is captured on a carrier molecule that still holds the original peptide’s DNA barcode. Panels of antibodies that recognize specific amino acids or certain chemical modifications then bind to the captured residue.

The group uses a proximity extension assay, a technique in which pairs of antibodies carry short DNA strands that are joined only when the antibodies bind in close proximity. When the right antibodies attach to the freed amino acid, their DNA tags are enzymatically extended and fused into an amplifiable DNA reporter.

In that reporter, different parts of the sequence encode three critical pieces of information: which peptide the amino acid came from, which step in the degradation cycle it was removed in, and which type of amino acid or modification it appears to be.

After multiple cycles, the DNA reporters from potentially millions of individual peptide molecules are pooled into a single library and fed into a standard next-generation sequencing machine. Custom software uses the barcodes to digitally reconstruct each peptide molecule, reading off its amino acids in order.

The process does not recreate the original gene sequence that encoded the protein. Instead, it builds a synthetic DNA record that describes the protein’s composition and order.
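The decoding step can be pictured with a minimal Python sketch. It is not the team's released analysis code. It assumes each sequenced reporter has already been parsed into a hypothetical (peptide barcode, cycle number, residue call) triple, and it uses a simple majority vote per cycle, marking any cycle with no reads as "X". The function name and the example reads are illustrative only.

```python
from collections import Counter, defaultdict


def reconstruct_peptides(reads):
    """Rebuild peptide sequences from (barcode, cycle, residue) triples.

    Assumed, hypothetical input format: one triple per decoded DNA reporter.
    Cycles are numbered from 1; a cycle with no reads is reported as "X".
    """
    # barcode -> cycle -> tally of residue calls seen for that cycle
    calls = defaultdict(lambda: defaultdict(Counter))
    for barcode, cycle, residue in reads:
        calls[barcode][cycle][residue] += 1

    peptides = {}
    for barcode, by_cycle in calls.items():
        last_cycle = max(by_cycle)
        sequence = []
        for cycle in range(1, last_cycle + 1):
            if cycle in by_cycle:
                # Majority vote among the reads for this cycle
                sequence.append(by_cycle[cycle].most_common(1)[0][0])
            else:
                # Missed cycle: no reporter was recovered
                sequence.append("X")
        peptides[barcode] = "".join(sequence)
    return peptides


# Made-up reads for two peptide molecules
example_reads = [
    ("BC01", 1, "G"),
    ("BC01", 2, "A"),
    ("BC01", 2, "A"),
    ("BC01", 2, "S"),  # a conflicting call, outvoted by the two "A" reads
    ("BC01", 3, "K"),
    ("BC02", 1, "L"),
    ("BC02", 3, "R"),  # cycle 2 is missing for this molecule
]
print(reconstruct_peptides(example_reads))
```

Grouping by barcode first is what makes the readout single-molecule: every read is assigned to one peptide before any residue is called, so reads from different molecules are never mixed.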

“Our approach is to treat protein sequencing as a DNA sequencing problem,” Zheng said in an interview released by Stanford. “We’re not trying to measure physical properties like mass or current. We’re converting what we care about—amino acid identity and position—into a DNA code that sequencers can read very efficiently.”

Competing with mass spectrometry and new proteomics platforms

For decades, mass spectrometry has been the main tool for measuring proteins at scale. In modern workflows, proteins from a cell or tissue are digested into peptides, ionized and sent through instruments that separate the resulting ions by their mass-to-charge ratio. Advanced methods can identify and quantify thousands of proteins in a sample.

But even with those advances, mass spectrometry sees only a fraction of the molecules that enter the instrument. Zheng said that in a typical experiment, researchers might spray on the order of 1 billion to 10 billion peptide molecules into a mass spectrometer but reliably detect only around 1 million of them.

“With mass spec you see maybe a million molecules; with our method, potentially 1,000 times that,” Zheng said. That claim has not yet been independently verified, but it underscores the team’s central argument: by tagging and amplifying DNA rather than measuring physical signals from every molecule, reverse translation could in principle capture information from far more individual peptides.
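The figures Zheng cites can be put side by side with a little arithmetic. The short Python sketch below uses only the numbers quoted above; the function and variable names are illustrative, and the 1,000-fold figure remains the team's unverified claim.

```python
def detection_fraction(detected, sprayed):
    """Fraction of molecules entering the instrument that are detected."""
    return detected / sprayed


DETECTED_BY_MASS_SPEC = 1_000_000  # "around 1 million" reliably detected
SPRAYED_RANGE = (1_000_000_000, 10_000_000_000)  # 1 billion to 10 billion

for sprayed in SPRAYED_RANGE:
    fraction = detection_fraction(DETECTED_BY_MASS_SPEC, sprayed)
    print(f"{sprayed:,} sprayed: {fraction:.2%} detected")

# Zheng's claim: potentially 1,000 times as many molecules read
claimed_molecules = DETECTED_BY_MASS_SPEC * 1_000
print(f"Claimed reach: {claimed_molecules:,} molecules")
```

On those numbers, mass spectrometry detects between one in 1,000 and one in 10,000 of the peptides sprayed, and the claimed 1,000-fold gain would bring the count to about 1 billion, the low end of what enters the instrument.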

The Stanford approach arrives amid intense interest in so-called single-molecule proteomics—technologies that seek to read proteins one at a time rather than in bulk. Several companies and academic groups are pursuing distinct paths.

Quantum-Si, a Connecticut-based firm, markets a semiconductor chip that observes fluorescent binding events of recognizer molecules on single peptides. Nautilus Biotechnology in Seattle is developing an array platform that anchors intact proteins at billions of positions on a chip and repeatedly probes them with panels of affinity reagents. Austin, Texas-based Erisyon uses “fluorosequencing,” labeling certain amino acids with fluorescent dyes and imaging them through iterative cycles. Other researchers are working on nanopore-based protein sequencing, monitoring electrical currents as proteins thread through tiny pores.

Those approaches largely depend on specialized hardware and detection schemes. By contrast, the Stanford method shifts complexity into the front-end chemistry and then hands off the actual measurement to existing DNA sequencers.

“We wanted to build on top of an ecosystem that’s already very mature,” Soh said. “Sequencers are everywhere. If we can turn proteins into DNA, we can use that infrastructure instead of inventing a whole new class of instruments.”

Promise and open questions

In their Nature Biotechnology study, the authors demonstrate what they call “true single-molecule peptide sequencing” on defined model peptides, showing complete sequence coverage in millions of reads and the ability to distinguish certain post-translational modifications—chemical changes to proteins that can alter their function.

They also release sequencing data and analysis code, a move that will allow other groups to scrutinize the method’s performance.

At the same time, the work remains at an early stage. The experiments described focus on controlled peptide mixtures rather than the complex protein digests found in real biological samples. The team has not yet published whole-proteome measurements from cells, tissues or patient samples.

The method also depends on panels of antibodies to identify amino acids and some modifications, a potential bottleneck. Antibodies can be expensive to develop and purchase, may vary in performance from lot to lot, and do not exist for every possible modification or subtle sequence difference.

Edman degradation itself, while well known, is sensitive to certain chemical groups and can fail on some protein termini. Incomplete cleavage, missed cycles and stochastic antibody binding introduce error modes that will need to be measured and controlled in larger-scale studies.

“I think it’s a very exciting proof of concept,” said a proteomics researcher at a separate institution who was not involved with the work and asked not to be named because they collaborate with multiple companies in the field. “The big questions are how robust the chemistry will be on complex samples, and how the cost and throughput will compare with mass spectrometry and other single-molecule platforms once you factor in all the reagents and steps.”

Toward single-cell proteomics—and a new market

If those challenges are addressed, the implications could be considerable.

Soh and his colleagues emphasize single-cell and small-sample applications. Because each peptide molecule can, in principle, yield a DNA record that can be amplified and counted, the method could eventually help quantify proteins in individual cells, rare cell populations or tiny biopsies.

That level of sensitivity is of particular interest in cancer and immunology, where small differences in protein expression or modification between cells can determine whether a therapy works.

“Understanding why a therapy works in some patients but not others may come down to protein states in specific cells,” Soh said. “Being able to look at proteins one molecule at a time in those cells could be very powerful.”

The technology is also drawing commercial interest. Zheng, Soh and co-author Yujia Sun are listed as inventors on a pending international patent application related to the work. Stanford officials say the method has already been licensed, and that the goal is to develop an instrument that would allow users to “put in a sample, press a button, and have it go,” in Soh’s words.

That vision echoes the trajectory of DNA sequencing over the last two decades, as cumbersome early machines gave way to automated benchtop instruments in hospitals and research centers.

Whether protein sequencing will follow a similar path—and whether reverse translation will be the winning approach—remains uncertain. Independent laboratories will need to test the method, compare it against established and emerging technologies, and determine where it offers unique advantages.

What is clear is that protein sequencing is moving closer to the kind of standardized, digital readout that transformed genomics. For more than half a century, biology textbooks have summarized the flow of genetic information with a simple diagram: DNA makes RNA, and RNA makes protein. At least for measurement purposes, the new work from Stanford suggests that the arrow between DNA and protein may no longer point in just one direction.
