“Having this complete information will allow us to better understand how we form as an individual organism and how we vary not just between other humans but other species,” Evan Eichler, a Howard Hughes Medical Institute investigator at the University of Washington and the research leader, said Thursday.
The new research introduces 400 million letters to the previously sequenced DNA — an entire chromosome’s worth. The full genome will allow scientists to analyze how DNA differs between people and whether these genetic variations play a role in disease.
Until now, it was unclear what these unknown genes coded for.
“It turns out that these genes are incredibly important for adaptation,” Eichler said. “They contain immune response genes that help us to adapt and survive infections and plagues and viruses. They contain genes that are … very important in terms of predicting drug response.”
Eichler also said that some of the recently uncovered genes are even responsible for making human brains larger than those of other primates, providing insight into what makes humans unique.
This remaining 8% of the human genome had stumped scientists for years because of its complexity. For one thing, it contained highly repetitive DNA regions, which made it challenging to string the DNA together in the correct order using previous sequencing methods.
The researchers relied on two DNA sequencing technologies that emerged over the past decade to bring this project to fruition: the Oxford Nanopore DNA sequencing method, which can sequence up to 1 million DNA letters at once but with some mistakes, and the PacBio HiFi DNA sequencing method, which can read 20,000 letters with 99.9% accuracy.
Sequencing DNA is like solving a jigsaw puzzle, Eichler said. Scientists must first break the DNA into smaller parts and then use sequencing machines to piece it together in the correct order. Previous sequencing tools could sequence only small sections of DNA at once.
With a 10,000-piece puzzle, it’s hard to arrange small pieces correctly when they look alike, just as it is hard to put short sections of repetitive DNA in the right order. With a 500-piece puzzle, the larger pieces are much easier to arrange, and the same holds for longer segments of DNA.
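The puzzle analogy can be made concrete with a toy sketch in Python. It is an illustration only, not the method the research team used: the sequence, the two read lengths, and the function name `ambiguous_fraction` are all made up for this example. The sketch cuts a short sequence containing a repeated stretch into reads of a given length and measures how many of those reads could fit in more than one place.

```python
def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Return the fraction of reads whose exact text occurs at more than
    one position in the genome, so their placement is ambiguous."""
    # Take every possible read of the given length (a sliding window).
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]

    # Count how many times each distinct read text appears.
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1

    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


# Hypothetical toy sequence: two unique ends around a 42-letter repeat.
genome = "ACGTTGCA" + "GATTACA" * 6 + "TTGACCGT"

short_reads = ambiguous_fraction(genome, 5)   # much shorter than the repeat
long_reads = ambiguous_fraction(genome, 50)   # longer than the whole repeat

print(f"ambiguous short reads: {short_reads:.0%}")
print(f"ambiguous long reads:  {long_reads:.0%}")
```

In this toy case, most of the short reads match more than one position, while every long read reaches into a unique end and fits in only one place. Real assembly software also has to cope with sequencing errors and far larger genomes, which this sketch ignores.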
A second challenge was finding cells that contained only one genome.
Standard human cells contain two sets of DNA, a maternal copy and a paternal copy, but this team used DNA from a group of cells called a complete hydatidiform mole, which contains a duplicate of the paternal set of DNA. A complete hydatidiform mole is a rare complication of a pregnancy caused by the abnormal growth of cells that originate from the placenta. This approach simplifies the genome so that scientists need sequence only one set rather than two sets of DNA.
Because the cells the research team used carried a duplicated paternal set of DNA that included an X chromosome but no Y, the scientists were initially unable to sequence the Y chromosome. According to lead study author Adam Phillippy, the team has since sequenced the Y chromosome using a different set of cells.
For now, it’s still too costly and time-consuming for everyone to sequence their own genome. But research is underway that uses this genome to identify whether certain genetic differences are linked with specific cancers. Knowing the genetic variations could also allow doctors to better tailor treatments, said Michael Schatz, another researcher on the team and a professor of computer science and biology at Johns Hopkins University.
Phillippy said he hopes that within the next 10 years, sequencing individuals’ genomes can become a routine medical test that costs less than $1,000. His team continues to work toward that goal.
Charles Rotimi, scientific director of the National Human Genome Research Institute, said in a statement that this scientific achievement is “moving us closer to individualized medicine for all humanity.” Rotimi was not involved in the research.
Correction: A previous version of this story misspelled Evan Eichler’s name.