In 2003, history was made. For the first time, the human genome was sequenced. Since then, technological improvements have enabled tweaks, adjustments, and additions, making the human genome the most accurate and complete vertebrate genome ever sequenced.
Nevertheless, the job is not finished. We have a pretty good grasp of human chromosomes in general, but there are still some gaps in their sequences. Now, for the first time, geneticists have closed some of those gaps, giving us the first complete, gap-free, end-to-end (or telomere-to-telomere) sequence of a human X chromosome.
The accomplishment was made possible by a new technique called nanopore sequencing, which produces ultra-long reads of DNA strands, providing a more complete and sequential assembly.
This is in contrast to previous sequencing techniques, in which only short sections could be read at a time. Previously, geneticists had to piece together these sections like a puzzle.
While they were pretty good at this, many of the pieces look the same, so it's very tricky to know if you're getting it right - not just the order of the pieces, but how many repeats there are in the sequence. And, of course, minute gaps remain.
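To see why repeats are so hard to count from short pieces, here is a minimal Python sketch. Everything in it is invented for illustration: the repeat unit, the flanking sequences and the read lengths are hypothetical, and it ignores read depth, which real assemblers can use as a rough hint about copy number. It builds two toy sequences that differ only in how many copies of a repeat they contain, then compares the reads each one can produce.

```python
def distinct_reads(sequence, read_length):
    """Return the set of distinct substrings of a given length (toy 'reads')."""
    last_start = len(sequence) - read_length
    return {sequence[i:i + read_length] for i in range(last_start + 1)}


# Hypothetical toy data, not taken from any real genome.
UNIT = "GATTACA"                 # repeat unit
LEFT, RIGHT = "CCGT", "TTAGC"    # unique flanking sequence on each side

five_copies = LEFT + UNIT * 5 + RIGHT
nine_copies = LEFT + UNIT * 9 + RIGHT

# Short reads: the two sequences yield exactly the same set of pieces.
SHORT = 10
same_short = distinct_reads(five_copies, SHORT) == distinct_reads(nine_copies, SHORT)

# Reads long enough to span the whole repeat plus both flanks tell them apart.
LONG = len(five_copies)
same_long = distinct_reads(five_copies, LONG) == distinct_reads(nine_copies, LONG)

print(same_short, same_long)
```

With 10-base reads the two toy sequences are indistinguishable, because no read reaches from one unique flank to the other. Once a read spans the whole repeat, the copy number is no longer ambiguous.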
"We're starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we've been missing a lot of information that could be important to understanding human biology and disease," said satellite DNA biologist Karen Miga of the University of California Santa Cruz Genomics Institute.
This is where nanopore sequencing comes in. It consists of a protein nanopore - a nanoscale hole - set in an electrically resistant membrane. A voltage is applied across the membrane, which drives an ionic current through the nanopore. When a strand of genetic material is fed through the nanopore, the resulting changes in current can be translated into a genetic sequence.
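As a rough illustration of how a current trace might be turned into letters, here is a deliberately simplified Python sketch. It is not how real basecalling software works: the current levels, the fixed number of samples per base and the signal itself are all made up, and in practice the current depends on several neighbouring bases at once and the strand does not move at a constant speed.

```python
from statistics import mean

# Invented current levels in picoamps, one per base (illustrative assumption).
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}
SAMPLES_PER_BASE = 4  # assumed constant dwell time, unlike a real pore


def call_bases(signal):
    """Average each fixed-size chunk of samples and pick the nearest level."""
    bases = []
    for start in range(0, len(signal) - SAMPLES_PER_BASE + 1, SAMPLES_PER_BASE):
        level = mean(signal[start:start + SAMPLES_PER_BASE])
        nearest = min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - level))
        bases.append(nearest)
    return "".join(bases)


# A noisy, hand-written toy trace covering four bases.
signal = [81, 79, 80, 82, 124, 126, 125, 123, 109, 111, 110, 112, 96, 94, 95, 93]
called = call_bases(signal)
print(called)
```

The point of the sketch is only the direction of the inference: the instrument never sees letters, just a fluctuating current that software has to interpret.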
Even better, this technology reduces reliance on polymerase chain reaction, a technique that amplifies DNA by creating millions of copies of it.
Miga and her team used nanopore sequencing to study DNA obtained from a rare type of benign uterine tumour, a hydatidiform mole. They combined it with other sequencing technologies - Illumina and PacBio - to make sure the end result was as complete as possible.
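One simple way to picture how several platforms can check each other is a per-position majority vote. The Python sketch below assumes the reads have already been lined up against each other, and the three short sequences are invented, each with one deliberate error. The team's actual polishing pipeline is more involved than this; the sketch only shows the general idea of calling each base from more than one data set.

```python
from collections import Counter


def consensus(aligned_reads):
    """Call each position by majority vote across equally long, aligned reads.

    Ties go to whichever base appears first in that column.
    """
    if len({len(read) for read in aligned_reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical aligned reads of one region, one per platform.
reads = {
    "nanopore": "ACGTAGCA",   # toy error at position 4
    "pacbio":   "ACCTTGCA",   # toy error at position 2
    "illumina": "ACGTTGCT",   # toy error at position 7
}

polished = consensus(list(reads.values()))
print(polished)
```

Because each toy read is wrong in a different place, the other two outvote it every time, and the consensus is cleaner than any single read.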
"We used an iterative process over three different sequencing platforms to polish the sequence and reach a high level of accuracy," Miga said. "The unique markers provide an anchoring system for the ultra-long reads, and once you anchor the reads, you can use multiple data sets to call each base."
Even with these back-ups, though, some gaps remained - most notably in the centromere, the structure that connects the chromatids, the thread-like strands into which a chromosome divides. This region is vital for mitosis - but it's also very complex. In the X chromosome, it's a highly repetitive region spanning 3.1 million DNA base pairs.
The researchers were able to resolve this notoriously tricky structure by looking for slight variations in the repeats. These variations allowed the scientists to align and connect the long reads to form a complete sequence for the centromere.
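The role of those slight variations can be shown with a toy Python example. It is a sketch under heavy assumptions: the repeat units and variants are invented, the repeat array is treated as already known, and reads are modelled as error-free lists of whole units. In a real assembly the array is not known in advance and the same matching idea has to be applied between the reads themselves.

```python
def placements(array_units, read_units):
    """Return every start index where the read matches the repeat array."""
    span = len(read_units)
    return [
        start
        for start in range(len(array_units) - span + 1)
        if array_units[start:start + span] == read_units
    ]


# Hypothetical repeat unit and two rare single-base variants of it.
UNIT = "GATTACA"
VARIANT_1 = "GATTGCA"
VARIANT_2 = "GACTACA"

array = [UNIT, UNIT, VARIANT_1, UNIT, UNIT, UNIT, VARIANT_2, UNIT]

plain_read = [UNIT, UNIT]                       # no variant: ambiguous
anchored_read = [UNIT, VARIANT_1, UNIT, UNIT]   # carries a variant: anchored

plain_hits = placements(array, plain_read)
anchored_hits = placements(array, anchored_read)
print(plain_hits, anchored_hits)
```

The read made only of identical units fits in several places, while the read that happens to include a variant fits in exactly one. Longer reads are more likely to contain such a marker, which is why ultra-long reads matter here.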
"For me, the idea that we can put together a 3-megabase-size tandem repeat is just mind-blowing," Miga said. "We can now reach these repeat regions covering millions of bases that were previously thought intractable."
This rigorous approach allowed the team to close all 29 gaps in the current X chromosome reference. It's a major step forward in the project to completely map the human genome.
"Our results demonstrate that finishing the entire human genome is now within reach," the researchers wrote in their paper, "and the data presented here will enable ongoing efforts to complete the remaining human chromosomes."
The team's data is fully available on GitHub, and the paper has been published in Nature.