Of the roughly three billion base pairs making up the human genome, only around 2 percent encodes proteins, leaving the remaining 98 percent with less obvious functions.
Some dismiss it as useless 'junk DNA', but its origins, effects, and potential purpose in the evolution of life have attracted the attention of biologists ever since it was first noticed cluttering up our chromosomes in the 1960s.
Now researchers from Tel Aviv University in Israel have added some vital insights into the reasons why non-coding DNA persists, which could help us better understand the rich variety of genome sizes across the living world.
In 1977, two scientists named Richard Roberts and Phil Sharp independently noticed a good portion of this DNA clutter wasn't just scattered between our genes, but often interrupted them mid-sequence, a discovery that later earned them a Nobel Prize.
Known as introns, they seemed to burden complex cells like ours, while leaving simpler ones – such as those of bacteria – untouched. They also added a lot of labor to the process of translating DNA into something material.
Every time a protein is freshly minted, these interruptions have to be cut out of the RNA copy of the gene, and the remaining coding instructions pieced back together, before the message can be interpreted as a protein. An everyday comparison would be having to remove thousands of nonsense words just to read a sentence.
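For readers who think in code, the cut-and-rejoin step can be pictured with a minimal Python sketch. It is an illustration only: the function name `splice`, the toy sequences, and the idea of handing the function ready-made intron coordinates are all assumptions made for this example, whereas real cells find intron boundaries with dedicated molecular machinery.

```python
def splice(transcript, introns):
    """Remove each intron span (start, end), end exclusive, and join what is left."""
    kept = []
    pos = 0
    for start, end in sorted(introns):
        kept.append(transcript[pos:start])  # coding stretch before this intron
        pos = end                           # skip over the intron itself
    kept.append(transcript[pos:])           # coding stretch after the last intron
    return "".join(kept)


# Hypothetical toy sequences, not taken from any real gene.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "UUUUAA"
transcript = exon_1 + intron + exon_2

# The intron sits directly after the first coding stretch.
intron_span = (len(exon_1), len(exon_1) + len(intron))
mature = splice(transcript, [intron_span])
```

Here `mature` holds only the two coding stretches, joined end to end, which is the message the cell goes on to read.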
This seemingly wasteful way of operating is necessary throughout nature, with those lucky bacteria and other prokaryotes standing out as exceptions.
The number of introns also happens to differ wildly from species to species; humans have nearly 140,000 introns, rats around 33,000, common fruit flies nearly 38,000, yeast (Saccharomyces cerevisiae) a mere 286, and the unicellular fungus Encephalitozoon cuniculi just 15.
Why hasn't evolution cleaned up this mess through natural selection to make us more efficient organisms?
And why, when genomes have a known natural bias towards deleting instead of adding DNA over time, does 'junk DNA' never seem to get any shorter even after millions of years of evolution?
"Intriguingly, the opposite has supposedly happened, as eukaryotes have larger genomes, longer proteins, and much larger intergenic regions compared to prokaryotes," the scientists behind this latest study into introns write in their recently published report.
The researchers proposed that deleting any intrusive pieces of DNA around coding regions would likely hurt the organism's survival, as coding sections might also be snipped out at the same time.
"Deletions occurring near the borders occasionally protrude to the conserved region and are thereby subject to strong purifying selection," the researchers write.
This "border-induced selection", where a neutral sequence sits between coding regions, would therefore create an insertion bias for short, non-coding DNA sequences.
Essentially, 'junk DNA' acts like a mutational buffer, protecting regions that contain the more sensitive sequences necessary for coding proteins.
The researchers created a mathematical model to show these dynamics in action.
Previously it has been suggested that "deletion bias leads to shrinkage of genomes over evolutionary times," the team explains.
"The counterintuitive result that long neutrally evolving sequences can emerge even under a strong deletion bias is due to the rejection of deletions that invade the highly conserved borders of the neutral sequences."
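The idea in that quote can be explored with a toy simulation. The Python sketch below is not the researchers' model; it is a deliberately simplified random walk in which every name and number (`simulate_neutral_length`, a fixed indel size of 10 bases, a 40 percent chance of insertion per event) is an assumption chosen for illustration. A neutral stretch sits between two conserved borders, deletions are more common than insertions, and any deletion that would run past a border is rejected.

```python
import random


def simulate_neutral_length(steps, start_len=200, indel_size=10,
                            p_insert=0.4, border_selection=True, seed=0):
    """Return the time-averaged length of a neutral stretch in a toy indel walk.

    All parameters are hypothetical. With p_insert below 0.5, deletions are
    attempted more often than insertions (a deletion bias).
    """
    rng = random.Random(seed)
    length = start_len
    total = 0
    for _ in range(steps):
        if rng.random() < p_insert:
            length += indel_size                  # insertion lengthens the stretch
        elif length >= indel_size:
            if border_selection:
                # The deletion starts at a random position in the neutral stretch
                # and is rejected if it would protrude into the conserved border.
                start = rng.randrange(length)
                if start + indel_size <= length:
                    length -= indel_size
            else:
                # Control run: no penalty for reaching the border.
                length -= indel_size
        total += length
    return total / steps


with_selection = simulate_neutral_length(200_000, border_selection=True, seed=1)
without_selection = simulate_neutral_length(200_000, border_selection=False, seed=1)
```

In this toy setup, the average length should come out noticeably larger when border-crossing deletions are rejected than in the control run, because short stretches leave deletions very little room to land safely. That is the qualitative effect the team describes, though the numbers themselves carry no biological meaning.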
While their model provides a plausible explanation for the variation in intron lengths within a species, it can't explain why these differ between species.
"One trivial explanation is that the model parameters themselves evolve," they write. "Thus, different species have different insertion-to-deletion rate ratios and, possibly, different propensity for the emergence of conserved regions within introns."
Knowing that there is a bias might help explain the variety of introns that we see in nature, and why some organisms seem more genetically "chaotic" than others.
Just where these interruptions come from in the first place is also an area of ongoing research, with a long history of viruses and outdated genes suggested as sources.
Much of it might not even be non-coding after all, tasked with functions we're simply not yet aware of. In recent years, science has increasingly moved away from describing all introns as 'junk DNA' as more possible functions are discovered, including introns being transcribed into strands of RNA that oversee protein production.
What we might think of as junk could, in time, be seen as genetic treasure. It might seem like a complicated way to build an organism, but with several billion years of evolution under its belt, nature seems to know what it's doing.
This paper was published in Open Biology.