Sunday, September 9, 2012

ENCODE: Great science, poor communication

Last week the ENCODE project published 30 papers in three journals: Nature, Genome Research and Genome Biology. In the summary paper, they claimed to have found a function for 80% of the genome. However, to make this claim they seem to have redefined 'function' so broadly that most people wouldn't accept the result as meaningful.

We know that only about 1 to 1.5% of the genome is used to make proteins, and the ENCODE project's findings didn't change that figure. A lot of DNA is transcribed into RNA, and some of that RNA has a regulatory role; that is, it regulates genes by turning them on and off. ENCODE found that adding the regulatory DNA to the protein-coding DNA gets you to a figure of 9%. This is higher than was expected and is an exciting result.

Getting from 9% to 20% was all estimation. The ENCODE project looked at 147 different human cell types, but there are at least 210, and possibly many more, distinct human cell types. Based on their incomplete coverage of cell types, ENCODE researchers believe that at least another 11% of the genome is regulatory. But this remains to be demonstrated.

The final 60% is partly DNA that helps package the DNA helix, partly DNA containing sites that proteins bind to, and partly DNA that's transcribed into essentially meaningless RNA (I might have missed some things here). The argument for including this 60% in the estimate of how much of the genome is 'functional' seems to boil down to the idea that it does something, and evolution wouldn't let it do something if it wasn't useful. Other than this, there seems little merit in counting this 60% as functional.
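To make the bookkeeping explicit, here is a minimal sketch that tallies the three pieces described above. The percentages are the ones quoted in this post, not values pulled from the ENCODE papers themselves, and the labels are my own shorthand.

```python
# Percent of the genome in each category, as quoted in this post.
# Labels are informal shorthand, not ENCODE's own terminology.
components = {
    "protein-coding + measured regulatory": 9,
    "regulatory, extrapolated to unsampled cell types": 11,
    "packaging, protein binding, other transcription": 60,
}

running_total = 0
for label, percent in components.items():
    running_total += percent
    print(f"{label:<50} +{percent:>2}%  running total: {running_total}%")

# Portion that was actually measured and has a clear regulatory or coding role.
measured_share = components["protein-coding + measured regulatory"]
print(f"Measured coding/regulatory share of the headline figure: "
      f"{measured_share / running_total:.1%}")
```

Laid out this way, only the first row rests on direct measurement of coding or regulatory roles; the second is an extrapolation and the third depends entirely on how 'function' is defined.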

If my suspicions about the argument are correct, it's adaptationist nonsense. The amount of non-coding DNA, much of which is often called 'junk DNA', varies widely among species. For instance, the genome of the pufferfish, Takifugu rubripes, is ~365 million base pairs, while the genome of the lungfish, Protopterus aethiopicus, is orders of magnitude larger at ~133 billion base pairs. Much of the lungfish genome would count as functional under the ENCODE definition, but if it's important, how come the pufferfish can get away with 0.3% of the base pairs*?
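The 0.3% figure is easy to check. A quick sketch, using the approximate genome sizes quoted above (both are rough estimates, so the result is only good to a significant figure or so):

```python
import math

# Approximate genome sizes in base pairs, as quoted in this post.
pufferfish_bp = 365_000_000        # Takifugu rubripes, ~365 million
lungfish_bp = 133_000_000_000      # Protopterus aethiopicus, ~133 billion

# Pufferfish genome as a fraction of the lungfish genome.
fraction = pufferfish_bp / lungfish_bp

# How many times larger the lungfish genome is, and in orders of magnitude.
fold_difference = lungfish_bp / pufferfish_bp
orders_of_magnitude = math.log10(fold_difference)

print(f"Pufferfish / lungfish: {fraction:.2%}")
print(f"Lungfish genome is about {fold_difference:.0f} times larger")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```

So the lungfish carries a few hundred times as much DNA as the pufferfish, which is hard to square with the idea that most of it is doing something essential.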

The media coverage of the ENCODE publications has focused on the 80% figure, without much discussion of what is meant by 'functional'. This is unfortunate because the definition of 'functional' is critical for evaluating the findings. In my opinion, 80% is a fudge that can only be reached by a weaselly use of language. It's clear to me, from the variation in genome size among species and from the fact that large sections of non-coding DNA can be removed with no observable effect, that most of our genome has no important function. The ENCODE project has not shown it to be otherwise.

Other coverage that I thought was good:

T. Ryan Gregory - A slightly different response to today's ENCODE hype

Michael Eisen - A neutral theory of molecular function

Sean Eddy - ENCODE says what?

Brendan Maher - Fighting about ENCODE and junk

John Farrell - Reports of junk DNA's demise have been greatly exaggerated


* This is Ryan Gregory's "Onion Test".

