It was supposed to be like putting a man on the moon. Sequencing the entire human genome--spelling out the 3.1 billion chemical "letters" that make up human DNA--would be, scientists said, as challenging and rewarding as the Apollo mission that deposited Neil Armstrong on the lunar surface. But the comparison was never exact, and as the genome project approaches completion, it is becoming increasingly clear just how bad the analogy really is. Landing a human on our nearest cosmic neighbor was a straightforward achievement with no need for caveats or footnotes. As of July 20, 1969, nobody had set foot on another world. The next day, Armstrong had. Simple as that.
By contrast, when scientists from Craig Venter's Celera Genomics and the Human Genome Project announce that they're finished sequencing the genome--which they are scheduled to do this week--the milestone will be a lot murkier. That's because they're not really finished. What the scientists at Celera have done is sequence about 97% of the genome, and the remaining 150 million or so letters won't be deciphered anytime soon. The HGP is even further behind; unlike Celera, it hasn't put its strings of letters into proper order yet. This loose end should be cleared up in a year or two, but even then the so-called book of life will remain unreadable. That's because, explains Gerald Rubin, vice president for biomedical research at the Howard Hughes Medical Institute, "it's written in a foreign language. It's a very complicated problem. It's going to be a long time coming."
Molecular biologists still know so little about the human genome, in fact, that even with some 85% of the sequence published on the HGP's GenBank website for every scientist in the world to see, nobody has even a ballpark figure for how many genes humans have. Before this week, the betting ranged from as few as 28,000 to as many as 140,000. Now it looks more like 50,000.
Beyond that, knowing the code for a gene doesn't mean you know what protein it produces in the body, or what that protein does, or how it interacts with other proteins--vital information if you want to know how the genetic code locked in our cells ends up constructing and maintaining a fully functioning human being.
Given this seemingly overwhelming ignorance, why is everyone making such a fuss? Because laying out the biochemical code for all our genes, however many there turn out to be, and locating them within the 23 chromosomes in the human genome may turn out to be the necessary first step to solving all these mysteries. The hope is that the completed genome will enable scientists to lay bare the genetic triggers for hundreds of diseases--from Alzheimer's to diabetes to heart disease--and to devise exquisitely sensitive diagnostic tests. It will help pharmaceutical companies create drugs tailored to a patient's genetic profile, boosting effectiveness while drastically reducing side effects. It could change our very conception of what a disease is, replacing broad descriptive categories--breast cancer, for example--with precise genetic definitions that make diagnosis sure and treatment swift.