Racing To Map Our DNA

Competition from private labs has forced the Human Genome Project into a frantic rush to finish first

(4 of 5)

But because the original DNA has been torn into so many random bits of genetic gibberish (as opposed to the predictable fragments made by gene-cutting enzymes), scientists need powerful computers to determine where the tiny fragments overlap. This is tough enough when you're sequencing a small part of a chromosome. But now Smith urged Venter to try it out, not merely on a strip of DNA but on an entire genome. He proposed Haemophilus influenzae, a bacterium that causes ear infections and meningitis. Until then, only a few small viruses, whose genomes had tens of thousands of genetic letters, had been entirely decoded. H. flu had 1.8 million.
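To make the overlap problem concrete, here is a minimal sketch in Python of one simple way a computer can stitch random fragments back together: repeatedly find the two pieces whose ends overlap the most and merge them. This is an illustration only, not the software Venter's team used. The function names, the toy 14-letter sequence and the minimum overlap of three letters are all assumptions made for the example.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, or 0 if no overlap of at least `min_len` letters exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge the pair of fragments with the largest overlap
    until no pair overlaps by at least `min_len` letters.
    Hypothetical toy assembler: real assemblers must also handle
    sequencing errors, reverse strands and repeats."""
    # Drop exact duplicates and fragments wholly contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to merge: the remaining pieces leave gaps
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Toy "genome" of 14 letters, shattered into three overlapping reads.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

Every round compares each fragment against every other, which is why the work balloons as the fragment count climbs from a handful to the millions a whole genome requires.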

The audacious proposal was quickly denied federal funding. Venter and Smith pushed ahead anyway--and within a year they had succeeded. The publication of their 1995 paper in Science was a landmark that galvanized researchers. For the first time, the genetic secrets of an entire living organism had been exposed.

Today, four years later, a total of 20 genomes have been fully decoded, 10 of them at TIGR. In December scientists at Washington University in St. Louis, Mo., and at the Sanger Centre passed a new milestone by decoding the first animal genome, that of a tiny roundworm, Caenorhabditis elegans. At 97 million letters, C. elegans' genome is by far the most sophisticated ever sequenced. But if Venter's newly formed Celera (derived from the word celerity, which means swiftness) can pull it off, his proposal to shotgun the entire 3 billion-letter human genome in three years will make the roundworm's DNA look downright puny.

Venter admits that whole-genome shotgunning will leave gaps in the sequence where segments can't be fitted perfectly. But as he points out, traditional sequencing leaves holes as well. Like the government's gaps, his can be filled in later--and fast. "Let's say there are 50,000 holes averaging 83 letters each," he says. "At the rate we plan to clone and sequence DNA, we could close those in a day."
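Venter's back-of-the-envelope figure is easy to check. The short Python sketch below uses only the numbers in his quote and the 3 billion-letter genome size cited above; the variable names are illustrative, and nothing here models Celera's actual sequencing rate.

```python
# Figures taken from Venter's hypothetical, not from measured data.
holes = 50_000                 # number of gaps in the assembled sequence
avg_letters_per_hole = 83      # average gap length, in genetic letters
genome_letters = 3_000_000_000 # approximate size of the human genome

total_gap_letters = holes * avg_letters_per_hole
fraction_of_genome = total_gap_letters / genome_letters

print(f"Letters to fill in: {total_gap_letters:,}")
print(f"Share of the genome: {fraction_of_genome:.3%}")
```

By this arithmetic the gaps would total 4,150,000 letters, a bit more than one-tenth of 1% of the genome. Closing them in a day would mean sequencing roughly that many letters in 24 hours.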

But many scientists believe that Venter won't be able to complete the genome-reassembly process. They liken the job to taking a year's worth of issues of a magazine like this one, chopping the pages into one-line fragments, then trying to put the fragments back together without a single typo. As daunting as that seems, imagine that up to 30% of the text consists of nearly identical strings of words up to 7,000 letters long. Assembling these "repeat sequences," says the genome project's Francis Collins, is "a challenge to anyone who doesn't break it down into bite-size pieces."

Whether or not Venter succeeds in putting his Humpty Dumpty genome back together again, his basic premise, shared by the competition at Genset and Incyte, remains compelling: you don't need the entire genome mapped to high precision to make big advances. Cohen's discoveries of prostate-cancer genes are one example. Similarly, the National Center for Biotechnology Information, part of NIH's National Library of Medicine, is using databases of partial gene sequences to zero in on genes that make aberrant proteins in ailments like Parkinson's disease.
