Reading these genetic words and deciphering their meaning is apparently a snap for the clever machinery of a cell. But for mere scientists it is a formidable and time-consuming task. A snippet of DNA such as ACGGTAGAT, for instance, is a message that researchers can decipher rather easily: read three letters at a time, it codes for a string of three amino acids, drawn from the 20 varieties that are the building blocks of proteins. But the entire genome of even the simplest organism dwarfs that snippet. The genetic blueprint of the lowly E. coli bacterium, for one, is more than 4.5 million base pairs long. For a microscopic yeast cell, the length is 15 million units. And in a human being, the genetic message is some 3 billion letters long.
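The "rather easy" decoding of ACGGTAGAT can be sketched in a few lines of Python. This is an illustration only, assuming the snippet is read from its first letter in groups of three (codons) using the standard genetic code; the lookup table below is deliberately partial and holds just the three codons the snippet needs, whereas the full table has 64 entries. The function name `translate` is hypothetical.

```python
# Partial standard genetic code: only the codons needed for the example
# snippet. The complete table maps all 64 codons.
CODON_TABLE = {
    "ACG": "Thr",  # threonine
    "GTA": "Val",  # valine
    "GAT": "Asp",  # aspartic acid
}


def translate(dna: str) -> list[str]:
    """Split a DNA string into three-letter codons and look each one up."""
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(translate("ACGGTAGAT"))  # ['Thr', 'Val', 'Asp']
```

The same three-letters-at-a-time rule applies to a whole genome; what makes the real task hard is sheer length, since a 3-billion-letter message is more than 300 million times as long as this 9-letter snippet.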
Like cartographers mapping the ancient world, scientists over the past three decades have been laboriously charting human DNA. Of the estimated 100,000-odd genes that populate the genome, just 4,550 have been identified. And only 1,500 of those have been roughly located on the various chromosomes. The message of the genes has been equally difficult to come by. Most genes consist of between 10,000 and 150,000 code letters, and only a few genes have been completely deciphered. Long segments of the genome, like the vast uncharted regions of early maps, remain terra incognita.
To complicate matters, between the segments of DNA that represent genes are endless stretches of code letters that seem to spell out only genetic gibberish. Geneticists once thought most of the unintelligible stuff was "junk DNA": useless sequences of code letters that arose by accident during evolution and were never discarded. That concept has changed. "My feeling is there's a lot of very useful information buried in the sequence," says Nobel laureate Paul Berg of Stanford University. "Some of it we will know how to interpret; some we know is going to be gibberish."
In fact, some of the nongene regions of the genome have already been identified as instructions necessary for DNA to replicate itself during cell division. That message must be detailed and complex. Explains George Bell, head of genome studies at Los Alamos National Laboratory: "It's as if you had a rope that was maybe 2 in. in diameter and 32,000 miles long, all neatly arranged inside a structure the size of a superdome. When the appropriate signal comes, you have to unwind the rope, which consists of two strands, and copy each strand so you end up with two new ropes that again have to fold up. The machinery to do that cannot be trivial."
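Bell's figures appear to hold up under a rough back-of-the-envelope check. The sketch below rests on assumed textbook values that are not given in the article: a double helix about 2 nm wide, about 0.34 nm of DNA length per base pair, two copies of the roughly 3-billion-base-pair genome in each cell, and a cell nucleus about 6 micrometers across. Enlarging a 2 nm helix into a 2 in. rope multiplies every length by the same factor, so the stretched-out DNA and its container can both be scaled up.

```python
# Back-of-the-envelope check of the rope-in-a-superdome analogy.
# The physical values below are assumed textbook figures, not numbers
# taken from the article.
NM_PER_INCH = 25.4e6         # 1 in = 25.4 mm
NM_PER_MILE = 1.609344e12    # 1 mile = 1,609.344 m
NM_PER_METER = 1e9

helix_diameter_nm = 2.0      # assumed width of the DNA double helix
rise_per_bp_nm = 0.34        # assumed length of DNA per base pair
bp_per_cell = 2 * 3e9        # assumed: two copies of a ~3-billion-bp genome
nucleus_diameter_nm = 6e3    # assumed typical nucleus, about 6 micrometers

# Factor that enlarges a 2 nm helix into Bell's 2 in. rope.
scale = (2 * NM_PER_INCH) / helix_diameter_nm

# Stretched-out length of one cell's DNA, scaled up to "rope" size.
dna_length_nm = bp_per_cell * rise_per_bp_nm
rope_length_miles = dna_length_nm * scale / NM_PER_MILE

# The container (the nucleus) scaled by the same factor.
structure_m = nucleus_diameter_nm * scale / NM_PER_METER

print(f"scale factor: {scale:.2e}")
print(f"rope length: about {rope_length_miles:,.0f} miles")
print(f"container width: about {structure_m:,.0f} m")
```

With these assumptions the rope comes out at roughly 32,000 miles and the container at roughly 150 m across, about the size of a large stadium, which is consistent with Bell's picture. Different assumed inputs, such as a larger nucleus, would shift the results proportionally.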
One of the most formidable tasks facing geneticists is to learn the nature of that machinery and of the other genetic instructions buried in the lengthy, still-undeciphered base sequences. Doing so fully will require achieving the project's most challenging goal: "sequencing" the entire human genome, that is, identifying and listing in order all of its 3 billion base pairs.