And while it's true that researchers can sequence individual genes, and have done so, the process they had to use was expensive and terribly laborious--like writing your own reference book before you can start any real experiments. Having the sequences laid out in advance gives the scientific world a big head start.
Those sequences are so useful, in fact, that researchers started tapping into the data long before they were complete. Scientists at drug firms, biotech companies and university labs have taken literally hundreds of baby steps into the era of genomic medicine using an impressive array of powerful new tools: DNA chips and microarrays that let scientists see at a glance which of thousands of genes are active in a given tissue sample; sophisticated software that can organize gigabytes of genetic data; huge databases of genes, disease-tissue samples and mRNA--the molecules that initiate the actual construction of working proteins. "The announcement of finishing the genome is to us a mini-event," says Allen Roses, worldwide director of genetics for Glaxo Wellcome and a prominent Alzheimer's researcher. "We've been making use of the information as it has become available, and we've already done some proof of the concept that finding genes for disease and developing the right drug for the right patient will actually work."
One scientist whose work has been transformed by genomics is Dr. David Altshuler, an endocrinologist at Massachusetts General Hospital who does research at M.I.T.'s Whitehead Institute. A diabetes expert, he wanted to learn more about a gene known to be involved in adult-onset (Type II) diabetes and obesity. He knew that the gene was about 100,000 chemical letters--or base pairs--long, and that only about 2,000 of those directed the production of a protein.
Hidden somewhere in the remaining 98,000 base pairs are instructions that govern how much protein gets churned out--an essential clue for developing eventual treatments for diabetics. But before the public project's data began going up on GenBank, finding the hidden code would have been a daunting task. "To isolate the DNA and do all the sequencing would have taken a highly trained Ph.D. a year or two," says Altshuler, "an ungodly, unacceptable amount of work."
This spring Altshuler simply went to the public database, fed in the 2,000 base pairs he already knew about and asked the computer: Is the rest of the gene sequenced? "For four months," he says, "we went back every week, and the answer was 'nope, nope, nope.' Then one week, all of a sudden, there it was, all 100,000 base pairs in a row--a year, two years' work handed to us, all before lunch."
His next step was to look at the same gene in the mouse, taking advantage of the fact that the noncoding portions of the genome in man and mouse are 75% similar. Three weeks after pulling the gene's human sequence off GenBank, Altshuler lined up the mouse and man genes side by side and spotted five regions that were active in both. Now he's going to focus on these five regions as possible targets for drug design, figuring this is where the regulatory action is likely to be.