The Gene Hunt

    Know then thyself . . . the glory, jest, and riddle of the world.

    -- Alexander Pope

    In an obscure corner of the National Institutes of Health (NIH), molecular biologist Norton Zinder strode to a 30-ft.-long oval conference table, sat down and rapped his gavel for order. A hush settled over the Human Genome Advisory Committee, an unlikely assemblage of computer experts, biologists, ethicists, industry scientists and engineers. "Today we begin," chairman Zinder declared. "We are initiating an unending study of human biology. Whatever it's going to be, it will be an adventure, a priceless endeavor. And when it's done, someone else will sit down and say, 'It's time to begin.' "

    With these words, spoken in January, Zinder formally launched a monumental effort that could rival in scope both the Manhattan Project, which created the A-bomb, and the Apollo moon-landing program -- and may exceed them in importance. The goal: to map the human genome and spell out for the world the entire message hidden in its chemical code.

    Genome? The word evokes a blank stare from most Americans, whose taxes will largely support the project's estimated $3 billion cost. Explains biochemist Robert Sinsheimer of the University of California at Santa Barbara: "The human genome is the complete set of instructions for making a human being." Those instructions are tucked into the nucleus of each of the human body's 100 trillion cells* and written in the language of deoxyribonucleic acid, the fabled DNA molecule.

    In the 35 years since James Watson and Francis Crick first discerned the complex structure of DNA, scientists have managed to decipher only a tiny fraction of the human genome. But they have high hopes that with new, automated techniques and a huge coordinated effort, the genome project can reach its goal in 15 years.

    The achievement of that goal would launch a new era in medicine. James Wyngaarden, director of the NIH, which will oversee the project, predicts that it will make "major contributions to understanding growth, development and human health, and open new avenues for therapy." Full translation of the genetic message would enable medical researchers to identify the causes of thousands of still mysterious inherited disorders, both physical and behavioral.

    With this insight, scientists could more accurately predict an individual's vulnerability to such obviously genetic diseases as cystic fibrosis and could eventually develop new drugs to treat or even prevent them. The same would be true for more common disorders like heart disease and cancer, which at the very least have large genetic components. Better knowledge of the genome could speed development of gene therapy -- the actual alteration of instructions in the human genome to eliminate genetic defects.

    The NIH and the Food and Drug Administration have already taken a dramatic step toward gene therapy. In January they gave approval to Dr. W. French Anderson and Dr. Steven Rosenberg, both at the NIH, to transplant a bacterial gene into cancer patients. While this gene is intended only to make it easier for doctors to monitor an experimental cancer treatment and will not benefit the patients, its successful implantation should help pave the way for actual gene therapy.

    The very thought of being able to read the entire genetic message, and perhaps alter it, is alarming to those who fear the knowledge could create many moral and ethical problems. Does genetic testing constitute an invasion of privacy, for example, and could it lead to more abortions and to discrimination against the "genetically unfit"? Should someone destined to be stricken with a deadly genetic disease be told about his fate, especially if no cure is yet available? Does it demean humans to have the very essence of their lives reduced to strings of letters in a computer data bank? Should gene therapy be used only for treating disease, or also for "improving" a person's genetic legacy?

    Although scientists share many of these concerns, the concept of deciphering the human genome sends most of them into paroxysms of rapture. "It's the Holy Grail of biology," says Harvard biologist and Nobel laureate Walter Gilbert. "This information will usher in the Golden Age of molecular medicine," says Mark Pearson, Du Pont's director of molecular biology. Predicts George Cahill, a vice president at the Howard Hughes Medical Institute: "It's going to tell us everything. Evolution, disease, everything will be based on what's in that magnificent tape called DNA."

    That kind of enthusiasm is infectious. In an era of budgetary restraint, Washington has been unblinkingly generous toward the genome project, especially since last April, when an array of scientists testified on the subject at a congressional committee hearing. There, Nobel laureate Watson of DNA fame, since picked by the NIH to head the effort, mesmerized listeners with his plea for support: "I see an extraordinary potential for human betterment ahead of us. We can have at our disposal the ultimate tool for understanding ourselves at the molecular level . . . The time to act is now."

    Congress rose to the challenge. It promptly allocated more than $31 million for genome research to the NIH and to the Department of Energy and the National Library of Medicine, which are also involved in the quest. The combined appropriations rose to $53 million for fiscal 1989.

    Even more will be needed when the effort is in full swing, involving hundreds of scientists, dozens of Government, university and private laboratories, and several computer and data centers. With contributions from other Government agencies and private organizations like the Hughes institute, the total annual cost of the project will probably rise to $200 million, which over 15 years will account for the $3 billion price tag.

    The staggering expense and sheer size of the genome project were what bothered scientists most when the idea was first broached in 1985 by Sinsheimer, then chancellor of the University of California at Santa Cruz. "I thought Bob Sinsheimer was crazy," recalls Leroy Hood, a biologist at the California Institute of Technology. "It seemed to me to be a very big science project with marginal value to the science community."

    Nobel laureate David Baltimore, director of M.I.T.'s Whitehead Institute, was one of the many who feared that such a megaproject would have much the same impact on biology that the shuttle had on the U.S. space program: soaking up so much money and talent that smaller but vital projects would dry up. Others stressed that the technology to do the job in a reasonable time was not available. But by 1986 some opponents realized they were fighting a losing battle. "The idea is gaining momentum. I shiver at the thought," said Baltimore then. Now, however, he approves of the way the project has evolved and has thrown his weight behind it.

    What really turned the tide was a February 1988 report by the prestigious National Research Council enthusiastically endorsing a project that would first map and interpret important regions of the genome, then -- as better technology became available -- proceed to reading the entire genetic message. Most of the remaining critics were silenced last fall when the NIH chose the respected Watson as project director. Still, some scientists remain wary of the project. Says David Botstein, a vice president at Genentech and a member of the Human Genome Advisory Committee: "We need to test its progress, regulate its growth and slap it down if it becomes a monster. Jim Watson understands the dangers as well as any of us."

    The concern, as well as the cost, reflects the complexity of the human genome and the magnitude of the effort required to understand it. DNA is found in the human-cell nucleus in the form of 46 separate threads, each coiled into a packet called a chromosome. Unraveled and tied together, these threads would form a fragile string more than 5 ft. long but only about 80 billionths of an inch across.

    And what a wondrous string it is. As Watson and Crick discovered in 1953, DNA consists of a double helix, resembling a twisted ladder with sidepieces made of sugar and phosphates and closely spaced connecting rungs. Each rung is called a base pair because it consists of a pair of complementary chemicals called nitrogenous bases, attached end to end, either adenine (A) joined to thymine (T) or cytosine (C) attached to guanine (G).

    Fundamental to the genius of DNA is the fact that A and T are mutually attractive, as are C and G. Consequently, when DNA separates during cell division, coming apart at the middle of each rung like a zipper opening, an exposed T half-rung on one side of the ladder will always attract an A floating freely in the cell. The corresponding A half-rung on the other section of the ladder will attract a floating T, and so on, until two double helixes, each identical to the original DNA molecule, are formed.
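
    The pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the names COMPLEMENT, complement_strand and replicate are invented for the example, and the sketch ignores the fact that the two strands of a real double helix run in opposite directions.

```python
# Pairing rule from the text: A attracts T, and C attracts G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the partner strand, base by base (strand direction is ignored)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def replicate(strand):
    """Unzip a double helix and rebuild a partner for each half.

    The helix is represented as (one side, other side). Each old side
    attracts free-floating bases to form a new partner, giving two helixes.
    """
    other_side = complement_strand(strand)
    first_copy = (strand, complement_strand(strand))
    second_copy = (other_side, complement_strand(other_side))
    return first_copy, second_copy


if __name__ == "__main__":
    print(complement_strand("ACGGTAGAT"))
    print(replicate("ACGGTAGAT"))
```

    Both copies contain the same two strands as the original molecule, which is why each daughter cell receives an identical genetic message.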

    Even more remarkable, each of the four bases represents a letter in the genetic code. The three-letter "words" they spell, reading in sequence along either side of the ladder, are instructions to the cell on how to assemble amino acids into the proteins essential to the structure and life of its host. Each complete DNA "sentence" is a gene, a discrete segment of the DNA string responsible for ordering the production of a specific protein.

    Reading these genetic words and deciphering their meaning is apparently a snap for the clever machinery of a cell. But for mere scientists it is a formidable and time-consuming task. For instance, a snippet of DNA might read ACGGTAGAT, a message that researchers can decipher rather easily. It codes for a sequence of three of the 20 varieties of amino acids that constitute the building blocks of proteins. But the entire genome of even the simplest organism dwarfs that snippet. The genetic blueprint of the lowly E. coli bacterium, for one, is more than 4.5 million base pairs long. For a microscopic yeast plant, the length is 15 million units. And in a human being, the genetic message is some 3 billion letters long.
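
    To show how a short snippet is decoded, here is a minimal Python sketch that reads the article's ACGGTAGAT three letters at a time against the standard genetic code. It assumes the snippet is read from its first letter along the coding side of the ladder; the names CODON_TABLE and translate are illustrative and do not come from any particular library. Amino acids are given by their one-letter abbreviations, and "*" marks a stop signal.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons beginning with T
    "LLLLPPPPHHQQRRRR"  # codons beginning with C
    "IIIMTTTTNNKKSSRR"  # codons beginning with A
    "VVVVAAAADDEEGGGG"  # codons beginning with G
)

# Map each three-letter word (codon) to its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a DNA string three letters at a time; leftover letters are ignored."""
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3))


if __name__ == "__main__":
    print(translate("ACGGTAGAT"))
```

    For this snippet the three words are ACG, GTA and GAT, which stand for threonine (T), valine (V) and aspartic acid (D).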

    Like cartographers mapping the ancient world, scientists over the past three decades have been laboriously charting human DNA. Of the estimated 100,000-odd genes that populate the genome, just 4,550 have been identified. And only 1,500 of those have been roughly located on the various chromosomes. The message of the genes has been equally difficult to come by. Most genes consist of between 10,000 and 150,000 code letters, and only a few genes have been completely deciphered. Long segments of the genome, like the vast uncharted regions of early maps, remain terra incognita.

    To complicate matters, between the segments of DNA that represent genes are endless stretches of code letters that seem to spell out only genetic gibberish. Geneticists once thought most of the unintelligible stuff was "junk DNA" -- useless sequences of code letters that accidentally developed during evolution and were not discarded. That concept has changed. "My feeling is there's a lot of very useful information buried in the sequence," says Nobel laureate Paul Berg of Stanford University. "Some of it we will know how to interpret; some we know is going to be gibberish."

    In fact, some of the nongene regions on the genome have already been identified as instructions necessary for DNA to replicate itself during cell division. Their message is obviously detailed and complex. Explains George Bell, head of genome studies at Los Alamos National Laboratory: "It's as if you had a rope that was maybe 2 in. in diameter and 32,000 miles long, all neatly arranged inside a structure the size of a superdome. When the appropriate signal comes, you have to unwind the rope, which consists of two strands, and copy each strand so you end up with two new ropes that again have to fold up. The machinery to do that cannot be trivial."

    One of the most formidable tasks faced by geneticists is to learn the nature of that machinery and other genetic instructions buried in the lengthy, still undeciphered base sequences. To do so fully requires achievement of the project's most challenging goal: the "sequencing" of the entire human genome. In other words, the identification and listing in order of all the genome's 3 billion base pairs.

    That effort, says Caltech research fellow Richard Wilson, "is analogous to going around and shaking hands with everyone on earth." The resulting string of code letters, according to the 1988 National Research Council report urging adoption of the genome project, would fill a million-page book. Even then, much of the message would be obscure. To decipher it, researchers would need more powerful computer systems to roam the length of the genome, seeking out meaningful patterns and relationships.

    It was from the patterns and relationships of pea plants that a concept of heredity first arose in the mind of Gregor Mendel, an Austrian monk. In 1865, after studying the flower colors and other characteristics of many generations of pea plants, Mendel formulated the laws of heredity and suggested the existence of packets of genetic information, which became known as genes. Soon afterward, chromosomes were observed in the nuclei of dividing cells, and scientists later discovered a chromosomal difference between the sexes. One chromosome, which they named Y, was found in human males' cells, together with another, called X. Females' cells, on the other hand, had two copies of X.

    But it was not until 1911 that a gene, only a theoretical entity at the time, was correctly assigned to a particular chromosome. After studying the pedigrees of several large families with many color-blind members (males are primarily affected), Columbia University scientist E.B. Wilson applied Mendelian logic and proved that the trait was carried on the X chromosome. In the same manner over the next few decades, several genes responsible for such gender-linked diseases as hemophilia were assigned to the X chromosome and a few others attributed to the Y.

    Scientists remained uncertain about the exact number of human chromosomes until 1956, when improved photomicrographs of dividing cells clearly established that there were 46. This revelation led directly to identification of the cause of Down syndrome (a single extra copy of chromosome 21) and other disorders that result from distinctly visible errors in the number or shape of certain chromosomes.

    But greater challenges lay ahead. How could a particular gene be assigned to any of the nonsex chromosomes? Scientists cleverly tackled that problem by fusing human cells with mouse cells, then growing hybrid mouse-human cells in the laboratory. As the hybrid cells divided again and again, they gradually shed their human chromosomes until only one -- or simply a fragment of one -- was left in the nucleus of each cell.

    By identifying the kind of human protein each of these hybrid cells produced, the researchers could deduce that the gene responsible for that protein resided in the surviving chromosome. Using this method, they assigned hundreds of genes to specific chromosomes.

    Finding the location of a gene on a chromosome is even more complicated. But over the past several years, scientists have managed to draw rough maps of all the chromosomes. They determine the approximate site of the genes, including many associated with hereditary diseases, by studying patterns of inheritance in families and chopping up their DNA strands for analysis. With this technique, they have tracked down the gene for cystic fibrosis in the midsection of chromosome 7, the gene for a rare form of colon cancer midway along the long arm of chromosome 5, and the one for familial Alzheimer's disease on the long arm of chromosome 21.

    One of the more dramatic hunts for a disease gene was led by Nancy Wexler, a neuropsychologist at Columbia University and president of the Hereditary Disease Foundation. Wexler was highly motivated; her mother died of Huntington's disease, a debilitating and painful disorder that usually strikes adults between the ages of 35 and 45 and is invariably fatal. This meant that Wexler had a 50% chance of inheriting the gene from her mother and contracting the disease.

    In a search coordinated by Wexler's foundation, geneticist James Gusella of Massachusetts General Hospital discovered a particular piece of DNA, called a genetic marker, that seemed to be present in people suffering from Huntington's disease. His evidence suggested that the marker must be near the Huntington's disease gene on the same chromosome, but he needed a larger sample to confirm his findings. This was provided by Wexler, who had previously traveled to Venezuela to chart the family tree of a clan of some 5,000 people, all of them descendants of a woman who died of Huntington's disease a century ago. Working with DNA samples from affected family members, Gusella and Wexler in 1983 concluded that they had indeed found a Huntington's marker, which was located near one end of chromosome 4.

    That paved the way for a Huntington's gene test, which is now available. But the actual gene has not yet been isolated, and since there is no cure at present, many people at risk for Huntington's are reluctant to take the test. "Before the test," Wexler says, "you can always say, 'Well, it can't happen to me.' After the test, if it is positive, you can't say that anymore." Has Wexler, 43, taken the test? "People need to have some privacy," she answers.

    Tracking down the location of a gene requires tedious analysis. But it is sheer adventure when compared with the task of determining the sequence of base pairs in a DNA chain. Small groups of scientists, working literally by hand, have spent years simply trying to sequence a single gene. This hands-on method of sequencing costs as much as a dollar per base pair, and deciphering the entire genome by this method might take centuries.

    The solution is automation. "It will improve accuracy," says Stanford's Paul Berg. "It will remove boredom; it will accomplish what we want in the end." The drive for automation has already begun; a machine designed by Caltech biologist Leroy Hood can now sequence 16,000 base pairs a day. But Hood, a member of the Genome Advisory Committee, is hardly satisfied. "Before we can seriously take on the genome initiative," he says, "we will want to do 100,000 to a million a day." The cost, he hopes, will eventually drop to a penny per base pair.
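
    The arithmetic behind those figures is easy to check. The Python sketch below uses only numbers quoted in the article (a 3 billion-letter genome, 16,000 to a million base pairs a day, a dollar or a penny per base pair). It assumes a single machine running every day of a 365-day year, with no repeat passes to catch errors, so the results are rough lower bounds rather than project estimates. The function names are invented for the example.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # "some 3 billion letters long"


def years_to_sequence(base_pairs_per_day, machines=1):
    """Years needed to read the genome once at a steady daily rate."""
    days = GENOME_BASE_PAIRS / (base_pairs_per_day * machines)
    return days / 365


def total_cost_dollars(cents_per_base_pair):
    """Cost in dollars of reading the genome once at a given price per base pair."""
    return GENOME_BASE_PAIRS * cents_per_base_pair / 100


if __name__ == "__main__":
    for rate in (16_000, 100_000, 1_000_000):
        print(f"{rate:>9,} base pairs/day: {years_to_sequence(rate):6.1f} years")
    print(f"At $1.00 per base pair: ${total_cost_dollars(100):,.0f}")
    print(f"At $0.01 per base pair: ${total_cost_dollars(1):,.0f}")
```

    At 16,000 base pairs a day a lone machine would need more than five centuries; at a million a day, a little over eight years. The same pass that costs $3 billion at a dollar per base pair would cost $30 million at a penny.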

    Hood is not alone in his quest for automation. That is also the goal of Columbia University biochemist Charles Cantor, recently appointed by the Energy Department to head one of its two genome centers. "It's largely an engineering project," Cantor explains, intended to produce tools for faster, less expensive sequencing and to develop data bases and computer programs to scan the data. Not to be outdone, Japan has set up a consortium of four high-tech companies to establish an automated assembly line, complete with robots, that researchers hope will be capable of sequencing 100,000 base pairs a day within three years.

    Is there a better way? In San Francisco in January, Energy Department scientists displayed a photograph of a DNA strand magnified a million times by a scanning tunneling microscope. It was the first direct image of the molecule. If sharper images can be made, the scientists suggested, it may be possible to read the genetic code directly. But that day seems very far off.

    Even before the Human Genome Project was begun by the NIH, others were deeply involved in probing the genome. Building on a long-standing program of research on DNA damage caused by radiation, biologist Charles DeLisi in 1987 persuaded the Energy Department to launch its own genome program. In addition to the sequencer and computer-hardware engineering projects, Energy Department scientists are focusing their attention on mapping seven complete chromosomes.

    Victor McKusick, a geneticist at Johns Hopkins University, was in the game much earlier. He has been cataloging genes since 1959, compiling findings in his regularly updated publication, Mendelian Inheritance in Man. In August 1987 he introduced an electronic version that scientists around the world can tap into by computer. At the end of December it contained information on all the 4,550 genes identified to date. Says McKusick: "That's an impressive figure, but we still have a long way to go." Several other libraries of genetic information are already functioning, among them GenBank at the Los Alamos National Laboratory and the Howard Hughes Medical Institute's Human Gene Mapping Library in New Haven, Conn.

    McKusick also directs the Human Genome Organization (known informally as "Victor's HuGO"), a group formed last September in Montreux, Switzerland, by 42 scientists representing 17 nations. "The U.N. of gene mapping," as McKusick describes it, plans to open three data-collection and -distribution sites, one each in Japan, North America and Europe.

    Geneticist Ray White, formerly at M.I.T., has established a major center for genetic-linkage mapping at the University of Utah in Salt Lake City. In 1980 he began a study of 50 large families, collecting their blood samples, extracting white blood cells, which he multiplies in cell cultures, then preserving them in freezers.

    Working with family pedigrees and DNA extracted from the cell bank, White and his group have identified more than 1,000 markers, each about 10 million base pairs apart, on all the chromosomes. They have also been major contributors to the Center for the Study of Human Polymorphisms, set up in Paris by French Nobel laureate Jean Dausset to coordinate an international effort to map the genes. Of the 40 families whose cell lines reside in CEPH's major data banks, 27 have been provided by White's group.

    Whether and how these and other genetic research efforts will be coordinated with the Human Genome Project is a question being pondered by director Watson and his advisory committee. "Right now," says Watson, "the program supports people through individual research grants. We have to build up around ten research centers, each with specific objectives, if we want to do this project in a reasonable period of time."

    The effort will also include studies of genes in other organisms, such as mice and fruit flies. "We've got to build a few places that are very strong in mouse genetics," Watson says, "because in order to interpret the human, we need to have a parallel in the mouse." Explains Genentech's Botstein: "Experimentation with lower organisms will illuminate the meaning of the sequence in humans." For example, genes that control growth and development in the fruit fly are virtually identical to oncogenes, which cause cancer in humans.

    One of the early benefits of the genome project will be the identification of more and more of the defective genes responsible for the thousands of known inherited diseases and development of tests to detect them. Like those already used to find Huntington's and sickle-cell markers, for example, these tests will allow doctors to predict with near certainty that some patients will fall victim to specific genetic diseases and that others are vulnerable and could be stricken.

    University of Utah geneticist Mark Skolnick is convinced that mapping the genome will radically change the way medicine is practiced. "Right now," he says, "we wait for someone to get sick so we can cut them and drug them. It's pretty old stuff. Once you can make a profile of a person's genetic predisposition to disease, medicine will finally become predictive and preventive."

    Eventually, says Mark Guyer of the NIH's Human Genome Office, people might have access to a computer readout of their own genome, with an interpretation of their genetic strengths and weaknesses. At the very least, this would enable them to adopt an appropriate life-style, choosing the proper diet, environment and -- if necessary -- drugs to minimize the effects of genetic disorders.

    The ever improving ability to read base-pair sequences of genes will enable researchers to speed the discovery of new proteins, assess their role in the life processes, and use them -- as the interferons and interleukins are already used -- for fighting disease. It will also help them pinpoint missing proteins, such as insulin, that can correct genetic diseases.

    Mapping and sequencing the genes should accelerate progress in another highly touted and controversial discipline: gene therapy. Using this technique, scientists hope someday to cure genetic diseases by actually inserting good genes into their patients' cells. One proposed form of gene therapy would be used to fight beta-thalassemia major, a blood disease characterized by severe anemia and caused by the inability of hemoglobin to function properly. That inability results from the lack of a protein in the hemoglobin, a deficiency that in turn is caused by a defective gene in bone-marrow cells.

    To effect a cure, doctors would remove bone-marrow cells from a patient and expose them to a retrovirus* engineered to carry correctly functioning versions of the patient's faulty gene. When the retrovirus invaded a marrow cell, it would insert itself into the cellular DNA, as retroviruses are wont to do, carrying the good gene with it. Reimplanted in the marrow, the altered marrow cells would take hold and multiply, churning out the previously lacking protein and curing the thalassemia patient.

    Easier said than done. Scientists have had trouble getting such implanted genes to "turn on" in their new environment, and they worry about unforeseen consequences if the gene is inserted in the wrong place in a chromosome. Should the gene be slipped into the middle of another vital gene, for example, it might disrupt the functioning of that gene, with disastrous consequences. Also, says M.I.T. biologist Richard Mulligan, there are limitations to the viral insertion of genes. "Most genes," he explains, "are too big to fit into a retrovirus."

    Undaunted, researchers are refining their techniques in experiments with mice, and Mulligan believes that the first human-gene-therapy experiments could occur in the next three years. Looking further ahead, other scientists are experimenting with a kind of genetic microsurgery that bypasses the retrovirus, mechanically inserting genes directly into the cell nucleus.

    Not only those with rare genetic disorders could benefit from the new technology. Says John Brunzell, a University of Washington medicine professor: "Ten years ago, it was thought that only 10% of premature coronary heart disease came from inherited abnormalities. Now that proportion is approaching 80% to 90%."

    Harvard geneticist Philip Leder cites many common diseases -- hypertension, allergies, diabetes, heart disease, mental illness and some (perhaps all) cancers -- that have a genetic component. Unlike Huntington's and Tay-Sachs diseases, which are caused by a single defective gene, many of these disorders have their roots in several errant genes and would require genetic therapy far more sophisticated than any now even being contemplated. Still, says Leder, "in the end, genetic mapping is going to have its greatest impact on these major diseases."

    Of all the enthusiasm that the genome project has generated among scientists and their supporters in Washington, however, none matches that of James Watson as he gears up for the monumental task ahead. "It excites me enormously," he says, and he remains confident that it can be accomplished despite the naysayers both within and outside the scientific community. "How can we not do it?" he demands. "We used to think our fate was in our stars. Now we know, in large measure, our fate is in our genes."

    FOOTNOTE: *Except red blood cells, which have no nucleus.

    FOOTNOTE: *A virus consisting largely of RNA, a single-stranded chain of bases similar to the DNA double helix.