New Biological Database Seeks Out Products of Alternative Gene Splicing

August 2, 1999

BERKELEY, CA -- In its first half year of operation, a new database that identifies clusters of proteins arising from alternative gene splicing has received more than 35,000 requests from researchers in genetics and cell and developmental biology around the world.

The Alternative Splicing Data Base, or ASDB, is based at the National Energy Research Scientific Computing Center (NERSC) in the Department of Energy's Lawrence Berkeley National Laboratory. It was created by Inna Dubchak, Igor Dralyuk, and Manfred Zorn of NERSC's Center for Bioinformatics and Computational Genomics, in collaboration with M.S. Gelfand of the Institute of Protein Research at the Russian Academy of Sciences.

"In the first weeks after ASDB went on line in January, requests for data went from an average of a few dozen per day to hundreds," says Dubchak. "One day in May, we got more than 6,000 requests."

The world-wide demand for alternative gene-splicing data is confirmation that Dubchak and her colleagues have hit upon one of the most exciting and important problems in contemporary biology -- which suits her fine: "We want to help biologists solve their hardest problems by computational methods."

Genes that can be alternatively spliced to produce different proteins violate what, not long ago, was considered a basic tenet of biology -- "one gene, one protein." But it is now clear that alternative splicing plays a crucial role in the development and health of many organisms.

Several steps lie between a gene -- a sequence of nucleotides on a strand of DNA -- and the protein for which it codes. Messenger RNA copies the gene, then carries the information to a ribosome. The ribosome reads the RNA and cranks out an amino-acid string, which folds into the functional protein.

In 1977 researchers found that with some genes, after the messenger RNA leaves the DNA strand and before it is processed by a ribosome, large chunks of it are edited out. The discarded pieces represent stretches of the gene (later named introns) that do not code for amino acids; the sequences that do code for amino acids are called exons. Richard Roberts and Phillip Sharp won the Nobel Prize in 1993, sixteen years after their discovery of these "split" genes.

Split genes have a remarkable property: their exons can be added or deleted, giving rise to different proteins from the same gene. This alternative splicing plays a vital role in most higher organisms; in the development of the fruit fly, a single split gene arranged one way eventually produces a female, but if arranged another way produces a male.
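As a rough illustration of the idea above, the following Python sketch shows how including or skipping one exon changes the resulting protein. The three exon sequences and the helper names `splice` and `translate` are hypothetical, invented for this example; the codon table is a small subset of the standard genetic code, and real splicing is far more intricate than string joining.

```python
# Small subset of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "AAA": "K",
    "GAA": "E", "UGG": "W", "UUU": "F",
}

def splice(exons, keep):
    """Join the exons whose indices are in `keep`, preserving gene order."""
    return "".join(exons[i] for i in sorted(keep))

def translate(mrna):
    """Translate an mRNA string codon by codon using the toy table."""
    usable = len(mrna) - len(mrna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical split gene with three exons (introns already removed).
exons = ["AUGGCU", "AAAGAA", "UGGUUU"]

full = translate(splice(exons, {0, 1, 2}))   # all exons retained
skipped = translate(splice(exons, {0, 2}))   # middle exon left out

print(full)     # MAKEWF
print(skipped)  # MAWF
```

The same gene yields two different amino-acid strings depending only on which exons are kept, which is the essence of alternative splicing.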

Split genes are also important in generating the numerous "impromptu" variations of antibodies produced by the human immune system in response to novel infectious agents. And splicing variations have been found to result in some cancers as well. Alternative splicing in humans is not rare -- almost a third of human genes are subject to it.

Dubchak and her colleagues spent a year and a half assembling the ASDB, which currently contains some 1,700 protein sequences. It can be searched to find out how many known proteins can be derived from a single gene sequence (some can generate up to 64 variations of messenger RNA!) or to find all known products of alternative splicing in a given organism, such as the fruit fly, mouse, or human, or in a particular tissue such as muscle, heart, or brain.
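The two kinds of search described above can be sketched in a few lines of Python. This is only an illustration: the record fields, the sample entries, and the functions `variants_per_gene` and `search` are assumptions made for this example and do not reflect the actual ASDB schema, contents, or interface.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only
# and are not taken from the real database.
RECORDS = [
    {"protein": "P1", "gene": "geneA", "organism": "fruit fly", "tissue": "muscle"},
    {"protein": "P2", "gene": "geneA", "organism": "fruit fly", "tissue": "brain"},
    {"protein": "P3", "gene": "geneB", "organism": "mouse", "tissue": "heart"},
    {"protein": "P4", "gene": "geneB", "organism": "mouse", "tissue": "heart"},
    {"protein": "P5", "gene": "geneC", "organism": "human", "tissue": "brain"},
]

def variants_per_gene(records):
    """Count how many known protein products are recorded for each gene."""
    counts = defaultdict(int)
    for record in records:
        counts[record["gene"]] += 1
    return dict(counts)

def search(records, organism=None, tissue=None):
    """Return protein IDs matching the given organism and/or tissue."""
    return [
        record["protein"]
        for record in records
        if (organism is None or record["organism"] == organism)
        and (tissue is None or record["tissue"] == tissue)
    ]

print(variants_per_gene(RECORDS))
print(search(RECORDS, organism="mouse"))
print(search(RECORDS, tissue="brain"))
```

Grouping by gene answers the "how many products from one gene" question, while the optional filters cover searches by organism or tissue.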

The mechanism of alternative splicing is not well understood, although the relative concentrations of antagonistic "splicing factors," including small proteins in the cell nucleus, are known to play an important part. And no one really knows the origins of introns or the evolutionary reasons for their persistence, although conflicting theories abound.

Understanding the variations in proteins that result from shuffling exons in split genes may yield answers to these and a host of other questions. This promise is what drives the ever-increasing use of the Alternative Splicing Data Base. Gelfand, Dubchak, Dralyuk, and Zorn give a detailed description of ASDB in the January 1, 1999 issue of Nucleic Acids Research (Vol. 27, No. 1).

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.