NCJ Number
193065
Date Published
2001
Length
7 pages
Annotation
This chapter describes several high-throughput biology database projects conducted under the National Center for Biotechnology Information (NCBI) in collaboration with numerous groups.
Abstract
The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, has been conducting research on the data input and output of several high-throughput biology projects in collaboration with numerous groups. Four such projects are presented. First, PubMed is a system that provides access to Medline, a database of more than 10 million abstracts from the biomedical literature, along with links to online journals and factual databases. PubMed illustrates the way many databases were built in the past, with the strengths and weaknesses of that approach, and it offers rapid dissemination of information on infectious diseases. Second, GenBank, a database of gene sequences, contains over 4 billion base pairs of DNA from more than 60,000 different species. GenBank is used for computation, which requires that its information be validated in a variety of ways. Third, UniGene, a resource used in developing a gene map, clusters partial gene sequences called expressed sequence tags (ESTs). Finally, the Cancer Genome Anatomy Project (CGAP) has developed information about reagents for deciphering the molecular anatomy of the cancer cell and provides a model for future full-length cDNA projects. CGAP's objective is to develop a tumor gene index of all genes involved in cancer. High-throughput biology projects are representative of the database projects of the future. Their challenge is to maintain control and organization in the data collection process while preserving system flexibility and autonomy.