Development and production of an oligonucleotide MuscleChip: use for validation of ambiguous ESTs

Rehannah H A Borup; Stefano Toppo; Yi-Wen Chen; Tanya M Teslovich; Gerolamo Lanfranchi; Giorgio Valle; Eric P Hoffman

doi:10.1186/1471-2105-3-33

BMC Bioinformatics. 2002 Oct 29;3:33. doi: 10.1186/1471-2105-3-33.

Authors

Rehannah H A Borup¹, Stefano Toppo, Yi-Wen Chen, Tanya M Teslovich, Gerolamo Lanfranchi, Giorgio Valle, Eric P Hoffman

Affiliation

¹ Research Center for Genetic Medicine, Children's National Medical Center, 111 Michigan Avenue N.W., Washington, DC 20010, USA.

Abstract

Background: We describe the development, validation, and use of a highly redundant 120,000-oligonucleotide microarray (MuscleChip) containing 4,601 probe sets representing 1,150 known genes expressed in muscle and 2,075 EST clusters from a non-normalized subtracted muscle EST sequencing project (28,074 EST sequences). This set included 369 novel EST clusters showing no match to previously characterized proteins in any database. Each probe set was designed to contain 20-32 25-mer oligonucleotides (10-16 paired perfect match and mismatch probes per gene), with each probe evaluated for hybridization kinetics (Tm) and similarity to other sequences. The 120,000 oligonucleotides were synthesized by photolithography and light-activated chemistry on each microarray.
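The probe-set layout described above can be illustrated with a minimal Python sketch. It assumes the common GeneChip-style convention that a mismatch probe is the perfect-match 25-mer with its central (13th) base complemented; the abstract does not state how the MuscleChip mismatch probes were derived, so this rule, the function names, and the example sequence are all hypothetical.

```python
# Illustrative sketch only: the central-base mismatch rule is an assumption,
# not a detail given in the abstract. All names here are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PROBE_LENGTH = 25  # 25-mer oligonucleotides, as stated in the abstract


def mismatch_probe(perfect_match: str) -> str:
    """Return the perfect-match probe with its central base complemented."""
    if len(perfect_match) != PROBE_LENGTH:
        raise ValueError("expected a 25-mer")
    mid = PROBE_LENGTH // 2  # index 12, i.e. the 13th base
    return (
        perfect_match[:mid]
        + COMPLEMENT[perfect_match[mid]]
        + perfect_match[mid + 1:]
    )


def build_probe_set(perfect_matches: list[str]) -> list[tuple[str, str]]:
    """Pair each perfect-match probe with its mismatch partner.

    The abstract gives 10-16 pairs per probe set, i.e. 20-32 oligonucleotides.
    """
    if not 10 <= len(perfect_matches) <= 16:
        raise ValueError("a probe set holds 10-16 probe pairs")
    return [(pm, mismatch_probe(pm)) for pm in perfect_matches]


# Hypothetical example sequence, used ten times purely to show the counts.
example_pm = "ACGTACGTACGTACGTACGTACGTA"
probe_set = build_probe_set([example_pm] * 10)
oligo_count = 2 * len(probe_set)  # 10 pairs -> 20 oligonucleotides
```

With the minimum of 10 pairs the set holds 20 oligonucleotides, and with the maximum of 16 pairs it holds 32, matching the 20-32 range given above.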

Results: Hybridization of human muscle cRNAs to this MuscleChip (33 samples) showed a correlation of 0.6 between the number of ESTs sequenced in each cluster and hybridization intensity. Of the 369 novel EST clusters showing no similarity to previously characterized proteins, we focused on 250 that were represented by robust probe sets on the MuscleChip fulfilling all stringent rules. Of these, 102 (41%) were found to be consistently "present" by analysis of hybridization to human muscle RNA, and 40 of those ESTs (39%) could be genome anchored to potential transcription units in the human genome sequence. Furthermore, 19 of the 40 ESTs were computer-predicted as exons by one or more than three gene identification algorithms.
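The filtering steps reported in the Results can be checked with a short Python sketch. The counts are taken from the abstract; the stage labels and the `percent` helper are hypothetical, and the code only reproduces the arithmetic, not the underlying analysis.

```python
# Counts come from the abstract; stage labels are paraphrases for illustration.
funnel = [
    ("novel EST clusters", 369),
    ("robust probe sets on the MuscleChip", 250),
    ("consistently present in muscle RNA", 102),
    ("genome anchored", 40),
    ("computer-predicted as exons", 19),
]


def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# Fraction retained at each step relative to the preceding step.
retained = {
    later_name: later_count / earlier_count
    for (_, earlier_count), (later_name, later_count) in zip(funnel, funnel[1:])
}

present_pct = percent(102, 250)   # reported in the abstract as 41%
anchored_pct = percent(40, 102)   # reported in the abstract as 39%
```

The two rounded values agree with the percentages quoted in the Results; the other step-to-step fractions in `retained` are derived here and are not figures reported by the authors.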

Conclusion: Our analysis found 40 transcriptionally validated, genome-anchored novel EST clusters to be expressed in human muscle. As most of these ESTs were low-copy clusters (duplex and triplex) in the original 28,000-EST project, their identification as significantly expressed is a robust validation of the transcript units and permits subsequent focus on the novel proteins encoded by these genes.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Cluster Analysis
Computational Biology / instrumentation*
Computational Biology / methods*
Databases, Genetic
Expressed Sequence Tags*
Gene Library
Genome, Human
Humans
Kinetics
Muscles / metabolism
Nucleic Acid Hybridization
Oligonucleotide Array Sequence Analysis / methods*
Oligonucleotides / chemistry*
RNA, Complementary / metabolism
Software
Transcription, Genetic

Substances

Oligonucleotides
RNA, Complementary
