Identification of gene transcript signatures predictive for estrogen receptor and lymph node status using a stepwise forward selection artificial neural network modelling approach

Artif Intell Med. 2008 Jun;43(2):99-111. doi: 10.1016/j.artmed.2008.03.001. Epub 2008 Apr 16.

Abstract

Objective: The advent of microarrays has attracted considerable interest from biologists due to the potential for high throughput analysis of hundreds of thousands of gene transcripts. Subsequent analysis of the data may identify specific features which correspond to characteristics of interest within the population; for example, gene expression profiles in cancer patients can be analysed to identify molecular signatures corresponding with prognostic outcome. These high throughput technologies have resulted in an unprecedented rate of data generation, often of high complexity, highlighting the need for novel data analysis methodologies that can cope with data of this nature.

Methods: Stepwise methods using artificial neural networks (ANNs) have been developed to identify an optimal subset of predictive gene transcripts from high-dimensional microarray data. Here these methods have been applied to a gene microarray dataset to identify and validate gene signatures corresponding with estrogen receptor and lymph node status in breast cancer.
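The stepwise forward selection idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the network architecture, the selection criterion, or the stopping rule. The sketch assumes a single sigmoid unit as a stand-in for the ANN, held-out accuracy as the criterion, and a stop when no candidate improves the score. All names (train_neuron, forward_select, make_toy_data) and the synthetic data are hypothetical.

```python
import math
import random


def train_neuron(rows, labels, epochs=100, lr=0.5):
    """Fit one sigmoid unit by online gradient descent (stand-in for an ANN)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
            err = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    weights, bias = model
    hits = 0
    for x, y in zip(rows, labels):
        z = bias + sum(w * v for w, v in zip(weights, x))
        hits += int((1 if z > 0 else 0) == y)
    return hits / len(labels)


def forward_select(X, y, train_idx, test_idx, max_features=3):
    """Add one input per step: the one giving the best held-out accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    while remaining and len(selected) < max_features:
        scored = []
        for j in remaining:
            cols = selected + [j]
            train_rows = [[X[i][c] for c in cols] for i in train_idx]
            test_rows = [[X[i][c] for c in cols] for i in test_idx]
            model = train_neuron(train_rows, y_train)
            scored.append((accuracy(model, test_rows, y_test), j))
        # Ties are broken in favour of the lowest column index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # assumed stopping rule: no candidate improves the score
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


def make_toy_data(seed=0, n=60, n_genes=6, informative=2):
    """Synthetic 'expression' matrix: one informative column, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
        row[informative] = (1.0 if label else -1.0) + rng.uniform(-0.3, 0.3)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
idx = list(range(len(X)))
random.Random(1).shuffle(idx)
selected, score = forward_select(X, y, idx[:30], idx[30:])
print("selected columns:", selected, "held-out accuracy:", score)
```

On this toy matrix only column 2 carries class information, so the search is expected to pick it first. On real microarray data the candidate pool would be thousands of probe sets, and each step would typically be scored over many resampled splits rather than a single one.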

Results: Many gene transcripts were identified whose expression could classify patients with very high accuracy according to, firstly, whether they were positive or negative for estrogen receptor and, secondly, whether metastasis to the axillary lymph node had occurred. A number of these genes had been previously reported to have a role in cancer. Significantly fewer genes were used than in previous studies. The models using the optimal gene subsets were internally validated using an extensive random sample cross-validation procedure and externally validated using a follow-up dataset from a different cohort of patients on a newer array chip containing the same and additional probe sets. Here, the models retained high accuracies, emphasising the potential power of this approach in analysing complex systems. These findings show how the proposed method allows for the rapid analysis and subsequent detailed interrogation of gene expression signatures to provide a further understanding of the underlying molecular mechanisms that could be important in determining novel prognostic markers associated with cancer.
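The random sample cross-validation used for internal validation can be sketched as repeated random train/test splits, with the model refitted and scored on each split. The abstract does not give the number of resamples or the split proportions, so the values below (50 splits, 25% held out) are assumptions, as are the function names and the toy threshold classifier that stands in for the ANN model.

```python
import random
import statistics


def random_sample_cv(X, y, fit, score, n_splits=50, test_fraction=0.25, seed=0):
    """Repeatedly draw a random train/test split, refit, and score."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, int(round(n * test_fraction)))
    results = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        results.append(
            score(model, [X[i] for i in test_idx], [y[i] for i in test_idx])
        )
    return results


def fit_threshold(values, labels):
    """Toy model: threshold halfway between the two class means."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2.0


def score_threshold(threshold, values, labels):
    """Accuracy when values above the threshold are called positive."""
    hits = sum(int((v > threshold) == (l == 1)) for v, l in zip(values, labels))
    return hits / len(labels)


# Synthetic single-transcript data: the two classes are well separated.
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
values = [(1.0 if l else -1.0) + rng.uniform(-0.3, 0.3) for l in labels]

accuracies = random_sample_cv(values, labels, fit_threshold, score_threshold)
print("splits:", len(accuracies), "mean accuracy:", statistics.mean(accuracies))
```

Reporting the spread of the per-split accuracies alongside the mean gives a sense of how stable a selected gene subset is across resamples, which a single fixed split cannot show.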

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology*
  • Female
  • Gene Expression Profiling
  • Humans
  • Lymphatic Metastasis
  • Neural Networks, Computer*
  • Oligonucleotide Array Sequence Analysis
  • Predictive Value of Tests
  • Receptors, Estrogen / physiology*
  • Reproducibility of Results
  • Transcription, Genetic / physiology*

Substances

  • Receptors, Estrogen