Background: Prognosis following a diagnosis of primary lung cancer is very poor and varies significantly even after adjusting for known predictors. Inherent and acquired gene alterations may contribute to treatment failure and poor survival in lung cancer patients. To search for potential molecular markers with significant and independent predictive value for lung cancer survival, we applied oligonucleotide microarray analysis, together with patients' phenotypic profiles, in a case-control study. This report focuses on the methodology used to identify genes that are potential prognostic factors.
Methods: From 304 patients at Mayo Clinic, we selected 18 patients with stage I squamous cell lung cancer who either died within 2 years (high-aggressive) or lived beyond 5 years (low-aggressive). Both a one-to-one matched design (paired) and a two-group design (grouped) were used. Matching variables were age, gender, tumor size and grade, smoking status, and treatment. Affymetrix HG-U133 GeneChip sets, each comprising two arrays, were used. We applied multiple analytic approaches, including Dchip (Harvard University), SAM (Stanford University), ArrayTools (US National Cancer Institute), and MAS5 (Affymetrix), and integrated their results to generate the final list of candidate genes for further investigation. We evaluated the consistency across methods and the effect of the matched versus grouped design on the results.
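To illustrate why the paired and grouped designs can rank the same gene differently, the following is a minimal sketch that computes a paired t statistic and a two-group (Welch) t statistic for a single probe set. It is an illustration only, not the statistic implemented by any of the tools named above; the function names and the intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(high, low):
    """t statistic for a one-to-one matched design (within-pair differences)."""
    diffs = [h - l for h, l in zip(high, low)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def grouped_t(high, low):
    """Welch t statistic for a two-group design (matching is ignored)."""
    var_high, var_low = stdev(high) ** 2, stdev(low) ** 2
    std_err = sqrt(var_high / len(high) + var_low / len(low))
    return (mean(high) - mean(low)) / std_err


# Hypothetical log2 expression intensities for one probe set in four
# matched pairs (high-aggressive case, low-aggressive matched case).
high = [8.1, 9.4, 7.2, 10.0]
low = [7.6, 8.8, 6.9, 9.3]

print(f"paired t  = {paired_t(high, low):.2f}")
print(f"grouped t = {grouped_t(high, low):.2f}")
```

In this made-up example the within-pair differences are small but consistent, so the paired statistic is much larger than the grouped one. A gene can therefore pass a significance threshold under one design and fail it under the other.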
Results: Using the same pre-processed data and the same criteria for type I error and fold-change in expression intensity, the lists of significant genes from Dchip and ArrayTools were 94-100% concordant, whereas the paired and grouped analyses were only 53% concordant. When differently pre-processed data were used, the concordance rate was under 6%, even with the same analytic tool. Combining results from all analyses, we found 23 potentially important genes that may distinguish high-aggressive from low-aggressive squamous cell tumors of the lung.
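A concordance rate between two lists of significant genes can be sketched as follows. The abstract does not state which denominator was used, so this example assumes the shared genes are expressed as a percentage of the union of the two lists; the function name and gene identifiers are hypothetical.

```python
def concordance(genes_a, genes_b):
    """Percentage of genes shared by two lists, relative to their union.

    Assumption: the denominator is the union of the two lists; the study
    may have used a different definition.
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical significant-gene lists from a paired and a grouped analysis.
paired_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
grouped_hits = ["GENE_B", "GENE_C", "GENE_E"]

print(f"{concordance(paired_hits, grouped_hits):.0f}% concordant")
```

Here two genes are shared out of five distinct genes, giving 40%. Using the smaller list as the denominator instead would give a higher figure, so the definition should be stated whenever concordance rates are compared.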
Conclusion: Given the generally low consistency of results across analytic algorithms and study designs, poor agreement is to be expected when different investigators report candidate genes for the same endpoint. A well-designed study with a carefully planned analytic strategy is critical. We are in the process of validating the 23 preliminary candidate genes identified in this study in independent yet comparable cases.