Background: We assessed the predictability of various classes of gastric carcinoma defined by clinicopathological parameters, such as invasiveness and clinical outcomes, using cDNA array data obtained from 54 cases.
Materials and methods: We searched for an optimal combination of genes to discriminate the classes defined by the clinicopathological parameters using a feature subset selection algorithm, which was applied to a set of genes preselected on the basis of a statistically significant difference in expression (two-sided t test, P ≤ 0.05). With the selected features (gene set), we evaluated the predictability of each parameter in a leave-one-out cross-validation test.
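The workflow described above (t-test preselection, feature subset selection, leave-one-out cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the classifier nor the search strategy, so the nearest-centroid classifier, the greedy forward search, and all function names here are assumptions. The cutoff |t| ≥ 2.0 is a rough stand-in for a two-sided P ≤ 0.05 at about 50 degrees of freedom, and the expression matrix is synthetic data generated only for this example.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def preselect(X, y, t_cutoff=2.0):
    """Return indices of genes whose |t| between the two classes meets the cutoff."""
    keep = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        if abs(t_statistic(a, b)) >= t_cutoff:
            keep.append(g)
    return keep


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed classifier)."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            # Class centroid computed without the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[label] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def forward_select(X, y, candidates, max_genes=5):
    """Greedy forward search (assumed strategy): add the gene that most improves accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        scored = [(loo_accuracy(X, y, selected + [g]), g)
                  for g in candidates if g not in selected]
        if not scored:
            break
        acc, g = max(scored, key=lambda s: (s[0], -s[1]))  # ties go to the lower index
        if acc <= best:
            break  # no remaining gene improves the accuracy
        selected.append(g)
        best = acc
    return selected, best


# Synthetic data: 54 cases, 20 genes; genes 0 and 1 are shifted in class 1.
random.seed(0)
y = [0] * 27 + [1] * 27
X = [[random.gauss(3.0 if (label == 1 and g < 2) else 0.0, 1.0) for g in range(20)]
     for label in y]

candidates = preselect(X, y)
selected, best = forward_select(X, y, candidates)
print("preselected genes:", candidates)
print("selected gene set:", selected, "leave-one-out accuracy:", round(best, 3))
```

The sketch follows the order given in the text: genes are first filtered on all cases, and the leave-one-out accuracy then scores each candidate gene set.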
Results: We successfully selected sets of genes with which the classifier distinguished better from worse overall survival (tumor-specific death) and tumor-free survival (recurrence), with classification rates of 94% and 92%, respectively. A contingency table analysis (χ² test) and Cox proportional hazards model analysis revealed that lymph node metastasis is the most important factor (confounding factor) in patients' prognoses and risks of recurrence. The feature subset selection procedure successfully extracted expression patterns characteristic of lymph node metastasis and lymphatic vessel invasion, yielding prediction accuracies of 92% and 98%, respectively, for these factors.
Conclusion: We conclude that expression profiling using feature subset selection provides a powerful means of stratifying gastric cancer patients with regard to prognostic factors. Further studies are warranted to apply this method to the personalization of treatment options.