The utility of data-driven feature selection: re: Chu et al. 2012

Neuroimage. 2014 Jan 1:84:1107-10. doi: 10.1016/j.neuroimage.2013.07.050. Epub 2013 Jul 25.

Abstract

The recent Chu et al. (2012) manuscript discusses two key findings regarding feature selection (FS): (1) data-driven FS was no better than using whole-brain voxel data, and (2) a priori biological knowledge was effective in guiding FS. FS is highly relevant in neuroimaging-based machine learning because the number of attributes can greatly exceed the number of exemplars. We strongly endorse their demonstration of both findings, and we provide additional practical and theoretical arguments as to why, in their case, the data-driven FS methods they implemented did not improve accuracy. Further, we emphasize that the data-driven FS methods they tested performed approximately as well as the all-voxel case. We discuss why a sparse model may be favored over a complex one with similar performance. We caution readers that the findings in the Chu et al. report should not be generalized to all data-driven FS methods.
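
The comparison summarized above (data-driven FS versus using all features when attributes far outnumber exemplars) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from Chu et al. or from this comment: the synthetic data, the univariate t-score filter, the nearest-centroid classifier, and all function names are hypothetical stand-ins. The one point it is meant to show is that a data-driven filter should be fit on the training folds only, so that the held-out exemplars never influence which features are kept.

```python
import random
import statistics


def make_data(n_samples=40, n_features=200, n_informative=5, shift=1.0, seed=0):
    """Synthetic two-class data: only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so every fold stays balanced
        row = [rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def t_scores(X, y):
    """Absolute Welch t statistic of each feature between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Data-driven filter (illustrative): indices of the k highest-scoring features."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class whose centroid is closest on `features`."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    preds = []
    for row in X_test:
        dist = {label: sum((row[j] - c) ** 2 for j, c in zip(features, cen))
                for label, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k_features=None, n_folds=5):
    """Cross-validated accuracy; FS (if requested) is refit inside every fold."""
    idx = list(range(len(X)))
    correct = 0
    for fold in range(n_folds):
        test = idx[fold::n_folds]
        train = [i for i in idx if i not in test]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te = [X[i] for i in test]
        if k_features is None:
            features = list(range(len(X[0])))  # all-feature baseline
        else:
            features = select_top_k(X_tr, y_tr, k_features)  # training data only
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_data()
    print("all features:", cv_accuracy(X, y))
    print("top-5 filter:", cv_accuracy(X, y, k_features=5))
```

On toy data like this, the two accuracies may well be close; the sparse model then uses far fewer features for similar performance, which is the situation in which the abstract argues a sparse model may be preferred. The sketch says nothing about how other data-driven FS methods would behave.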

Keywords: Feature selection; Machine learning; Neuroimaging.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Comment

MeSH terms

  • Alzheimer Disease / classification*
  • Alzheimer Disease / pathology*
  • Cognitive Dysfunction / classification*
  • Cognitive Dysfunction / pathology*
  • Female
  • Humans
  • Magnetic Resonance Imaging*
  • Male
  • Neuroimaging*