Impossibility of successful classification when useful features are rare and weak

Jiashun Jin

doi:10.1073/pnas.0903931106

Proc Natl Acad Sci U S A. 2009 Jun 2;106(22):8859-64. doi: 10.1073/pnas.0903931106. Epub 2009 May 15.

Author

Jiashun Jin¹

Affiliation

¹ Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Abstract

We study a two-class classification problem with a large number of features, most of which are useless and only a few of which are useful; which features are useful is not known in advance. The number of features is large compared with the number of training observations. Calibrating the model with 4 key parameters (the number of features, the size of the training sample, and the fraction and strength of the useful features), we identify a region in parameter space where no trained classifier can reliably separate the two classes on fresh data. The complement of this region, where successful classification is possible, is also briefly discussed.
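To make the four parameters concrete, the following is a minimal simulation sketch and not the paper's own construction. It assumes Gaussian features with unit variance, class labels of +1 or -1, and a mean shift of size tau on a randomly chosen fraction eps of the features. The function name simulate_rare_weak, the parameter names p, n, eps, and tau, and the simple thresholded sign classifier are all hypothetical choices made for illustration; the abstract does not specify the model or any classifier.

```python
import numpy as np


def simulate_rare_weak(p=2000, n=20, eps=0.01, tau=0.3, n_test=500, seed=0):
    """Return the test error of a simple thresholding classifier.

    Assumed (hypothetical) setup: p features, n training observations,
    a fraction eps of useful features, each with signal strength tau.
    """
    rng = np.random.default_rng(seed)

    # Mark roughly eps * p features as useful; the rest carry no signal.
    useful = rng.random(p) < eps
    mu = np.where(useful, tau, 0.0)

    def draw(m):
        # Labels are +1 or -1; features are y * mu plus standard normal noise.
        y = rng.choice([-1, 1], size=m)
        x = y[:, None] * mu[None, :] + rng.standard_normal((m, p))
        return x, y

    x_train, y_train = draw(n)
    x_test, y_test = draw(n_test)

    # Per-feature z-score: approximately N(sqrt(n) * mu_j, 1) under this model.
    z = (y_train[:, None] * x_train).sum(axis=0) / np.sqrt(n)

    # Keep only features whose z-score clears a universal threshold,
    # and weight each kept feature by the sign of its z-score.
    threshold = np.sqrt(2.0 * np.log(p))
    w = np.where(np.abs(z) > threshold, np.sign(z), 0.0)

    # Classify fresh data by the sign of the weighted sum.
    pred = np.where(x_test @ w >= 0, 1, -1)
    return float(np.mean(pred != y_test))


if __name__ == "__main__":
    print("strong, less rare:", simulate_rare_weak(eps=0.05, tau=3.0))
    print("weak, rare:       ", simulate_rare_weak(eps=0.005, tau=0.05))
```

Under these assumptions, a large tau lets the useful features stand out from the noise and the test error is small, while a small tau combined with a small eps leaves this particular classifier close to random guessing. The sketch only shows one classifier failing; it does not demonstrate the paper's stronger claim that every trained classifier fails in the identified region.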

Publication types

Research Support, U.S. Gov't, Non-P.H.S.