Sequential projection pursuit principal component analysis: dealing with missing data associated with new -omics technologies

Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.

Abstract

Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large, complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies, such as label-free mass spectrometry, because they contain large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that this approach generates robust and informative low-dimensional data representations compared to commonly used imputation approaches.
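To make concrete the idea of extracting components one at a time from the observed values only, without imputation, here is a minimal Python sketch. It is not the authors' sppPCA algorithm, which the abstract does not specify. It substitutes a NIPALS-style iteration that skips missing entries, a different and well-known technique. The function name `nipals_missing` and its parameters are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import numpy as np


def nipals_missing(X, n_components=2, n_iter=500, tol=1e-8):
    """Sequentially extract PCA-like components using observed entries only.

    Assumption: missing values are NaN, and every row and column has at
    least one observed value. This is a NIPALS-style stand-in, not sppPCA.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    M = observed.astype(float)                   # 1.0 where observed, 0.0 where missing
    col_means = np.nanmean(X, axis=0)            # center with observed values only
    R = np.where(observed, X - col_means, 0.0)   # residual matrix, zeros at missing cells
    n, p = R.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))

    for k in range(n_components):
        # Start from the column carrying the largest observed sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(n_iter):
            # Loading: per-column regression on t over observed rows only.
            denom = M.T @ (t ** 2)
            w = (R.T @ t) / np.where(denom > 0, denom, 1.0)
            w /= np.linalg.norm(w)
            # Score: per-row regression on w over observed columns only.
            denom = M @ (w ** 2)
            t_new = (R @ w) / np.where(denom > 0, denom, 1.0)
            converged = np.linalg.norm(t_new - t) < tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = w
        # Deflate: remove the fitted component from observed cells only.
        R = np.where(observed, R - np.outer(t, w), 0.0)

    return scores, loadings
```

On complete data this reduces to ordinary NIPALS, so the first loading should agree with the leading right singular vector of the centered matrix up to sign. With missing entries, each score and loading is fitted only to the cells that were actually measured.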

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Chromatography, Liquid / methods
  • Databases, Protein
  • Humans
  • Mass Spectrometry / methods*
  • Metabolomics / methods
  • Principal Component Analysis*
  • Proteomics / methods*