GLAD: a mixed-membership model for heterogeneous tumor subtype classification

Hachem Saddiki; Jon McAuliffe; Patrick Flaherty

doi:10.1093/bioinformatics/btu618

Bioinformatics. 2015 Jan 15;31(2):225-32. doi: 10.1093/bioinformatics/btu618. Epub 2014 Sep 29.

Authors

Hachem Saddiki¹, Jon McAuliffe², Patrick Flaherty¹

Affiliations

¹ Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, USA, School of Science and Engineering, Al Akhawayn University, Ifrane, 53000, Morocco, Department of Statistics, University of California, Berkeley, CA 94720, USA, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA.
² Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, USA, School of Science and Engineering, Al Akhawayn University, Ifrane, 53000, Morocco, Department of Statistics, University of California, Berkeley, CA 94720, USA, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA.

Abstract

Motivation: Genomic analyses of many solid cancers have demonstrated extensive genetic heterogeneity between as well as within individual tumors. However, statistical methods for classifying tumors by subtype based on genomic biomarkers generally entail an all-or-none decision, which may be misleading for clinical samples containing a mixture of subtypes and/or normal cell contamination.

Results: We have developed a mixed-membership classification model, called GLAD, that simultaneously learns a sparse biomarker signature for each subtype as well as a distribution over subtypes for each sample. We demonstrate the accuracy of this model on simulated data, in vitro mixture experiments, and clinical samples from The Cancer Genome Atlas (TCGA) project. We show that many TCGA samples are likely a mixture of multiple subtypes.
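
To make the mixed-membership idea concrete, the sketch below estimates a distribution over subtypes for one sample, assuming the sparse subtype signatures are already known. It is an illustration only and not the GLAD inference algorithm: GLAD learns signatures and proportions jointly, whereas this sketch fixes the signatures and fits the proportions by projected gradient descent on a squared-error objective. The function names, the toy signatures, and the step-size settings are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (entries non-negative and summing to one)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            tau = t  # keep the threshold from the last index that qualifies
    return [max(x - tau, 0.0) for x in v]


def estimate_proportions(sample, signatures, steps=2000, lr=0.05):
    """Fit subtype proportions theta for one sample by minimising
    0.5 * ||sum_k theta_k * signatures[k] - sample||^2 over the simplex.

    Assumption: signatures are fixed and known. lr must be small
    relative to the scale of the signatures for the iteration to converge.
    """
    k = len(signatures)
    theta = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(steps):
        # Residual between the current mixture and the observed sample.
        residual = [
            sum(theta[j] * signatures[j][g] for j in range(k)) - x
            for g, x in enumerate(sample)
        ]
        # Gradient of the squared-error objective with respect to theta.
        grad = [
            sum(r * signatures[j][g] for g, r in enumerate(residual))
            for j in range(k)
        ]
        theta = project_to_simplex([t - lr * d for t, d in zip(theta, grad)])
    return theta


# Toy example: two sparse signatures over four hypothetical biomarkers.
signatures = [
    [1.0, 0.0, 2.0, 0.0],  # subtype A
    [0.0, 1.5, 0.0, 1.0],  # subtype B
]
# A sample that is 70% subtype A and 30% subtype B.
sample = [0.7 * a + 0.3 * b for a, b in zip(*signatures)]
theta = estimate_proportions(sample, signatures)
print([round(t, 3) for t in theta])  # [0.7, 0.3]
```

An all-or-none classifier would label this sample as subtype A; the estimated proportions instead retain the 30% contribution from subtype B.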

Availability: A Python module implementing our algorithm is available from http://genomics.wpi.edu/glad/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Biomarkers, Tumor / genetics*
Computational Biology / methods*
Computer Simulation
Data Interpretation, Statistical
Gene Expression Profiling
Gene Regulatory Networks
Humans
Neoplasms / classification*
Neoplasms / genetics*
Software*

Substances

Biomarkers, Tumor

Grants and funding

T32 CA121940/CA/NCI NIH HHS/United States