multiClust: An R-package for Identifying Biologically Relevant Clusters in Cancer Transcriptome Profiles

Nathan Lawlor; Alec Fabbri; Peiyong Guan; Joshy George; R Krishna Murthy Karuturi

doi:10.4137/CIN.S38000

Cancer Inform. 2016 Jun 12;15:103-14. doi: 10.4137/CIN.S38000. eCollection 2016.

Authors

Nathan Lawlor¹, Alec Fabbri², Peiyong Guan³, Joshy George⁴, R Krishna Murthy Karuturi⁴

Affiliations

¹ Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA.
² Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA.
³ Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore.
⁴ The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.

Abstract

Clustering is carried out to identify patterns in transcriptomics profiles and thereby determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and integral part of the process. Many feature selection and clustering methods are currently available for identifying relevant genes and clustering samples. However, choosing an appropriate methodology is difficult. In addition, the available packages do not support an extensive set of feature selection methods. Hence, we developed an integrative R-package called multiClust that allows researchers to experiment easily with combinations of gene selection and clustering methods. Using multiClust, we identified the best-performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of data sets, provided that an appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant, as no single methodology works for all studies.

Keywords: R software; clinical outcome; data clustering; gene selection.
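
The variance-based ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the multiClust API, and it is written in Python rather than R. The function name `rank_genes_by_variance`, the `top_n` parameter, and the toy expression values are all hypothetical and chosen only for illustration. It assumes expression data are held as a mapping from gene name to per-sample values, and it ranks genes by sample variance.

```python
from statistics import variance


def rank_genes_by_variance(expression, top_n):
    """Return the top_n gene names ordered by decreasing sample variance.

    expression: dict mapping gene name -> list of expression values,
                one value per sample (hypothetical toy layout).
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Sample variance (n - 1 denominator) across samples for each gene.
    scores = {gene: variance(values) for gene, values in expression.items()}
    # Highest variance first; ties are broken by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:top_n]


# Toy data: 4 made-up genes measured in 4 samples (illustrative values only).
toy = {
    "geneA": [1.0, 1.1, 0.9, 1.0],  # nearly constant
    "geneB": [0.0, 4.0, 0.0, 4.0],  # strongly variable
    "geneC": [2.0, 3.0, 2.0, 3.0],  # moderately variable
    "geneD": [5.0, 5.0, 5.0, 5.0],  # constant
}

selected = rank_genes_by_variance(toy, top_n=2)
print(selected)  # ['geneB', 'geneC']
```

Here `top_n` stands in for the number of selected genes, which the abstract identifies as the factor that determines how well a simple ranking performs. In a full analysis the selected genes would then be passed to a clustering step.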

Publication types

Review
