Computational Tools for Parsimony Phylogenetic Analysis of Omics Data

OMICS. 2015 Aug;19(8):471-7. doi: 10.1089/omi.2015.0018. Epub 2015 Jun 3.

Abstract

High-throughput assays from genomics, proteomics, metabolomics, and next-generation sequencing produce massive omics datasets that are challenging to analyze in biological or clinical contexts. Thus far, no publicly available program converts quantitative omics data into the input formats of robust, off-the-shelf phylogenetic programs. To the best of our knowledge, this is the first report of two Windows-based programs, OmicsTract and SynpExtractor, created to address this gap. By way of introduction, we note that the phylogenetic cladogram is a particularly useful inferential model in bioinformatics. Cladograms are multidimensional tools that show the relatedness between subgroups of healthy and diseased individuals and the aberrations shared by the latter; they also reveal characteristics of a disease that would not be apparent with other analytical methods. OmicsTract and SynpExtractor were written for two respective tasks: (1) preparing data for advanced phylogenetic parsimony analysis with the standard programs MIX (from PHYLIP) and TNT, and (2) extracting the shared aberrations at the cladogram nodes. OmicsTract converts comma-delimited data tables by assigning each data point a binary value ("0" for normal states and "1" for abnormal states), then writes the converted tables in the proper input file format for MIX or with embedded commands for TNT. SynpExtractor uses output files from MIX and TNT to extract the shared aberrations at each node of the cladogram, matches them with identifying labels from the dataset, and exports them to a comma-delimited file. Labels may be gene identifiers in gene-expression datasets or m/z values in mass spectrometry datasets.
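The binarization and MIX export that OmicsTract performs can be sketched in Python. The abstract does not say how a data point is judged abnormal, so this sketch assumes, purely for illustration, that a value is coded "1" when it falls outside the minimum-to-maximum range of that feature across the normal (control) samples. The function `binarize_to_mix`, its `normal_ids` parameter, and the sample data are hypothetical; the output follows the basic PHYLIP layout that MIX reads (a first line giving the numbers of taxa and characters, then each taxon name padded to 10 characters followed by its 0/1 states). This is not the OmicsTract implementation.

```python
import csv
import io


def binarize_to_mix(csv_text, normal_ids):
    """Hypothetical helper: binarize a samples-by-features CSV table and
    return (PHYLIP-style 0/1 matrix text for MIX, feature labels)."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, data = rows[0], rows[1:]
    labels = header[1:]  # e.g. gene identifiers or m/z values
    samples = [(r[0], [float(x) for x in r[1:]]) for r in data]

    # ASSUMPTION: the normal range of each feature is the min..max observed
    # across the normal (control) samples; values outside it are "abnormal".
    normal_vals = [vals for sid, vals in samples if sid in normal_ids]
    if not normal_vals:
        raise ValueError("at least one normal sample is required")
    lows = [min(col) for col in zip(*normal_vals)]
    highs = [max(col) for col in zip(*normal_vals)]

    # PHYLIP layout: "<taxa> <characters>", then a 10-character name field
    # followed by one state per character ("0" normal, "1" abnormal).
    lines = [f"{len(samples)} {len(labels)}"]
    for sid, vals in samples:
        states = "".join(
            "1" if v < lo or v > hi else "0"
            for v, lo, hi in zip(vals, lows, highs)
        )
        lines.append(f"{sid[:10]:<10}{states}")
    return "\n".join(lines) + "\n", labels


if __name__ == "__main__":
    table = """sample,GENE_A,GENE_B,GENE_C
N1,1.0,5.0,2.0
N2,1.2,5.5,2.2
T1,3.0,5.2,0.5
T2,0.8,6.1,2.1
"""
    matrix, labels = binarize_to_mix(table, {"N1", "N2"})
    print(matrix, end="")
```

With this toy table, the two normal samples are coded all "0" by construction, while T1 is coded `101` and T2 `110`. A real converter would also need a policy for missing values and for choosing the normal range, which this sketch leaves out.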
By automating these steps, OmicsTract and SynpExtractor offer a veritable opportunity for rapid and standardized phylogenetic analyses of omics data; their model can also be extended to next-generation sequencing (NGS) data. We make OmicsTract and SynpExtractor publicly and freely available for non-commercial use to strengthen and build capacity for the phylogenetic paradigm of omics analysis.
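The labeling step described for SynpExtractor, matching the shared aberrations at each cladogram node with dataset labels and exporting them as comma-delimited text, can be sketched as follows. Parsing MIX or TNT output files is deliberately left out: the sketch assumes the shared characters for each node have already been read into a dictionary. The function `export_node_aberrations`, its inputs, the 1-based character numbering, and the node names are hypothetical and do not reproduce SynpExtractor's own code.

```python
import csv
import io


def export_node_aberrations(node_chars, labels):
    """Hypothetical helper: node_chars maps each cladogram node to the
    0-based column indices of the characters in state "1" shared at that
    node (ASSUMED already parsed from MIX or TNT output); labels are the
    feature labels in matrix column order. Returns comma-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "character", "label"])
    for node, indices in node_chars.items():
        for i in sorted(indices):
            # ASSUMPTION: characters are reported with 1-based numbers.
            writer.writerow([node, i + 1, labels[i]])
    return out.getvalue()


if __name__ == "__main__":
    labels = ["GENE_A", "GENE_B", "GENE_C"]
    nodes = {"node1": [0], "node2": [1, 0]}
    print(export_node_aberrations(nodes, labels), end="")
```

Keeping the label list in the same column order used to build the 0/1 matrix is what lets a character number be traced back to a gene identifier or m/z value.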

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Datasets as Topic
  • Gene Expression Profiling / classification*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Information Dissemination
  • Information Storage and Retrieval
  • Male
  • Metabolomics / methods
  • Prostate / metabolism
  • Prostate / pathology
  • Prostatic Neoplasms / diagnosis*
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / pathology
  • Software*