SeqPlots - Interactive software for exploratory data analyses, pattern discovery and visualization in genomics

Wellcome Open Res. 2016 Nov 15:1:14. doi: 10.12688/wellcomeopenres.10004.1. eCollection 2016.

Abstract

Experiments involving high-throughput sequencing are widely used for analyses of chromatin function and gene expression. Common examples are the use of chromatin immunoprecipitation for the analysis of chromatin modifications or factor binding, enzymatic digestions for chromatin structure assays, and RNA sequencing to assess gene expression changes after biological perturbations. To investigate the pattern and abundance of coverage signals across regions of interest, data are often visualized as profile plots of average signal or stacked rows of signal in the form of heatmaps. We found that available plotting software was either slow and laborious or difficult to use by investigators with little computational training, which inhibited wide data exploration. To address this need, we developed SeqPlots, a user-friendly exploratory data analysis (EDA) and visualization software for genomics. After choosing groups of signal and feature files and defining plotting parameters, users can generate profile plots of average signal or heatmaps clustered using different algorithms in a matter of seconds through the graphical user interface (GUI) controls. SeqPlots accepts all major genomic file formats as input and can also generate and plot user defined motif densities. Profile plots and heatmaps are highly configurable and batch operations can be used to generate a large number of plots at once. SeqPlots is available as a GUI application for Mac or Windows and Linux, or as an R/Bioconductor package. It can also be deployed on a server for remote and collaborative usage. The analysis features and ease of use of SeqPlots encourages wide data exploration, which should aid the discovery of novel genomic associations.

Keywords: aggregate gene profile plot; hierarchical cluster; k-means cluster; self-organizing maps; unsupervised machine learning.