MixMir: microRNA motif discovery from gene expression data using mixed linear models

Liyang Diao; Antoine Marcais; Scott Norton; Kevin C Chen

doi:10.1093/nar/gku672

Nucleic Acids Res. 2014;42(17):e135. doi: 10.1093/nar/gku672. Epub 2014 Jul 31.

Authors

Liyang Diao¹, Antoine Marcais², Scott Norton³, Kevin C Chen⁴

Affiliations

¹ BioMaPS Institute for Quantitative Biology and Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
² CIRI, International Center for Infectiology Research, Université de Lyon, Inserm, CNRS, Ecole Normale Supérieure, Lyon, France.
³ BioMaPS Institute for Quantitative Biology and Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Mathematics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA.
⁴ BioMaPS Institute for Quantitative Biology and Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.

Abstract

microRNAs (miRNAs) are a class of ~22 nt non-coding RNAs that potentially regulate over 60% of human protein-coding genes. miRNA activity is highly specific, differing between cell types, developmental stages and environmental conditions, so identifying the miRNAs that are active in a given sample is of great interest. Here we present MixMir, a novel computational approach that jointly analyzes mRNA sequence and gene expression data. Our method corrects for background 3' UTR sequence similarity between transcripts, which is known to correlate with mRNA transcript abundance. We demonstrate that, after accounting for k-mer sequence similarities in 3' UTRs, a statistical linear model based on motif presence/absence can effectively discover active miRNAs in a sample. MixMir uses fast software implementations for solving mixed linear models, which are widely used in genome-wide association studies (GWASs). In essence, 3' UTR sequence similarity takes the place of cryptic population relatedness in the GWAS problem. Compared with similar methods such as miReduce, Sylamer and cWords, MixMir performed better at discovering true miRNA motifs in three mouse Dicer-knockout experiments from different tissues, two of which were collected by our group. We confirmed these results on protein and mRNA expression data obtained from miRNA transfection experiments in human cell lines. MixMir can be freely downloaded from https://github.com/ldiao/MixMir.
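
To make the idea in the abstract concrete, the following is a minimal sketch of the two inputs such a mixed linear model would need: a pairwise 3' UTR similarity matrix built from k-mer counts (playing the role of the relatedness matrix in GWAS) and a motif presence/absence indicator (the fixed effect). It is not the MixMir implementation. The function names, the choice of cosine similarity, the value of k and the toy sequences are all assumptions made for illustration, and the mixed-model fit itself is left to dedicated software.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles (an assumed metric)."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(utrs, k=2):
    """Pairwise similarity between 3' UTRs, analogous to a kinship matrix."""
    profiles = [kmer_counts(u, k) for u in utrs]
    return [[cosine_similarity(p, q) for q in profiles] for p in profiles]


def motif_indicator(utrs, motif):
    """1 if the candidate motif occurs in the 3' UTR, else 0."""
    return [1 if motif in u else 0 for u in utrs]


# Toy sequences, invented purely for illustration.
utrs = ["ACGTACGTAA", "ACGTACGTAC", "TTTTGGGGCC"]
K = similarity_matrix(utrs, k=2)     # random-effect covariance structure
x = motif_indicator(utrs, "GTAA")    # fixed-effect predictor for one motif
```

In a full analysis, the expression change of each transcript would be regressed on `x` with `K` as the covariance structure of the random effect, and the test would be repeated for every candidate motif.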

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

3' Untranslated Regions*
Adrenal Cortex / metabolism
Algorithms
Animals
DEAD-box RNA Helicases / genetics
Embryonic Stem Cells / metabolism
Gene Expression Profiling / methods*
Linear Models
Mice
Mice, Knockout
MicroRNAs / metabolism*
Nucleotide Motifs
Oligonucleotide Array Sequence Analysis / methods
RNA, Messenger / chemistry
RNA, Messenger / metabolism
Ribonuclease III / genetics
Sequence Analysis, RNA / methods*

Substances

3' Untranslated Regions
MicroRNAs
RNA, Messenger
Dicer1 protein, mouse
Ribonuclease III
DEAD-box RNA Helicases

Grants and funding

R00HG004515/HG/NHGRI NIH HHS/United States