Orthology detection combining clustering and synteny for very large datasets

PLoS One. 2014 Aug 19;9(8):e105015. doi: 10.1371/journal.pone.0105015. eCollection 2014.

Abstract

The elucidation of orthology relationships is an important step both for gene function prediction and for understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities, because more exact approaches are computationally too expensive. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF substantially reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis, Proteinortho, making the software applicable to very large datasets.
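
The adjacency idea mentioned above can be illustrated with a minimal sketch. This is not the FFAdj-MCS heuristic itself, which works on similarity-weighted gene pairs; it only assumes a given one-to-one gene mapping between two genomes and counts how many neighbouring gene pairs are preserved, ignoring gene orientation. The function names, the genome representation (a list of linear chromosomes, each a list of gene identifiers), and the example data are all hypothetical.

```python
def adjacencies(chromosomes):
    """Collect unordered pairs of neighbouring genes on linear chromosomes."""
    pairs = set()
    for chrom in chromosomes:
        # Linear chromosome: the last gene is not adjacent to the first.
        for left, right in zip(chrom, chrom[1:]):
            if left != right:
                pairs.add(frozenset((left, right)))
    return pairs


def conserved_adjacencies(genome_a, genome_b, mapping):
    """Count adjacencies of genome_a that are preserved in genome_b.

    `mapping` is an assumed one-to-one dict from genes of genome_a to
    genes of genome_b (for example, candidate orthologs).
    """
    adj_b = adjacencies(genome_b)
    count = 0
    for pair in adjacencies(genome_a):
        # Skip adjacencies that involve a gene without a mapped partner.
        if all(gene in mapping for gene in pair):
            mapped = frozenset(mapping[gene] for gene in pair)
            if mapped in adj_b:
                count += 1
    return count


# Hypothetical example: two chromosomes per genome, five mapped genes.
genome_a = [["a1", "a2", "a3"], ["a4", "a5"]]
genome_b = [["b1", "b2"], ["b3", "b5", "b4"]]
mapping = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}

print(conserved_adjacencies(genome_a, genome_b, mapping))
```

In this toy input, the pairs (a1, a2) and (a4, a5) remain neighbours after mapping, while (a2, a3) is split across chromosomes in the second genome. A higher count of conserved adjacencies corresponds to fewer breakpoints, which is the sense in which the measure relates to the breakpoint distance.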

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / genetics
  • Cluster Analysis
  • Computer Simulation
  • Datasets as Topic
  • Genes, Bacterial
  • Models, Genetic*
  • Software*
  • Synteny*

Substances

  • Bacterial Proteins

Grants and funding

We acknowledge support for the Article Processing Charge by the German Research Foundation and the Open Access Publication Fund of Bielefeld University Library. This work was supported in part by the Deutsche Forschungsgemeinschaft grants no. GRK-1384, MA5082/1-1, MI439/14-1. DD receives a scholarship from the CLIB Graduate Cluster Industrial Biotechnology. AT is a research fellow of the Alexander von Humboldt Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.