Predicting citation count of Bioinformatics papers within four years of publication

Bioinformatics. 2009 Dec 15;25(24):3303-9. doi: 10.1093/bioinformatics/btp585. Epub 2009 Oct 9.

Abstract

Motivation: Nowadays, publishers of scientific journals face the tough task of selecting high-quality articles that will attract as many readers as possible from a pool of articles. This is due to the growth of scientific output and literature. The possibility of a journal having a tool capable of predicting the citation count of an article within the first few years after publication would pave the way for new assessment systems.

Results: This article presents a new approach based on building several prediction models for the Bioinformatics journal. These models predict the citation count of an article within 4 years after publication (global models). To build these models, tokens found in the abstracts of Bioinformatics papers have been used as predictive features, along with other features like the journal sections and 2-week post-publication periods. To improve the accuracy of the global models, specific models have been built for each Bioinformatics journal section (Data and Text Mining, Databases and Ontologies, Gene Expression, Genetics and Population Analysis, Genome Analysis, Phylogenetics, Sequence Analysis, Structural Bioinformatics and Systems Biology). In these new models, the average success rate for predictions using the naive Bayes and logistic regression supervised classification methods was 89.4% and 91.5%, respectively, within the nine sections and for 4-year time horizon.

Availability: Supplementary material on this experimental survey is available at http://www.dia.fi.upm.es/~concha/bioinformatics.html

Contact: [email protected]

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Periodicals as Topic / statistics & numerical data*
  • Publishing / statistics & numerical data