Use of a Machine-learning Method for Predicting Highly Cited Articles Within General Radiology Journals

Acad Radiol. 2016 Dec;23(12):1573-1581. doi: 10.1016/j.acra.2016.08.011. Epub 2016 Sep 28.

Abstract

Rationale and objectives: This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features.

Materials and methods: We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict which included articles would fall in the top 10% by number of citations received in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model with the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features.
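As an illustration of the evaluation setup described above, the following is a minimal Python sketch of labeling the top 10% of articles by citation count and scoring a model by area under the curve. The function names, the tie-breaking rule, and the toy data are assumptions made for this example; they are not taken from the study.

```python
from typing import List, Sequence


def label_top_fraction(citations: Sequence[int], fraction: float = 0.10) -> List[int]:
    """Return 1 for articles in the top `fraction` by citation count, else 0.

    Ties at the cutoff are broken by input order (an assumption of this sketch).
    """
    n_top = max(1, round(len(citations) * fraction))
    # Indices sorted by descending citation count; sorted() is stable.
    order = sorted(range(len(citations)), key=lambda i: -citations[i])
    top = set(order[:n_top])
    return [1 if i in top else 0 for i in range(len(citations))]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 10 hypothetical articles with citation counts and model scores.
citations = [0, 1, 2, 3, 4, 5, 6, 7, 8, 30]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.3, 0.5, 0.9, 0.8]
labels = label_top_fraction(citations)
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of articles, which is fine for a sketch; a rank-based computation would be the usual choice at the scale of about 10,000 articles.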

Results: The regularized logistic regression (Bayesian binary regression) model had the highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintained predictive power. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics).
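The idea of fitting a regularized logistic regression on word features and then shrinking the feature list can be sketched in Python as follows. This is not the study's software or settings: the L2 penalty, the gradient-descent fit, the halving rule, the fixed target size (the study instead reduced the list while checking that predictive power was maintained), and the four toy documents are all assumptions chosen to keep the example small.

```python
import math
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def fit_logistic_l2(
    docs: Sequence[str],
    labels: Sequence[int],
    vocab: Sequence[str],
    lam: float = 0.01,
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[Dict[str, float], float]:
    """Fit logistic regression on binary word-presence features.

    Uses batch gradient descent with an L2 penalty `lam` on the word weights
    (the intercept is not penalized). All hyperparameters are illustrative.
    """
    vocab_set = set(vocab)
    rows = [set(tokenize(d)) & vocab_set for d in docs]
    w = {t: 0.0 for t in vocab}
    b = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = {t: lam * w[t] for t in vocab}
        grad_b = 0.0
        for feats, y in zip(rows, labels):
            z = b + sum(w[t] for t in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            err = (p - y) / n
            grad_b += err
            for t in feats:
                grad_w[t] += err
        for t in vocab:
            w[t] -= lr * grad_w[t]
        b -= lr * grad_b
    return w, b


def reduce_features(
    docs: Sequence[str], labels: Sequence[int], vocab: Sequence[str], target: int
) -> List[str]:
    """Refit repeatedly, keeping the larger-|weight| half of the words each round."""
    kept = list(vocab)
    while len(kept) > target:
        w, _ = fit_logistic_l2(docs, labels, kept)
        ranked = sorted(kept, key=lambda t: abs(w[t]), reverse=True)
        kept = ranked[: max(target, len(kept) // 2)]
    return kept


# Hypothetical toy corpus: label 1 marks a "highly cited" article.
docs = [
    "elastography imaging liver",
    "elastography imaging prostate",
    "case imaging liver",
    "case imaging prostate",
]
labels = [1, 1, 0, 0]
full_vocab = sorted({t for d in docs for t in tokenize(d)})
core = reduce_features(docs, labels, full_vocab, target=2)
weights, bias = fit_logistic_l2(docs, labels, core)
```

In this toy corpus only "elastography" and "case" separate the two classes, so they are the words that survive reduction, with a positive and a negative weight respectively. In practice one would track the area under the curve on held-out articles at each reduction step before accepting a smaller list.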

Conclusions: Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature.

Keywords: Radiology; bibliometrics; biomedical journals; machine learning.

Publication types

  • Evaluation Study

MeSH terms

  • Area Under Curve
  • Bayes Theorem
  • Bibliometrics
  • Humans
  • Journal Impact Factor
  • Machine Learning*
  • Periodicals as Topic / statistics & numerical data*
  • Publications / statistics & numerical data
  • Publishing / statistics & numerical data
  • Radiology / statistics & numerical data*