Towards more accurate classification of risk of arrest among offenders on community supervision: An application of machine learning versus logistic regression

Crim Behav Ment Health. 2023 Jun;33(3):156-171. doi: 10.1002/cbm.2289. Epub 2023 Apr 26.

Abstract

Background: Although there is general consensus about the behavioural, clinical and sociodemographic variables that are risk factors for reoffending, optimal statistical modelling of these variables is less clear. Machine learning techniques offer an approach that may provide greater accuracy than traditional methods.

Aim: To compare the performance of machine learning techniques (classification trees and random forests) with that of logistic regression in classifying correlates of rearrest among adult probationers and parolees in the United States.

Method: Data came from the subgroup of National Survey on Drug Use and Health respondents who were on probation or parole, for the survey years 2015–2019. To examine correlates of arrest in the past 12 months, we compared logistic regression, classification trees and random forests using receiver operating characteristic (ROC) curves.
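
A ROC comparison works as follows. Each model assigns every person a risk score. The ROC curve traces the true-positive rate against the false-positive rate as the classification threshold is lowered. Its area (AUC) is the probability that a randomly chosen person who was arrested receives a higher score than a randomly chosen person who was not. The abstract does not say which ROC summary or software the authors used. The following is therefore only a minimal Python sketch that assumes AUC as the summary. The labels and the two models' scores are invented for illustration and are not study data, and the function names are hypothetical.

```python
from collections.abc import Sequence


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> list[tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) points, lowering the
    classification threshold from the highest score to the lowest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sort cases by predicted risk, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together, which gives
        # ties half credit in the area computed below.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[tuple[float, float]]) -> float:
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 = arrested in the past 12 months, 0 = not arrested.
labels = [1, 1, 0, 1, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]   # hypothetical model A risk scores
scores_b = [0.6, 0.4, 0.5, 0.7, 0.45, 0.2]  # hypothetical model B risk scores

print(round(auc(roc_points(labels, scores_a)), 3))  # 0.889
print(round(auc(roc_points(labels, scores_b)), 3))  # 0.778
```

Model A orders eight of the nine arrested/not-arrested pairs correctly (AUC 8/9), and model B orders seven (AUC 7/9). In practice a tested library routine, such as scikit-learn's roc_auc_score, would normally replace this hand-written version. A formal significance test for the difference between two AUCs is a separate step that this sketch does not perform.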

Results: Machine learning techniques, specifically random forests, showed significantly greater accuracy than logistic regression in classifying correlates of arrest.

Conclusions: Our findings suggest the potential for enhanced risk classification. The next step would be to develop applications for criminal justice and clinical practice to inform better support and management strategies for former offenders in the community.

Keywords: arrest; classification; machine learning; parole; probation; recidivism.

MeSH terms

  • Adult
  • Criminals*
  • Humans
  • Law Enforcement
  • Logistic Models
  • Machine Learning
  • Substance-Related Disorders*