Background: Although there is general consensus about the behavioural, clinical and sociodemographic variables that are risk factors for reoffending, it is less clear how best to model these variables statistically. Machine learning techniques offer an approach that may provide greater accuracy than traditional methods.
Aim: To compare the performance of advanced machine learning techniques (classification trees and random forests) to logistic regression in classifying correlates of rearrest among adult probationers and parolees in the United States.
Method: Data came from the subgroup of respondents to the National Survey on Drug Use and Health, 2015-2019, who were on probation or parole. We compared the performance of logistic regression, classification trees and random forests, using receiver operating characteristic curves, to examine the correlates of arrest within the past 12 months.
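As an illustration of how receiver operating characteristic curves can be used to compare classifiers, the sketch below computes the area under the curve (AUC) from predicted risk scores using the rank-sum (Mann-Whitney) identity. It is a minimal sketch and not the authors' analysis code: the function name `roc_auc`, the labels, and the two sets of scores (`model_a_scores`, `model_b_scores`) are hypothetical toy values, not survey data or study results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: sequence of 0/1 outcomes (1 = arrested in the past 12 months).
    scores: sequence of predicted risk scores; higher means higher risk.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Sum the ranks of the positive cases, giving tied scores their
    # average rank so that a tie counts as half a correct ordering.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data (assumption): ten people, four of them arrested.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical risk scores from two unnamed models (assumption).
model_a_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.55, 0.90]
model_b_scores = [0.05, 0.15, 0.25, 0.30, 0.50, 0.40, 0.65, 0.75, 0.55, 0.95]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
print(f"Model A AUC: {auc_a:.3f}")
print(f"Model B AUC: {auc_b:.3f}")
```

The AUC is the probability that a randomly chosen arrested person receives a higher score than a randomly chosen non-arrested person, so a larger value indicates better discrimination. Judging whether a difference between two AUCs is statistically significant needs an additional test, which this sketch does not cover.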
Results: We found that machine learning techniques, specifically random forests, were significantly more accurate than logistic regression in classifying correlates of arrest.
Conclusions: Our findings suggest the potential for enhanced risk classification. The next step would be to develop applications for criminal justice and clinical practice to inform better support and management strategies for former offenders in the community.
Keywords: arrest; classification; machine learning; parole; probation; recidivism.
© 2023 John Wiley & Sons Ltd.