Extensive hypothesis testing for estimation of crash frequency models

Heliyon. 2024 Feb 23;10(5):e26634. doi: 10.1016/j.heliyon.2024.e26634. eCollection 2024 Mar 15.

Abstract

Estimating count models for crash data poses a significant challenge that requires extensive knowledge, experience, and meticulous hypothesis testing to capture underlying trends. Multiple modelling aspects must be considered simultaneously, including, among others, functional forms, likely contributing factors, and unobserved heterogeneity. However, model development is frequently constrained by limited time and knowledge, so crucial modelling aspects such as the identification of likely contributing factors, necessary transformations, and distributional assumptions can easily be overlooked. To facilitate model development and an estimation that extracts as many insights as possible, an optimization framework is proposed to generate and simultaneously test a diverse array of hypotheses. The framework comprises a mathematical programming formulation and three alternative solution algorithms. The objective function minimizes the Bayesian Information Criterion (BIC) to avoid overfitting. The solution algorithms include metaheuristics to deal with an NP-hard problem and to search through a complex, nonconvex space. The metaheuristics also make it possible to handle unique datasets through varying search strategies. The effectiveness of the proposed framework was assessed using three distinct datasets, with published models used as benchmarks. The results highlighted the ability of the proposed framework to estimate crash data count models that surpass the benchmark models in terms of insights and goodness-of-fit. The framework provides several advantages, such as robust hypothesis testing, uncovering unique specifications and vital insights in the data, and leveraging existing knowledge to enhance search efficiency. The framework also exposes the vulnerability of traditional analyst efforts to local optima, bias, and limitations in creating more efficient models.
In an example using crash data from Washington, the proposed framework unveiled insights overlooked by a published benchmark model, identifying speed, interchanges, and grade breaks as likely crash contributors, and revealing the potential danger of excessively wide shoulders. By contrast, the benchmark model identified fewer contributing factors and missed a crucial non-linear relationship between safety and shoulder width. While wider shoulders are typically associated with improved safety, the proposed models suggest a threshold beyond which further widening could decrease safety. The introduction of random parameters in the analysis revealed a more nuanced relationship with crash frequency, thereby underlining the limitations of models incapable of capturing heterogeneity.
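
The idea of searching over model specifications with BIC as the objective can be illustrated with a minimal sketch. It is not the framework described above: it assumes a plain Poisson model with fixed parameters, synthetic data, and a basic simulated-annealing loop over variable inclusion only, whereas the paper's formulation also covers transformations, distributions, and random parameters. All names (`fit_poisson`, `bic_of`, `anneal`) and all tuning values are hypothetical. BIC is computed as k ln(n) - 2 ln(L), with k the number of estimated parameters, n the number of observations, and L the maximized likelihood.

```python
import math
import random
import numpy as np

def fit_poisson(X, y, iters=50):
    """Fit a log-link Poisson regression by Newton-Raphson; return the log-likelihood."""
    beta = np.zeros(X.shape[1])
    beta[0] = math.log(y.mean())  # column 0 is the intercept; start at the mean rate
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (X * mu[:, None])        # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.exp(eta)) - sum(math.lgamma(v + 1.0) for v in y))

def bic_of(mask, X, y):
    """BIC of the model that keeps the intercept plus the columns flagged in mask."""
    cols = [0] + [j + 1 for j, keep in enumerate(mask) if keep]
    loglik = fit_poisson(X[:, cols], y)
    return len(cols) * math.log(len(y)) - 2.0 * loglik

def anneal(X, y, steps=200, temp=5.0, cooling=0.97, seed=0):
    """Simulated annealing over inclusion masks; returns the lowest-BIC mask seen."""
    rng = random.Random(seed)
    p = X.shape[1] - 1
    current = [False] * p
    cur_bic = bic_of(current, X, y)
    best, best_bic = current[:], cur_bic
    for _ in range(steps):
        cand = current[:]
        j = rng.randrange(p)
        cand[j] = not cand[j]                 # neighbour: flip one variable in or out
        cand_bic = bic_of(cand, X, y)
        # Always accept improvements; accept worse moves with a temperature-scaled chance.
        if cand_bic < cur_bic or rng.random() < math.exp((cur_bic - cand_bic) / temp):
            current, cur_bic = cand, cand_bic
            if cur_bic < best_bic:
                best, best_bic = current[:], cur_bic
        temp *= cooling
    return best, best_bic

# Synthetic data (an assumption for illustration): only the first two of five
# candidate covariates actually affect the expected count.
gen = np.random.default_rng(1)
Z = gen.normal(size=(400, 5))
X = np.column_stack([np.ones(400), Z])
y = gen.poisson(np.exp(0.5 + 0.6 * Z[:, 0] - 0.5 * Z[:, 1]))

best, best_bic = anneal(X, y)
print("selected covariates:", [j for j, keep in enumerate(best) if keep])
print("BIC:", round(best_bic, 2))
```

Accepting occasional worse moves early in the run is what lets a search of this kind leave a local optimum, which is the weakness of manual, stepwise specification that the abstract points to.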

Keywords: Crash data; Data count models; Hypothesis testing; Metaheuristic; Optimization; Random parameters; Regression.