Since the advent of the phrase "subgroup identification," there has been an explosion of methodologies that seek to identify meaningful subgroups of patients with exceptional response, with the aim of furthering the realization of personalized medicine. However, to compare these methods fairly and understand which work best under different clinical trial situations, a common platform for assessing their comparative effectiveness is needed. In this paper, we describe a comprehensive project that created an extensive platform for evaluating subgroup identification methods, as well as a publicly posted challenge that was used to elicit new approaches. We proposed a common data-generating model for creating virtual clinical trial datasets that either contain subgroups of exceptional responders, spanning the many dimensions of the problem, or represent null scenarios in which no such subgroups exist. Furthermore, we created a common scoring system for evaluating the performance of purported subgroup identification methods. Together, these make it possible to benchmark methodologies and determine which perform best under different clinical trial situations. The project yielded considerable insights and allows us to make recommendations for how the statistical community can better compare and contrast old and new subgroup identification methodologies.
Keywords: InnoCentive Challenge; methods comparison; personalized medicine; subgroup identification.
© 2023 The Authors. Biometrical Journal published by Wiley-VCH GmbH.