Entrofy your cohort: A transparent method for diverse cohort selection

PLoS One. 2020 Jul 27;15(7):e0231939. doi: 10.1371/journal.pone.0231939. eCollection 2020.

Abstract

Selecting a cohort from a set of candidates is a common task within and beyond academia. Admitting students, awarding grants, and choosing speakers for a conference are situations where human biases may affect the selection of any particular candidate and, thereby, the composition of the final cohort. In this paper, we propose a new algorithm, entrofy, designed to be part of a human-in-the-loop decision-making strategy aimed at making cohort selection as just, transparent, and accountable as possible. We suggest embedding entrofy in a two-step selection procedure. During a merit review, the committee selects all applicants, submissions, or other entities that meet their merit-based criteria. This often yields a cohort larger than the admissible number. In the second stage, the target cohort is chosen from this meritorious pool using entrofy, a new algorithm and software tool. entrofy optimizes differences across an assignable set of categories selected by the human committee. Criteria could include academic discipline, home country, experience with certain technologies, or other quantifiable characteristics. The entrofy algorithm then approximates pre-defined target proportions for each category by solving the tie-breaking problem with provable performance guarantees. We show how entrofy selects cohorts according to pre-determined characteristics in simulated sets of applications and demonstrate its use in a case study of Astro Hack Week. This two-stage candidate and cohort selection process allows human judgment and debate to guide the assessment of candidates' merit in step 1. Then the human committee defines relevant diversity criteria, which are used as computational parameters in entrofy.
Once the parameters are defined, the set of candidates who meet the minimum threshold for merit is passed through the entrofy cohort selection procedure in step 2, which yields a cohort whose composition is as close as possible to the computational parameters defined by the committee. This process has the benefit of separating the merit-based assessment of candidates from certain elements of their diversity and from some considerations around cohort composition. It also increases the transparency and auditability of the process, which enables, but does not guarantee, fairness. Splitting merit and diversity considerations into their own assessment stages makes it easier to explain why a given candidate was selected or rejected, though it does not eliminate the possibility of objectionable bias.
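The two-stage procedure described above can be illustrated with a minimal sketch. It assumes that the merit review is modelled as a numeric score threshold and that the diversity stage greedily maximizes a simple capped-coverage objective: each attribute value earns credit up to its target head-count. Such an objective is monotone submodular, and greedy maximization of a monotone submodular function under a cardinality constraint carries the classic (1 - 1/e) approximation guarantee. That is plausibly the kind of guarantee the abstract refers to, but entrofy's actual objective, weighting, and tie-breaking may differ. All function names (`select_cohort`, `score`, `audit`, `target_counts`) and the toy data are hypothetical and are not the entrofy package API.

```python
from collections import Counter

# Hypothetical helpers for illustration only; this is NOT the entrofy package API.

def target_counts(proportions, k):
    """Turn target proportions per attribute value into head-counts for a cohort of size k."""
    return {attr: {v: round(p * k) for v, p in vals.items()}
            for attr, vals in proportions.items()}

def score(cohort, people, targets):
    """Capped coverage: each attribute value earns credit up to its target count.
    A sum of min(count, target) terms is monotone submodular in the cohort."""
    total = 0
    for attr, vals in targets.items():
        counts = Counter(people[c][attr] for c in cohort)
        for v, t in vals.items():
            total += min(counts[v], t)
    return total

def select_cohort(applicants, merit, threshold, proportions, k):
    # Stage 1: the committee's merit review, modelled here as a score threshold.
    pool = {a: attrs for a, attrs in applicants.items() if merit[a] >= threshold}
    targets = target_counts(proportions, k)
    cohort = []
    remaining = sorted(pool)  # sorted IDs make tie-breaking deterministic
    # Stage 2: greedily add the candidate with the largest marginal gain.
    while remaining and len(cohort) < k:
        base = score(cohort, pool, targets)
        # max() returns the first candidate with the largest gain, so ties go to the lowest ID.
        best = max(remaining,
                   key=lambda c: score(cohort + [c], pool, targets) - base)
        cohort.append(best)
        remaining.remove(best)
    return cohort

def audit(cohort, people, targets):
    """Achieved vs. target counts, so the committee can inspect the outcome."""
    report = {}
    for attr, vals in targets.items():
        counts = Counter(people[c][attr] for c in cohort)
        report[attr] = {v: (counts[v], t) for v, t in vals.items()}
    return report

# Invented toy data: six applicants with two categorical attributes.
applicants = {
    "a1": {"career": "student", "field": "astro"},
    "a2": {"career": "student", "field": "astro"},
    "a3": {"career": "student", "field": "cs"},
    "a4": {"career": "faculty", "field": "astro"},
    "a5": {"career": "postdoc", "field": "cs"},
    "a6": {"career": "student", "field": "astro"},
}
merit = {"a1": 4, "a2": 5, "a3": 3, "a4": 4, "a5": 2, "a6": 5}
proportions = {
    "career": {"student": 0.5, "faculty": 0.25, "postdoc": 0.25},
    "field": {"astro": 0.5, "cs": 0.5},
}

cohort = select_cohort(applicants, merit, threshold=3, proportions=proportions, k=4)
report = audit(cohort, applicants, target_counts(proportions, 4))
print(cohort)
print(report)
```

In this toy run the only postdoc (a5) is removed at the merit stage, so the audit report shows the postdoc target as unmet. Keeping merit filtering and composition scoring in separate functions mirrors the separation the abstract argues for: each stage's effect on a given candidate can be inspected on its own.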

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies*
  • Humans
  • Models, Theoretical
  • Research Design*

Grants and funding

DH, BM and LN acknowledge support by the Moore-Sloan Data Science Environment at NYU (http://msdse.org). DH was partially funded by the James Arthur Postdoctoral Fellowship at NYU. DH acknowledges support from the DIRAC Institute in the Department of Astronomy at the University of Washington. The DIRAC Institute is supported through generous gifts from the Charles and Lisa Simonyi Fund for Arts and Sciences and the Washington Research Foundation (http://www.wrfseattle.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. LN acknowledges support by Obsidian Security. The funder provided support in the form of salaries for LN, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. We note that the majority of the work in this manuscript was performed while LN was a faculty member at NYU, as declared above. This work was supported by the Washington Research Foundation and by a Data Science Environments project award from the Gordon and Betty Moore Foundation (Award #2013-10-29) and the Alfred P. Sloan Foundation (Award #3835) to the University of Washington eScience Institute.