The band gap of hybrid organic-inorganic perovskites (HOIPs) is a key factor governing their light absorption and thus the performance of perovskite solar cells (PSCs). However, band gap engineering by experimental trial and error and by high-throughput density functional theory calculations is undirected and costly. It is therefore critical to statistically identify the factors that influence the band gap and to rationally design perovskites with targeted band gaps. This study combined feature engineering, the gradient-boosted regression tree (GBRT) algorithm, and the genetic algorithm-based symbolic regression (GASR) algorithm into an interpretable machine learning (ML) strategy that accurately predicts the band gap of HOIPs and quantitatively interprets the factors affecting it. The seven best physical features were selected to construct a GBRT model with a root-mean-square error of less than 0.060 eV; the most important feature is the electronegativity difference between the B-site and the X-site ($\chi_{B\text{-}X}$). Further, a mathematical formula ($E_g = \chi_{B\text{-}X}^{2} + 0.881\,\chi_{B\text{-}X}$) was constructed with GASR to describe quantitatively how the band gap varies with this feature. Guided by the ML model, the HOIP $\mathrm{MA_{0.23}FA_{0.02}Cs_{0.75}Pb_{0.59}Sn_{0.41}Br_{0.24}I_{2.76}}$ was identified, with a suitable band gap of 1.39 eV. The proposed interpretable ML strategy provides an effective approach for developing HOIP structures with targeted band gaps and can also be applied to other materials fields.
Keywords: GBRT; band gap; machine learning; perovskite solar cell; symbolic regression.
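As an illustration of how the GASR formula quoted in the abstract could be used, the following minimal Python sketch evaluates it and inverts it for a target band gap. It assumes the formula reads as E_g = χ² + 0.881·χ, with χ the B-site/X-site electronegativity difference expressed in whatever scale and weighting the study used (not specified in the abstract). The function names `band_gap_from_chi` and `chi_for_target_gap` are hypothetical and are not taken from the paper.

```python
import math

# Linear coefficient of the GASR formula as quoted in the abstract.
LINEAR_COEFF = 0.881


def band_gap_from_chi(chi_bx: float) -> float:
    """Band gap (eV) from the assumed reading E_g = chi^2 + 0.881 * chi."""
    return chi_bx ** 2 + LINEAR_COEFF * chi_bx


def chi_for_target_gap(target_ev: float) -> float:
    """Positive root of chi^2 + 0.881 * chi - target = 0 (quadratic formula)."""
    discriminant = LINEAR_COEFF ** 2 + 4.0 * target_ev
    return (-LINEAR_COEFF + math.sqrt(discriminant)) / 2.0


if __name__ == "__main__":
    # Which electronegativity difference would the formula associate with
    # the 1.39 eV band gap mentioned in the abstract?
    chi = chi_for_target_gap(1.39)
    print(f"chi_B-X = {chi:.3f}, E_g = {band_gap_from_chi(chi):.2f} eV")
```

The inversion only makes sense for non-negative target band gaps, where the discriminant is positive and the root is non-negative. The sketch says nothing about the composition that realizes a given χ, which the study obtains from the full ML model.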