E-commerce faces challenges such as content homogenization and high perceived risk among users. This paper aims to predict perceived risk in different contexts by analyzing review content and website information. Using a dataset of 262,752 online reviews, we employ a KeyBERT-TextCNN model to extract thematic features from the review content and combine these features with product and merchant characteristics. We then build a predictive model for perceived risk using the PCA-K-medoids-XGBoost algorithm. In the feature extraction phase, we identify 11 key features that influence perceived risk in online shopping. In the prediction phase, the model performs well across different sample types in the test set, achieving a precision (P) of 84%, a recall (R) of 86%, and an F1 score of 85%. Interpretability analysis of the model shows that quality, functionality, and price are the key features affecting perceived risk for electronic products, whereas skin safety is the most critical feature for skincare products. In addition, feature characteristics differ significantly between high-risk samples and normal samples.
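As a quick consistency check on the reported metrics, the F1 score can be recomputed from the stated precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The abstract does not say how the metrics were averaged across sample types, so the function name and the single-pair calculation below are illustrative assumptions rather than the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: P = 84%, R = 86%.
reported_f1 = f1_score(0.84, 0.86)
print(f"{reported_f1:.4f}")  # 0.8499, which rounds to the reported 85%
```

Under this assumption, the three reported figures are mutually consistent.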
Copyright: © 2025 Qi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.