The construction of Chinese microblog gender-specific thesauruses and user gender classification

Appl Netw Sci. 2018;3(1):47. doi: 10.1007/s41109-018-0104-1. Epub 2018 Nov 8.

Abstract

Statistical analysis shows that short text messages published by users of different genders differ in the words and semantics they use. In this paper, gender-specific thesauruses are built first, and two new features are then derived from them. A new classification model is constructed by combining the traditional statistical features with an improved text implicitness feature. Experimental evaluation on a Sina Weibo dataset demonstrates the effectiveness of the gender-specific thesaurus-based features, and adding the improved text implicitness feature raises the accuracy of gender classification to 84.7%.

Keywords: Gender classification; Gender-specific thesaurus; Machine learning; Statistical feature.
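
The idea of thesaurus-based features can be illustrated with a minimal sketch. The abstract does not specify how the thesauruses are scored or what the two features are, so everything below is an assumption for illustration: words are ranked by a smoothed log-odds ratio between male and female posts, and the two features are the fractions of a message's tokens found in each thesaurus. The function names `build_thesauruses` and `thesaurus_features`, the `top_k` parameter, and the toy English tokens are hypothetical; real Weibo text would first need Chinese word segmentation.

```python
import math
from collections import Counter


def build_thesauruses(male_posts, female_posts, top_k=3):
    """Return (male_words, female_words) from pre-tokenized posts.

    Assumption: a word is gender-specific when its add-one-smoothed
    log-odds ratio between the two corpora is strongly positive (male)
    or strongly negative (female). This is not the paper's stated method.
    """
    male_counts = Counter(w for post in male_posts for w in post)
    female_counts = Counter(w for post in female_posts for w in post)
    vocab = set(male_counts) | set(female_counts)

    # Add-one smoothing so words unseen in one corpus keep a finite score.
    male_total = sum(male_counts.values()) + len(vocab)
    female_total = sum(female_counts.values()) + len(vocab)

    scores = {}
    for w in vocab:
        p_male = (male_counts[w] + 1) / male_total
        p_female = (female_counts[w] + 1) / female_total
        scores[w] = math.log(p_male / p_female)

    # Sort by (score, word) so that ties are broken deterministically.
    ranked = sorted(vocab, key=lambda w: (scores[w], w))
    female_words = {w for w in ranked[:top_k] if scores[w] < 0}
    male_words = {w for w in ranked[-top_k:] if scores[w] > 0}
    return male_words, female_words


def thesaurus_features(tokens, male_words, female_words):
    """Return the fraction of tokens in the male and female thesauruses."""
    if not tokens:
        return (0.0, 0.0)
    n = len(tokens)
    male_ratio = sum(t in male_words for t in tokens) / n
    female_ratio = sum(t in female_words for t in tokens) / n
    return (male_ratio, female_ratio)


# Toy, hypothetical data standing in for segmented microblog posts.
male_posts = [["football", "game", "tonight"], ["game", "beer"]]
female_posts = [["shopping", "dress", "tonight"], ["dress", "cute"]]

male_words, female_words = build_thesauruses(male_posts, female_posts, top_k=3)
features = thesaurus_features(
    ["game", "dress", "tonight", "game"], male_words, female_words
)
print(male_words, female_words, features)
```

In a full pipeline, these two values would be appended to the other statistical features of each user before training a standard classifier; the choice of classifier is not given in the abstract.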