Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network

Data Sci Eng. 2024;9(1):41-61. doi: 10.1007/s41019-023-00239-2. Epub 2024 Mar 13.

Abstract

Topic modeling aims to discover latent themes in collections of text documents. It has applications across fields such as sociology, opinion analysis, and media studies, where topics must be easily interpretable, diverse, and coherent. An efficient topic modeling technique should accurately identify both flat and hierarchical topics; the latter are especially useful in disciplines where topics can be logically arranged into a tree. In this paper, we propose Community Topic, a novel algorithm that mines communities in word co-occurrence networks to produce topics. We evaluate the proposed approach using several metrics and compare it with standard baselines, confirming its good performance. Community Topic enables quick identification of flat topics and of a topic hierarchy, facilitating on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.

Keywords: Community mining; Data mining; Graphs; Hierarchical topics; Information networks; Natural language processing; Topic modeling.
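
The general idea described in the abstract, building a word co-occurrence network and reading communities off it as topics, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which community mining algorithm Community Topic uses, so connected components over a weight-thresholded graph are swapped in here as a simple stand-in. The function names, the `min_weight` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(docs):
    """Weight each unordered word pair by the number of documents containing both."""
    weights = Counter()
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1
    return weights


def communities(weights, min_weight):
    """Stand-in community step (an assumption, not the paper's method):
    connected components over edges with weight >= min_weight."""
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(sorted(comp))
    return result


def subtopics(weights, topic, min_weight):
    """Re-run the community step on the subgraph induced by one topic."""
    members = set(topic)
    inner = {e: w for e, w in weights.items()
             if e[0] in members and e[1] in members}
    return communities(inner, min_weight)


# Toy corpus: each document is a list of already-tokenized words.
docs = [
    ["goal", "match", "team"],
    ["goal", "match", "team"],
    ["goal", "match", "coach"],
    ["vote", "party", "election"],
    ["vote", "party", "election"],
    ["match", "vote"],
]
weights = cooccurrence_network(docs)
flat_topics = communities(weights, min_weight=2)
sports_subtopics = subtopics(weights, flat_topics[1], min_weight=3)
```

In this sketch, flat topics come from a single pass over the network, and a sub-topic is obtained on demand by repeating the same step inside one community with a stricter threshold. Lowering the threshold instead merges communities, which loosely mirrors moving up to a super-topic.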