Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation

Abstract

Many fake news detection studies have achieved promising performance by extracting effective semantic and structure features from both content and propagation trees. However, it is challenging to apply them to practical situations, especially when using the trained propagation-based models to detect news with no propagation data. Towards this scenario, we study a new task named cold-start fake news detection, which aims to detect content-only samples with missing propagation. To achieve the task, we design a simple but effective Structure Adversarial Net (SAN) framework to learn transferable features from available propagation to boost the detection of content-only samples. SAN introduces a structure discriminator to estimate dissimilarities among learned features with and without propagation, and further learns structure-invariant features to enhance the generalization of existing propagation-based methods for content-only samples. We conduct qualitative and quantitative experiments on three datasets. Results show the challenge of the new task and the effectiveness of our SAN framework.

Index Terms— Fake news detection, propagation structure learning, cold start, social network

1 Introduction

Nowadays, mainstream social platforms (e.g., Twitter) have facilitated the dissemination of information in a faster and cheaper way. Nevertheless, the ease has also caused the wide spread of fake news, which has brought detrimental effects on individuals and society [1]. Triggered by the negative impact of fake news spreading, it is critical to develop automatic methods for fake news detection.

Generally, users on social media share opinions, conjectures and evidence for checking fake news. Through their various interactive behaviors, a propagation tree describing the law of information transmission is formed and plays a significant role in fake news detection. Previous works [2, 3] have empirically shown that, compared with the truth, false news has deeper propagation structures and reaches a wider audience. To leverage the difference, many efforts [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] have been devoted to jointly exploring effective high-level semantic and structural properties from content and the corresponding propagation trees via different neural networks. Compared with models learned only on the content [14, 15, 16], these propagation-based models, trained on samples with both content and propagation, provide a more comprehensive view of fake news and have shown superior detection performance.

However, a practical barrier is that propagation data is often unavailable at detection time, and acquiring it usually requires a great deal of manpower and computational resources. When the propagation structure is missing, the above propagation-based detection systems obtain suboptimal performance. These models are trained on content and propagation trees to jointly learn semantic and structural features, which leads to a specific feature space for detection. For samples that lack propagation information, i.e., cold-start propagation, they fail to perform well because of the dissimilarities between features learned with and without propagation.

Based on the above scenario, we develop a new task for fake news detection towards cold-start propagation, named cold-start fake news detection. It aims to train the model on samples with available propagation and content, and then predict content-only samples without any propagation data. Different from existing fake news detection tasks, the new task focuses on the model’s ability to generalize when propagation trees are absent. Studying the cold-start propagation scenario can promote the wider application of propagation-based detection methods in practice.

Under the cold-start fake news detection task, directly applying existing propagation-based models, which capture structure-specific features from propagation trees, would hurt the detection of cold-start news that has no propagation trees. Therefore, an intuitive solution to the new task is to remove the nontransferable structure-specific features and preserve the characteristics shared across the two data types.

To achieve this, we design a simple but effective Structure Adversarial Net (SAN) framework to boost the detection performance of propagation-based models for cold-start news. SAN is a unified framework and can be applied to any propagation-based model. Specifically, inspired by [17], SAN incorporates a structure discriminator to predict whether the high-level representation of the target news includes structure properties. During the training phase, the original feature extractor cooperates with the fake news detector to identify fake news. Simultaneously, the feature extractor tries to fool the structure discriminator in order to close the gap between the feature distributions obtained with and without propagation trees.

We conduct experiments on three public fake news benchmarks and build two different cold-start propagation settings (i.e., general and event-aware) to simulate the real-world detection scenario. Experimental results show varying degrees of degradation for existing propagation-based detection methods, demonstrating the challenge of fake news detection towards cold-start propagation. We also evaluate our proposed SAN framework on several baseline models. The results show that SAN enhances the detection performance of these methods under the cold-start propagation condition. We further discuss several potential directions to promote the development of cold-start fake news detection.

The main contributions can be summarized as follows: 1) We develop a new task, fake news detection towards cold-start propagation, which focuses on generalization to news whose propagation is entirely absent. The goal of the task is to identify cold-start news by exploiting previously available propagation and content. It allows propagation-based methods to be applied to more practical detection scenarios. 2) We propose a simple but effective Structure Adversarial Net (SAN) framework to transfer structure knowledge to content-only samples. It can be applied to any propagation-based model to improve generalization when detecting cold-start news. 3) We explore different settings for cold-start fake news detection on three datasets. Experiments demonstrate the challenge of the new task and the effectiveness of the proposed framework.

Fig. 1: The architecture of the SAN framework for cold-start fake news detection. A structure discriminator is introduced to predict the auxiliary structure label based on the latent representation. SAN can transfer structure knowledge learned from existing propagation trees to content-only samples.

2 A New Task: Cold-start Fake news Detection

We describe how to revise traditional fake news detection to obtain the new task towards cold-start propagation. Specifically, as shown in the Input in Fig. 1, we directly remove the whole propagation trees of samples in the testing set, so that they become cold-start news and simulate the cold-start propagation setup. Formally, define $D^{\text{train}}=\{(x^{\text{train}}_{i},G^{\text{train}}_{i}),i\in[1,N_{\text{train}}]\}$ and $D^{\text{test}}=\{x^{\text{test}}_{i},i\in[1,N_{\text{test}}]\}$ as the training and testing sets, respectively, where $x$ refers to the source news and $G$ indicates propagation trees. During the training stage, the model is trained on the complete samples $(x^{\text{train}}_{i},G^{\text{train}}_{i})$ with both content and propagation to learn transferable semantic and structural patterns for detecting fake news, i.e.,

$$f:(x^{\text{train}},G^{\text{train}})\rightarrow y.$$

During the testing stage, the trained model is used to predict the cold-start news $x^{\text{test}}_{i}$ that lacks propagation data, i.e.,

$$\hat{y}=f(x^{\text{test}}).$$
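The task setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Sample` container, its field names, and `make_cold_start_split` are hypothetical, and the 75/25 ratio simply mirrors the general setting described in Section 4.1.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical container: source text x, propagation tree G, label y."""
    text: str
    edges: Optional[List[Tuple[int, int]]]  # None means no propagation is available
    label: int  # 1 = fake, 0 = real


def make_cold_start_split(samples, train_ratio=0.75, seed=0):
    """Shuffle, split, and strip propagation trees from the test portion."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train = shuffled[:cut]  # training samples keep (x, G)
    test = [replace(s, edges=None) for s in shuffled[cut:]]  # test samples keep x only
    return train, test


data = [Sample(f"news {i}", [(0, 1), (0, 2)], i % 2) for i in range(8)]
train_set, test_set = make_cold_start_split(data)
```

The only difference from a conventional split is the `replace(s, edges=None)` step, which turns every test sample into cold-start news.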

3 Approach

To boost the detection for content-only samples, we develop a simple but effective Structure Adversarial Net (SAN) framework to learn latent structural features from previous propagation trees. The overall architecture is shown in Fig. 1.

3.1 Vanilla Propagation-based Approach

Given an input sample consisting of the content of the source news $x$ and its propagation tree $G$, existing models apply various neural networks to extract high-level textual and structural features. The latent representation $\textbf{h}$ is computed by,

$$\textbf{h}=f_{\text{enc}}(x,G;\Theta),\tag{1}$$

where $f_{\text{enc}}$ can be the encoder in [8, 9, 12] that learns semantic and structural features, and $\Theta$ refers to the corresponding trainable parameters. Then, a classifier consisting of a fully connected layer and a softmax function is applied to predict the label probabilities of all classes, i.e.,

$$\hat{\textbf{y}}=f_{\text{cls}}(\textbf{h};\theta_{f}),\tag{2}$$

where $\theta_{f}$ denotes the classifier’s learnable parameters.
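The classification head in Eq. (2) can be illustrated with a small standard-library sketch. The vector `h` below is a made-up stand-in for the encoder output of Eq. (1), and the weights `W` and bias `b` are arbitrary illustrative values, not learned parameters from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_softmax(h, weights, bias):
    """Fully connected layer followed by softmax (one weight row per class)."""
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


h = [0.2, -0.1, 0.4]                        # stand-in for the latent representation
W = [[0.5, 0.0, 1.0], [-0.5, 0.3, -1.0]]    # illustrative weights, two classes
b = [0.0, 0.0]
probs = linear_softmax(h, W, b)
```

A head of the same shape, with its own parameters, could serve as the structure discriminator described in Section 3.2.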

3.2 Structure Adversarial Net Framework

As previous propagation-based detection methods fail to generalize well for content-only samples, we design a Structure Adversarial Net (SAN) framework to learn a transferable feature representation between content and propagation.

Based on the architecture of existing detection methods, SAN incorporates a structure discriminator to predict whether the high-level representation of the target news includes structure properties. Given the hidden representation, we leverage a fully connected layer and a softmax function to predict the probabilities $\hat{\textbf{y}}_{d}$ that the representation was computed from the propagation structure, i.e.,

$$\hat{\textbf{y}}_{d}=f_{\text{d}}(\textbf{h};\theta_{d}),\tag{3}$$

where $\theta_{d}$ denotes the learnable parameters of the discriminator. ${\textbf{y}}_{d}=1$ indicates that the representation is learned from original samples with both content and propagation; ${\textbf{y}}_{d}=0$ indicates that the representation is learned solely from content-only samples.

During training, the original feature extractor cooperates with the fake news detector to carry out the major task of identifying fake news. The classification loss $\mathcal{L}_{\text{CLS}}(\Theta,\theta_{f})$ is defined as,

$$\mathcal{L}_{\text{CLS}}(\Theta,\theta_{f})=-\textbf{y}\log(\hat{\textbf{y}})-(1-\textbf{y})\log(1-\hat{\textbf{y}}),\tag{4}$$

where $\textbf{y}$ is the ground-truth label and $\hat{\textbf{y}}$ is the predicted distribution. Simultaneously, the feature extractor tries to fool the structure discriminator in order to close the gap between the feature distributions obtained with and without propagation trees. The loss of the discriminator captures the dissimilarities of feature representations from different data types. It is defined as,

$$\mathcal{L}_{d}(\Theta,\theta_{d})=-\textbf{y}_{d}\log(\hat{\textbf{y}}_{d})-(1-\textbf{y}_{d})\log(1-\hat{\textbf{y}}_{d}),\tag{5}$$

where $\textbf{y}_{d}$ and $\hat{\textbf{y}}_{d}$ are the ground-truth and predicted labels, respectively, that describe whether the high-level representation of the target news includes structure properties. The larger the loss, the lower the dissimilarities. Combining this with the cross-entropy loss minimized by the fake news classifier $f_{\text{cls}}$, the final optimization objective can be defined as,

$$\mathcal{L}_{\text{SAN}}=\mathcal{L}_{\text{CLS}}(\Theta,\theta_{f})-\mathcal{L}_{d}(\Theta,\theta_{d}).\tag{6}$$
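The objective above can be checked numerically with a short sketch. It assumes that both $\mathcal{L}_{\text{CLS}}$ and $\mathcal{L}_{d}$ take the standard binary cross-entropy form of Eq. (4); the function names `bce` and `san_objective` and the probabilities used are illustrative, not taken from the paper.

```python
import math


def bce(y_true: float, y_prob: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one sample; eps guards against log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def san_objective(y, y_hat, y_d, yd_hat):
    """Classification loss minus discriminator loss, as in Eq. (6)."""
    return bce(y, y_hat) - bce(y_d, yd_hat)


# A fake sample (y = 1) classified with probability 0.9.
loss_cls = bce(1.0, 0.9)
# A representation built with propagation (y_d = 1) on which the
# discriminator is undecided (0.5), i.e. it cannot tell the data types apart.
loss_d = bce(1.0, 0.5)
loss_san = san_objective(1.0, 0.9, 1.0, 0.5)
```

With an undecided binary discriminator the discriminator loss equals $\log 2$, which is the regime the encoder is pushed toward when it maximizes $\mathcal{L}_{d}$.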

The gradient reversal layer [18] is added between the encoder and the structure discriminator to achieve an adversarial effect. Thus, the optimization of the model parameters is summarized as follows:

$$\begin{aligned}\Theta&\leftarrow\Theta-\eta\left(\frac{\partial\mathcal{L}_{\text{CLS}}}{\partial\Theta}-\frac{\partial\mathcal{L}_{d}}{\partial\Theta}\right),\\ \theta_{f}&\leftarrow\theta_{f}-\eta\frac{\partial\mathcal{L}_{\text{CLS}}}{\partial\theta_{f}},\\ \theta_{d}&\leftarrow\theta_{d}-\eta\frac{\partial\mathcal{L}_{d}}{\partial\theta_{d}},\end{aligned}\tag{7}$$

where $\eta$ is the learning rate. In the implementation, the training samples are processed into two copies. One consists of the full samples with both content and propagation, denoted as $D^{\text{train}}$, and the other contains only cold-start samples whose propagation has been removed entirely, denoted as $\tilde{D}^{\text{train}}$. We adopt the SAN objective for both of them, i.e.,

$$\mathcal{L}=\mathcal{L}_{\text{SAN}}^{D^{\text{train}}}+\lambda\mathcal{L}_{\text{SAN}}^{\tilde{D}^{\text{train}}},\tag{8}$$

where $\lambda$ is a trade-off hyper-parameter that controls the weight given to cold-start samples during training.
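The update rule in Eq. (7) and the two training copies can be illustrated on a deliberately tiny model. Everything in the sketch is an assumption made for illustration: the encoder, classifier and discriminator are each a single scalar weight, `toy_input` is a hypothetical stand-in for real feature extraction, both losses are binary cross-entropy through a sigmoid, and the weighted sum of Eq. (8) is approximated by per-sample steps scaled by `weight`. In a deep learning framework, the sign flip on the encoder would typically come from a gradient reversal layer as in [18] rather than from manual gradients.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def build_training_copies(samples):
    """Full copy (y_d = 1) and content-only copy (y_d = 0) of (x, tree, y) tuples."""
    full = [(x, tree, y, 1) for x, tree, y in samples]
    stripped = [(x, None, y, 0) for x, _tree, y in samples]
    return full, stripped


def toy_input(x, tree):
    """Hypothetical features: content scalar plus a crude cue from tree size."""
    return x + (0.1 * len(tree) if tree is not None else 0.0)


def san_step(h_in, y, y_d, params, lr=0.1, weight=1.0):
    """One per-sample step of Eq. (7) on a scalar toy model."""
    enc_w, cls_w, dis_w = params
    h = enc_w * h_in
    y_hat = sigmoid(cls_w * h)
    yd_hat = sigmoid(dis_w * h)
    # BCE through a sigmoid has gradient (prediction - target) w.r.t. the logit.
    g_cls_h = (y_hat - y) * cls_w   # dL_CLS / dh
    g_dis_h = (yd_hat - y_d) * dis_w  # dL_d / dh
    # Encoder: descend on L_CLS, ascend on L_d (the reversed gradient).
    enc_w -= lr * weight * (g_cls_h - g_dis_h) * h_in
    # Heads: plain gradient descent on their own losses.
    cls_w -= lr * weight * (y_hat - y) * h
    dis_w -= lr * weight * (yd_hat - y_d) * h
    return enc_w, cls_w, dis_w


full, stripped = build_training_copies([(1.0, [(0, 1), (0, 2)], 1),
                                        (0.4, [(0, 1)], 0)])
lam = 2.0  # illustrative trade-off weight for the content-only copy
params = (0.5, 1.0, 0.5)
for copy, weight in ((full, 1.0), (stripped, lam)):
    for x, tree, y, y_d in copy:
        params = san_step(toy_input(x, tree), y, y_d, params, weight=weight)
```

The discriminator is trained to separate the two copies, while the encoder receives the opposite gradient and is therefore pushed toward representations on which the copies look alike.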

4 Experiments

4.1 Experimental Setups

Datasets. We experiment on three real-world public datasets. PolitiFact and GossipCop are released by [19]. Samples are collected from two fact-checking websites, PolitiFact (https://www.politifact.com/) and GossipCop (https://www.gossipcop.com/). PolitiFact provides 157 fake news and 157 true news; GossipCop provides 2,732 fake news and 2,732 true news. PHEME-5 [20] contains tweets related to five different events. It contains 581 true news and 230 fake news.

Task Setups. We build two different cold-start propagation settings. General cold-start fake news detection aims to detect content-only fake news without considering specific events. We choose PolitiFact and GossipCop, and follow the same procedure as [21, 12] to split each dataset, i.e., we randomly choose 75% of the data as the training set and keep the rest as the test set. We further remove the propagation trees and only retain the source news for each sample in the test set to obtain cold-start propagation. Event-aware cold-start fake news detection focuses on detecting cold-start fake news for a new event. We evaluate on PHEME-5, which contains news from five specific events. The samples of one event are used for testing, and all the rest are used for training. Similarly, we further remove the whole propagation tree for each sample in the test set.
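The event-aware setting amounts to a leave-one-event-out protocol with the test propagation removed. The sketch below illustrates it under assumptions: the dictionary keys and the function name `leave_one_event_out` are hypothetical, and the event codes are borrowed from the column headers of Table 2 purely as labels.

```python
from collections import defaultdict


def leave_one_event_out(samples):
    """Yield (held_out_event, train, test); test samples lose their tree.

    Each sample is a hypothetical dict with keys: event, text, tree, label.
    """
    by_event = defaultdict(list)
    for s in samples:
        by_event[s["event"]].append(s)
    for held_out in sorted(by_event):
        train = [s for event, group in by_event.items()
                 if event != held_out for s in group]
        test = [{**s, "tree": None} for s in by_event[held_out]]
        yield held_out, train, test


toy = [{"event": e, "text": f"{e}-{i}", "tree": [(0, 1)], "label": i % 2}
       for e in ("ch", "fg", "gc", "os", "ss") for i in range(2)]
folds = list(leave_one_event_out(toy))
```

Each fold trains on four events with full propagation and evaluates on the fifth event using content only.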

Baselines. mGRU [14] and CSI [15] are RNN-based models that capture sequential patterns from retweet sequences. GCNFN [22] models the propagation structure as a graph and uses graph convolutional networks (GCN) to encode the propagation. We implemented the model without profile information for a fair comparison. GAT [23] applies graph attention networks to encode the propagation. BiGCN [8] employs two GCNs to model the propagation graph and the dispersion graph. UPSR [12] is a state-of-the-art model that reconstructs the latent propagation structure to explore more accurate and diverse structural properties. In addition, we report a baseline that uses only the content of the source news for detection, denoted as Content, to evaluate the role of propagation trees in fake news detection. For this baseline, we extract textual features with word2vec embeddings and feed them into an MLP for classification.

Implementation Details. For PolitiFact and GossipCop, we use 300-dimensional word2vec vectors [24] provided by [25] as the input features of text contents. For PHEME-5, we extract the text embedding of each sentence by skip-gram with negative sampling [26], and the dimension of input vectors is set to 200. The dimension of hidden vectors is set to 64. $\lambda$ is searched from $\{0.1,1,1.5,2,5,10\}$. The learning rate is set to 0.001, 0.0005, and 0.005 for PolitiFact, GossipCop, and PHEME-5, respectively. We run each model with five random seeds and report the average results on the test set.

4.2 Task Analysis

We first quantitatively evaluate existing propagation-based methods under the two types of cold-start propagation. Results are shown in Fig. 2. For PolitiFact and GossipCop, we consider a mixture of social events; for PHEME-5, we consider the event-specific setting. When propagation-based methods are directly applied to the new task, the performance of all compared models decreases to varying degrees across metrics. The inferior results show that these models do not generalize well to cold-start propagation. For the same dataset, the more complex the model, the greater the performance degradation. A possible reason is that a complex model is prone to fusing excessive nontransferable structure features from the propagation tree. Once the propagation structure is missing, the model cannot perform well.

Figure 3 qualitatively visualizes the sample representations on the training and test sets of PolitiFact with t-SNE [27]. We observe that the representation distributions of training samples and test samples are inconsistent. For the new task, it is critical to learn transferable patterns between content and propagation so that the model can adapt to detect content-only samples without propagation structure.

Model	PolitiFact				GossipCop
	Acc	ma-F1	F1 (fake)	F1 (real)	Acc	ma-F1	F1 (fake)	F1 (real)
Content	0.684	0.668	0.641	0.714	0.755	0.749	0.756	0.753
mGRU	0.592	0.546	0.654	0.448	0.582	0.559	0.488	0.643
mGRU+SAN	0.699	0.660	0.629	0.719	0.710	0.702	0.721	0.696
Improve (%)	+10.7	+11.4	-2.5	+27.1	+12.8	+14.3	+23.3	+5.3
CSI	0.562	0.442	0.337	0.562	0.552	0.461	0.256	0.677
CSI+SAN	0.610	0.558	0.563	0.565	0.705	0.697	0.691	0.714
Improve (%)	+4.8	+11.6	+22.6	+0.3	+15.3	+23.6	+43.6	+3.7
GAT	0.777	0.766	0.781	0.760	0.574	0.508	0.349	0.680
GAT+SAN	0.851	0.844	0.848	0.848	0.751	0.745	0.750	0.752
Improve (%)	+7.3	+7.7	+6.7	+8.8	+17.8	+23.7	+40.1	+7.1
GCNFN	0.830	0.823	0.829	0.823	0.731	0.725	0.727	0.735
GCNFN+SAN	0.843	0.834	0.843	0.836	0.755	0.749	0.756	0.754
Improve (%)	+1.3	+1.1	+1.4	+1.3	+2.4	+2.4	+2.9	+1.9
BiGCN	0.825	0.817	0.834	0.808	0.559	0.460	0.243	0.688
BiGCN+SAN	0.853	0.846	0.852	0.850	0.767	0.761	0.766	0.766
Improve (%)	+2.8	+2.8	+1.8	+4.1	+20.7	+30.1	+52.3	+7.8
UPSR	0.635	0.546	0.499	0.612	0.587	0.556	0.505	0.621
UPSR+SAN	0.767	0.758	0.759	0.773	0.757	0.751	0.758	0.756
Improve (%)	+13.2	+21.3	+26.0	+16.1	+17.0	+19.5	+25.3	+13.4

Table 1: Results of fake news detection for general cold-start propagation. ma-F1 means macro-average F1 score. The improvements over the corresponding baseline are significant at level $p<0.05$ based on a $t$-test. We highlight them in bold.

Model	Acc.	weighted-F1
		ch	fg	gc	os	ss	Avg.
Content	0.538	0.367	0.507	0.582	0.671	0.523	0.530
GAT	0.644	0.503	0.492	0.733	0.643	0.466	0.567
GAT+SAN	0.666	0.503	0.525	0.733	0.672	0.508	0.609
Improve (%)	+2.2	0.0	+3.3	0.0	+2.9	+4.2	+4.2
GCNFN	0.633	0.503	0.580	0.728	0.594	0.546	0.590
GCNFN+SAN	0.659	0.503	0.587	0.746	0.678	0.472	0.597
Improve (%)	+2.6	0.0	+0.7	+1.7	+8.4	-7.4	+0.7
BiGCN	0.620	0.468	0.471	0.727	0.658	0.324	0.529
BiGCN+SAN	0.641	0.503	0.563	0.733	0.681	0.428	0.581
Improve (%)	+2.2	+3.6	+9.2	+0.6	+2.3	+10.4	+5.2
UPSR	0.630	0.444	0.521	0.742	0.671	0.534	0.582
UPSR+SAN	0.664	0.503	0.564	0.721	0.665	0.578	0.606
Improve (%)	+3.4	+5.9	+4.3	-2.0	-0.6	+4.4	+2.4

Table 2: Results of fake news detection for event-level cold-start propagation. Avg. refers to the average of the weighted-F1 scores over the five events. The bolded improvements over the corresponding baseline model are significant at level $p<0.05$ based on a $t$-test.
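For readers unfamiliar with the two aggregate metrics used in the tables, the following sketch computes macro-F1 and weighted-F1 from raw label lists. It assumes the standard definitions (unweighted mean of per-class F1, and per-class F1 weighted by class support); the paper does not spell out its exact implementation, and the labels below are toy values.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {class: F1} computed from raw label lists."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


y_true = [1, 1, 1, 0]  # toy labels: 1 = fake, 0 = real
y_pred = [1, 1, 0, 0]
ma = macro_f1(y_true, y_pred)
wt = weighted_f1(y_true, y_pred)
```

Weighted-F1 is the more forgiving of the two on imbalanced data such as PHEME-5, because the majority class dominates the average.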

4.3 Effects of SAN Framework

Table 1 and Table 2 summarize the results of applying SAN to propagation-based methods for general and event-aware cold-start fake news detection. Methods equipped with SAN improve accuracy and the aggregate F1 score (macro-F1 in Table 1, average weighted-F1 in Table 2) over the corresponding baseline on all datasets, which shows the effectiveness of SAN. A few individual scores do drop, e.g., the fake-class F1 of mGRU on PolitiFact and the weighted-F1 of GCNFN on event ss.
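The Improve (%) rows in the tables appear to report absolute differences in percentage points rather than relative gains. The sketch below recomputes two accuracy entries from the rounded values printed in Table 1; the helper name `improvement_points` is ours. A few table entries differ from such a recomputation by 0.1, presumably because they were derived from unrounded scores.

```python
def improvement_points(baseline: float, with_san: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round((with_san - baseline) * 100, 1)


# Accuracy values copied from Table 1.
mgru_politifact = improvement_points(0.592, 0.699)  # mGRU vs. mGRU+SAN, PolitiFact
upsr_gossipcop = improvement_points(0.587, 0.757)   # UPSR vs. UPSR+SAN, GossipCop
```

Both results match the corresponding Improve (%) cells of Table 1 (+10.7 and +17.0).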

5 Conclusion

This paper focuses on generalization to news whose propagation is entirely absent and explores a new practical fake news detection task towards cold-start propagation, which aims to identify content-only news by exploiting previously available content and propagation. For the task, we design a simple but effective SAN framework to transfer propagation patterns to content-only samples. Experiments show the poor generalization of existing propagation-based models and the effectiveness of SAN for two types of cold-start propagation.

Acknowledgements

The authors thank anonymous reviewers for their helpful comments. This work is supported by the National Natural Science Foundation of China (No.6210071416), the National Key Research and Development Program of China (No. 2022YFC3302102), and the National Funded Postdoctoral Researcher Program of China (No. GZC20232969).

References

[1] Sahil Loomba, Alexandre de Figueiredo, Simon J Piatek, et al., “Measuring the impact of covid-19 vaccine misinformation on vaccination intent in the uk and usa,” Nature human behaviour, vol. 5, pp. 337–348, 2021.
[2] Soroush Vosoughi, Deb Roy, and Sinan Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.
[3] S. Mo Jang, Tieming Geng, Jo-Yun Queenie Li, et al., “A computational approach for examining the roots and spreading patterns of fake news: Evolution tree analysis,” Comput. Hum. Behav., vol. 84, pp. 103–113, 2018.
[4] Jing Ma, Wei Gao, and Kam-Fai Wong, “Rumor detection on twitter with tree-structured recursive neural networks,” in ACL, 2018, pp. 1980–1989.
[5] Sumeet Kumar and Kathleen M. Carley, “Tree lstms with convolution units to predict stance and rumor veracity in social media conversations,” in ACL, 2019, pp. 5047–5058.
[6] Jing Ma and Wei Gao, “Debunking rumors on twitter with tree transformer,” in COLING, 2020, pp. 5455–5466.
[7] Dou Hu, Lingwei Wei, Wei Zhou, et al., “A rumor detection approach based on multi-relational propagation tree,” Journal of Computer Research and Development, vol. 58, no. 7, pp. 1395–1411, 2021.
[8] Tian Bian, Xi Xiao, Tingyang Xu, et al., “Rumor detection on social media with bi-directional graph convolutional networks,” in AAAI, 2020, pp. 549–556.
[9] Lingwei Wei, Dou Hu, Wei Zhou, et al., “Towards propagation uncertainty: Edge-enhanced bayesian graph convolutional networks for rumor detection,” in ACL, 2021, pp. 3845–3854.
[10] Hongzhan Lin, Jing Ma, Mingfei Cheng, et al., “Rumor detection on twitter with claim-guided hierarchical graph attention networks,” in EMNLP, 2021, pp. 10035–10047.
[11] Lingwei Wei, Dou Hu, Wei Zhou, et al., “Modeling the uncertainty of information propagation for rumor detection: A neuro-fuzzy approach,” TNNLS, 2022.
[12] Lingwei Wei, Dou Hu, Wei Zhou, et al., “Uncertainty-aware propagation structure reconstruction for fake news detection,” in COLING, 2022, pp. 2759–2768.
[13] Lingwei Wei, Dou Hu, Yantong Lai, et al., “A unified propagation forest-based framework for fake news detection,” in COLING, 2022, pp. 2769–2779.
[14] Jing Ma, Wei Gao, Prasenjit Mitra, et al., “Detecting rumors from microblogs with recurrent neural networks,” in IJCAI, 2016, pp. 3818–3824.
[15] Natali Ruchansky, Sungyong Seo, and Yan Liu, “CSI: A hybrid deep model for fake news detection,” in CIKM, 2017, pp. 797–806.
[16] Hamid Karimi and Jiliang Tang, “Learning hierarchical discourse-level structure for fake news detection,” in NAACL-HLT, June 2019, pp. 3432–3442.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, vol. 27.
[18] Yaroslav Ganin and Victor S. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in ICML. 2015, vol. 37 of JMLR Workshop and Conference Proceedings, pp. 1180–1189, JMLR.org.
[19] Kai Shu, Deepak Mahudeswaran, Suhang Wang, et al., “Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media,” Big Data, vol. 8, no. 3, pp. 171–188, 2020.
[20] Arkaitz Zubiaga, Geraldine Wong Sak Hoi, Maria Liakata, et al., “Analysing how people orient to and spread rumours in social media by looking at conversational threads,” PloS one, vol. 11(3), pp. e0150989, 2015.
[21] Kai Shu, Limeng Cui, Suhang Wang, et al., “defend: Explainable fake news detection,” in KDD, 2019, pp. 395–405.
[22] Federico Monti, Fabrizio Frasca, Davide Eynard, et al., “Fake news detection on social media using geometric deep learning,” in ICLR (Workshop), 2019.
[23] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, et al., “Graph attention networks,” in ICLR (Poster), 2018.
[24] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, “Efficient estimation of word representations in vector space,” in ICLR (Workshop Poster), 2013.
[25] Yingtong Dou, Kai Shu, Congying Xia, Philip S. Yu, and Lichao Sun, “User preference-aware fake news detection,” in SIGIR, 2021, pp. 2051–2055.
[26] Tomás Mikolov, Ilya Sutskever, Kai Chen, et al., “Distributed representations of words and phrases and their compositionality,” in NIPS, 2013, pp. 3111–3119.
[27] Laurens Van der Maaten and Geoffrey Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. 11, 2008.