Detect, Investigate, Judge and Determine:
A Novel LLM-based Framework for Few-shot Fake News Detection

Ye Liu1,3, Jiajun Zhu2,3, Kai Zhang2,3, Haoyu Tang2,3
Yanghai Zhang2,3, Xukai Liu2,3, Qi Liu1,3, Enhong Chen2,3
1 School of Artificial Intelligence and Data Science, University of Science and Technology of China
2 School of Computer Science and Technology, University of Science and Technology of China
3 State Key Laboratory of Cognitive Intelligence
Abstract

Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real news in extremely low-resource scenarios. This task has garnered increasing attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance on it, thanks to their rich prior knowledge and strong in-context learning abilities. However, existing methods face significant limitations, namely Understanding Ambiguity and Information Scarcity, which substantially undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Augmented Fake News Detection (DAFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DAFND first identifies the keywords of each news article through a Detection Module. Subsequently, DAFND uses a novel Investigation Module to retrieve valuable inside and outside information concerning the current news, followed by a Judge Module that derives one prediction from each of the two perspectives. Finally, a Determination Module integrates these two predictions and derives the final result. Extensive experiments on two publicly available datasets show the efficacy of our proposed method, particularly in low-resource settings.

1 Introduction

Fake News Detection (FND), which aims to distinguish inaccurate news from legitimate news, has garnered increasing importance and attention due to the pervasive dissemination and detrimental effects of fake news on social media platforms (Shu et al., 2017). Few-Shot Fake News Detection (FS-FND), a subtask of FND, endeavors to identify fake news using only $K$ instances per category ($K$-shot) in the training phase (Hu et al., 2024; Gao et al., 2021; Ma et al., 2023).

Figure 1: An example of fake news detection and the comparison between existing LLM-based methods and our design.

Generally, fake news detection can be framed as a binary classification problem and addressed using various classification models. In the early stage, researchers primarily employed machine learning or deep learning algorithms to represent and classify candidate news articles (Horne and Adali, 2017; Jiang et al., 2022). More recently, with the advent of Large Language Models (LLMs), FS-FND has been effectively addressed through In-Context Learning (ICL), a technique that is particularly prevalent in few-shot settings (Hu et al., 2024; Boissonneault and Hensen, 2024). Among these efforts, Hu et al. (2024), Wang et al. (2024), and Teo et al. (2024) pioneered the investigation of LLMs' potential in this field.

However, such methods mostly ask the LLM directly to judge the authenticity of the given news, which often exceeds the capabilities of LLMs, particularly the relatively small LLMs in common use (e.g., those with 7B parameters). Figure 1 illustrates this with a news article about microchip developments. Existing LLM-based approaches encounter two principal challenges: (1) Understanding Ambiguity: LLMs may fail to grasp the core meaning conveyed in the news, which hampers the detection process. (2) Information Scarcity: Because news content is time-sensitive, the training corpus of an LLM is frequently outdated relative to the news being checked. This poses a fundamental difficulty for fake news detection.

To this end, this paper proposes a novel approach to address the two aforementioned issues. Specifically, to mitigate the Understanding Ambiguity problem, we aim to extract valuable insights from an inside perspective by retrieving similar samples from the training set, thereby enhancing the comprehension of key concepts in the target news. Concurrently, to tackle the Information Scarcity problem, we employ an external search engine to gather relevant information about the news online. This integrates real-time data, effectively overcoming the limitation of information obsolescence.

More specifically, we design a Dual-perspective Augmented Fake News Detection (DAFND) model. DAFND comprises four key components: (a) A Detection Module: This module employs LLMs to extract keywords from each news article, providing an effective query for the subsequent modules. (b) An Investigation Module: It gathers further valuable information related to the target news from both inside (i.e., the training set) and outside (i.e., a search engine) perspectives. (c) A Judge Module: This module designs prompts that enable LLMs to generate predictions and explanations based on the findings of both the inside and outside investigations. (d) A Determination Module: It takes into account the predictions and explanations from both perspectives and makes a final decision with high confidence, especially in cases where the two perspectives conflict.

In summary, the main contributions of our work are as follows.

  • For the first time, we explore augmenting LLMs from inside and outside perspectives for few-shot fake news detection, pioneering a novel direction in this field.

  • We devise the Dual-perspective Augmented Fake News Detection (DAFND) model, which effectively addresses the Understanding Ambiguity and Information Scarcity problems, particularly in low-resource settings.

  • We conduct extensive experiments on two publicly available datasets, where the experimental results demonstrate the effectiveness of our proposed method. We will make our code publicly available upon acceptance of the paper.

2 Related Work

2.1 Few-Shot Fake News Detection

The objective of the fake news detection task is to distinguish inaccurate news from real news (Shu et al., 2017). For few-shot fake news detection, only $K$ instances per category ($K$-shot) are sampled for the training phase (Gao et al., 2021; Ma et al., 2023).

Generally, fake news detection can be defined as a binary classification problem and is addressed by a variety of classification models. Initially, researchers mainly relied on feature engineering and machine learning algorithms. For example, Horne and Adali (2017) fed a set of content-based features to a Support Vector Machine (SVM) classifier. With the rapid growth of computing power, significant improvements have been made with the help of various deep learning algorithms and Pre-trained Language Models (PLMs). Ghanem et al. (2021) combined lexical features and a Bi-GRU network to achieve accurate fake news detection. Jiang et al. (2022) proposed Knowledgeable Prompt Learning (KPL), incorporating prompt learning into fake news detection for the first time, and achieved state-of-the-art performance.

Additionally, given the specific nature of news articles, researchers have incorporated external knowledge into traditional fake news detection models. For instance, Dun et al. (2021), Ma et al. (2023), and Hu et al. (2021b) utilized knowledge graphs to enrich entity information and structured relational knowledge, leading to more precise news representations and improved detection performance. Meanwhile, Huang et al. (2023) adopted a data augmentation perspective, proposing a novel framework for generating more valuable training examples, which has proven beneficial for detecting human-written fake news.

Currently, with the advent of large language models, many researchers are exploring few-shot fake news detection through in-context learning and data augmentation techniques (Hu et al., 2024; Wang et al., 2024; Teo et al., 2024). However, these methods either simply apply LLMs to judge the authenticity of the given news or employ LLMs to rephrase the training data, and thus do not fully exploit the potential of LLMs. More importantly, most of them are significantly limited by the two aforementioned shortcomings, particularly Information Scarcity.

Figure 2: The architecture of our DAFND model. It includes four sequentially connected parts: (a) Detection Module; (b) Investigation Module; (c) Judge Module; (d) Determination Module.

2.2 Large Language Models

The emergence of Large Language Models (LLMs) such as GPT-4, LLama-3 and others (Hoffmann et al., 2022; OpenAI, 2023; AI@Meta, 2024; Tunstall et al., 2023), marks a significant advancement in the field of natural language processing. In-context learning, a novel few-shot learning paradigm, was initially introduced by Brown et al. (2020). To date, LLMs have exhibited remarkable performance across a range of NLP tasks, including text classification, information extraction, question answering and fake news detection (Hu et al., 2024; Wang et al., 2024; Teo et al., 2024; Liu et al., 2022; Zhao et al., 2021).

Previous studies (Hu et al., 2024; Wang et al., 2024; Teo et al., 2024) have attempted to solve few-shot fake news detection by directly querying LLMs or by employing them to rephrase the training data. For example, Hu et al. (2024) explored the potential of LLMs in fake news detection and further developed an Adaptive Rationale Guidance (ARG) network to synergize traditional methods with large language models. Similarly, Wang et al. (2024) leveraged LLMs to generate justifications based on evidence relevant to the given news, which were subsequently fed into a trainable classifier.

3 Problem Statement

Generally, fake news detection can be framed as a binary classification problem, wherein each news article is classified as either real ($y=0$) or fake ($y=1$) (Dun et al., 2021). Formally, each piece of news $S$ is composed of a sequence of words, i.e., $S=\{s_1, s_2, \dots, s_n\}$, encompassing its title, content text, and relevant tweets. The goal is to learn a detection function $F: F(S) \Longrightarrow y$, where $y \in \{0, 1\}$ denotes the ground-truth label of the news.

In the few-shot setting, adhering to the strategy employed in (Gao et al., 2021; Ma et al., 2023), we randomly sample $K$ instances per category ($K$-shot) for the training phase; hence, a $K$-shot fake news detection setting has $2K$ training instances in total. The entire test set is preserved to ensure the comprehensiveness and effectiveness of evaluation.
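A minimal sketch of this $K$-shot sampling protocol, assuming the labelled pool is a list of `(text, label)` pairs with 0 = real and 1 = fake. The function name `sample_k_shot` and the fixed seed are illustrative choices, not taken from the authors' code.

```python
import random
from typing import List, Tuple

def sample_k_shot(pool: List[Tuple[str, int]], k: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Randomly draw k examples per label (0 = real, 1 = fake), giving 2k training items."""
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for text, label in pool:
        by_label[label].append((text, label))
    train = []
    for label in (0, 1):
        if len(by_label[label]) < k:
            raise ValueError(f"label {label} has fewer than {k} examples")
        train.extend(rng.sample(by_label[label], k))  # sampling without replacement
    rng.shuffle(train)
    return train
```

For $K=8$ this returns 16 items, matching the training sizes listed in Table 1.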

4 The DAFND Model

In this section, we introduce the technical details of the DAFND model. As depicted in Figure 2, DAFND comprises four distinct components: (a) Detection Module; (b) Investigation Module; (c) Judge Module; (d) Determination Module. These modules are connected sequentially to produce the final fake news prediction.

4.1 Detection Module

In this module, we aim to identify the key information contained in the given news article, which will serve as the query for the subsequent modules. Specifically, we construct prompts to convey the original news article to the LLM. The LLM is then guided to extract $N$ keywords $\{w_1, w_2, \dots, w_N\}$, which are expected to answer the question: “when, where, who, what, how and why did the given news $S$ happen?”:

$$\{w_1, \dots, w_N\} = \mathcal{F}(Ins_w, S), \tag{1}$$

where $\mathcal{F}$ represents the LLM, and $Ins_w$ denotes the prompt for in-context learning. A more detailed description of this prompt is provided in Appendix A.1.
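The keyword step in Eq. (1) can be sketched as a prompt builder plus an output parser. This is only an illustration under assumptions: the prompt wording below is hypothetical (the paper's actual $Ins_w$ is in Appendix A.1), and the parser assumes the LLM replies with a comma- or newline-separated list, possibly with bullet or number markers.

```python
import re
from typing import List

def build_keyword_prompt(news: str, n: int = 5) -> str:
    # Hypothetical wording standing in for Ins_w; not the paper's prompt.
    return (
        f"Extract {n} keywords from the news below that answer: when, where, who, "
        f"what, how and why did it happen? Reply with a comma-separated list.\n\n"
        f"News: {news}\nKeywords:"
    )

def parse_keywords(llm_output: str, n: int = 5) -> List[str]:
    """Split the reply on commas/newlines, drop list markers like '1.' or '-', keep the first n."""
    keywords = []
    for part in re.split(r"[,\n]", llm_output):
        # A marker is only stripped when followed by whitespace, so '3.5 billion' survives.
        kw = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", part).strip()
        if kw:
            keywords.append(kw)
    return keywords[:n]
```

Keywords that themselves contain commas (e.g., "3,200") would be split by this parser; a stricter output format would be needed in practice.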

4.2 Investigation Module

In this module, we aim to investigate the relevant information to assist the LLM in conducting accurate inferences. As outlined in Section 1, this process is approached from two perspectives: Inside Investigation and Outside Investigation.

Inside Investigation.

To address the Understanding Ambiguity problem introduced in the Introduction, we retrieve effective demonstrations to enhance the LLM’s understanding during the in-context learning process (Liu et al., 2022; Rubin et al., 2022).

Specifically, we first concatenate the $N$ keywords extracted from each news article, and then utilize the pre-trained language model $\mathcal{M}$ to obtain the representation of these keywords $\{w_1, \dots, w_N\}$:

$$\begin{aligned} W &= w_1 \uplus w_2 \uplus \dots \uplus w_N, \\ H &= \mathcal{M}(W), \end{aligned} \tag{2}$$

where $\uplus$ represents the concatenation operation. The derived representation $H$ is used to represent each news sample. Along this line, we obtain the representation and label pairs $(H_i, l_i)$ for the training set (here, the training set refers to the few-shot training data), which constitute a datastore, denoted as $D$.
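A self-contained sketch of building the datastore $D$ from Eq. (2). Two assumptions: $\uplus$ is implemented as joining keywords with a space, and, because DeBERTa-base cannot be bundled into a short runnable example, `toy_encode` is a hypothetical hashed bag-of-words stand-in for $\mathcal{M}$. Only the shape of the computation (concatenate, encode, pair with the label) mirrors the text.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 64  # size of the toy embedding (illustrative)

def toy_encode(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for the PLM M: hashed bag-of-words, L2-normalised. Not DeBERTa."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_datastore(train: List[Tuple[List[str], int]]) -> List[Tuple[List[float], int]]:
    """Each training item is (keywords w_1..w_N, label l); returns pairs (H_i, l_i)."""
    datastore = []
    for keywords, label in train:
        w = " ".join(keywords)            # W = w_1 ⊎ ... ⊎ w_N (space-joined, assumed)
        datastore.append((toy_encode(w), label))
    return datastore
```

Swapping `toy_encode` for a real sentence encoder changes the vectors but not the datastore structure.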

Subsequently, when inferring a candidate news article $j$, we employ the $k$-Nearest Neighbors ($k$NN) search method (Khandelwal et al., 2019) to retrieve valuable samples from the training set. In detail, we use the representation $H_j$ of news $j$ to query the datastore $D$ according to the Euclidean distance. Then, based on the computed distances, we select the $k$ nearest positive and the $k$ nearest negative news samples, respectively (this ensures the diversity and effectiveness of the retrieved samples):

$$\mathcal{II} = \{U_{positive}, U_{negative}\}. \tag{3}$$

As a consequence, we obtain the inside investigation outcome $\mathcal{II}$, comprising $2k$ instances.
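The class-balanced $k$NN step behind Eq. (3) can be sketched as a brute-force search. Assumptions: the datastore is a list of `(vector, label)` pairs, "positive" means fake (label 1), following the convention stated in Section 5.1, and the function returns datastore indices. A real implementation over larger data might use an index library instead of a linear scan.

```python
import math
from typing import List, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inside_investigation(query: Sequence[float],
                         datastore: List[Tuple[Sequence[float], int]],
                         k: int = 2) -> Tuple[List[int], List[int]]:
    """Return (U_positive, U_negative) as indices: k nearest fake and k nearest real entries."""
    def nearest(label: int) -> List[int]:
        idxs = [i for i, (_, l) in enumerate(datastore) if l == label]
        idxs.sort(key=lambda i: euclidean(query, datastore[i][0]))
        return idxs[:k]
    return nearest(1), nearest(0)
```

Searching each class separately guarantees $2k$ demonstrations with both labels represented, even when the query sits closer to one class.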

Outside Investigation.

In response to the Information Scarcity problem, we further retrieve additional real-time information from external sources. Inspired by Yoran et al. (2023) and Paranjape et al. (2023), we implement a retriever based on the Google search engine, using the SerpAPI service (https://serpapi.com/).

Specifically, based on the keywords $\{w_1, w_2, \dots, w_N\}$ extracted in Section 4.1, we first concatenate them to construct the initial query $\mathcal{Q} = w_1 \uplus w_2 \uplus \dots \uplus w_N$. Then, following the strategy proposed by Yoran et al. (2023), we format the search query as “en.wikipedia.org $\mathcal{Q}$”, with the Wikipedia domain preceding the intermediate question. We keep the top-1 evidence returned by Google, and all retrieved evidence sentences are prepended to the outside investigation outcome, denoted as $\mathcal{OI}$.
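The query construction and top-1 selection can be sketched as below. The `search` argument is a hypothetical callable that takes a query string and returns a ranked list of evidence snippets; in the paper this role is played by Google via SerpAPI, whose client interface is deliberately not reproduced here. Space-joining for $\uplus$ is an assumption.

```python
from typing import Callable, List

def outside_investigation(keywords: List[str], search: Callable[[str], List[str]]) -> str:
    """Build 'en.wikipedia.org Q' from the keywords and keep the top-1 evidence snippet."""
    q = " ".join(keywords)                  # Q = w_1 ⊎ ... ⊎ w_N (space-joined, assumed)
    query = f"en.wikipedia.org {q}"         # Wikipedia domain precedes the question
    results = search(query)                 # hypothetical search backend
    return results[0] if results else ""    # top-1 evidence, or empty if nothing found
```

Injecting `search` keeps the sketch testable offline; a production version would wrap the real API call and handle rate limits and errors.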

4.3 Judge Module

Following the paradigm designed in the Investigation Module (Section 4.2), we attempt to derive the inference results based on the investigated information from both inside and outside perspectives.

Inside Judge.

After obtaining the effective demonstrations $\mathcal{II}$ from the Inside Investigation, we design prompts that provide the essential information to the LLM, thereby generating the inside prediction. Specifically, inspired by prior work on in-context learning (Paranjape et al., 2023), we first describe the target of the fake news detection task through an inside instruction. The retrieved inside investigation results $\mathcal{II} = \{U_{positive}, U_{negative}\}$ for the current candidate news then follow, augmenting the LLM's understanding of this task. Finally, we prompt the LLM to predict the label of the current news and give a supporting explanation.

In summary, the inside judge process can be expressed as:

$$P_i, R_i = \mathcal{F}(Ins_i \uplus \mathcal{II} \uplus x_{test}), \tag{4}$$

where $Ins_i$ is the inside instruction, $\uplus$ denotes the concatenation of two textual pieces, and $x_{test}$ represents the current candidate news. $P_i$ refers to the prediction result, while $R_i$ denotes the corresponding explanation. See Appendix A.2 for more details about this prompt.
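The concatenation order in Eq. (4), instruction, then retrieved demonstrations, then the candidate news, can be sketched as a prompt assembler. The demonstration format (`News: ... / Answer: Fake|Real`) and the final cue line are hypothetical; the paper's actual template is in Appendix A.2.

```python
from typing import List, Tuple

def format_demo(text: str, label: int) -> str:
    # Hypothetical demonstration layout; label 1 = fake, 0 = real.
    return f"News: {text}\nAnswer: {'Fake' if label == 1 else 'Real'}"

def inside_judge_prompt(instruction: str, demos: List[Tuple[str, int]], test_news: str) -> str:
    """Ins_i ⊎ II ⊎ x_test: instruction, retrieved demonstrations, then the candidate news."""
    parts = [instruction]
    parts += [format_demo(t, l) for t, l in demos]
    parts.append(f"News: {test_news}\nAnswer (label and explanation):")
    return "\n\n".join(parts)
```

The outside prompt of Eq. (5) would differ mainly in order: the candidate news comes before the retrieved evidence $\mathcal{OI}$.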

Outside Judge.

With the outside investigation information $\mathcal{OI}$ retrieved through the Google search engine as described in Section 4.2, we derive the outside judge prediction, which is crucial for real-time news detection.

Similar to the design of the Inside Judge, we describe the objective of fake news detection through an outside instruction, followed by the candidate news to be detected and the retrieved outside investigation documents $\mathcal{OI}$. After that, we derive the outside prediction $P_o$ and explanation $R_o$:

$$P_o, R_o = \mathcal{F}(Ins_o \uplus x_{test} \uplus \mathcal{OI}), \tag{5}$$

where $Ins_o$ denotes the outside instruction. Detailed information about this prompt is available in Appendix A.3.

Following these two parallel judge processes, we obtain the prediction results $P_i$, $P_o$ and the corresponding explanations $R_i$, $R_o$ from the inside and outside perspectives, respectively.
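Turning a free-text LLM answer into a pair $(P, R)$ requires some parsing convention, which the text does not specify. The sketch below assumes the answer contains the word "Real" or "Fake" (as in the case-study outputs such as “[This is fake news].”), takes the first one found as the label, and keeps the whole answer as the explanation. `parse_judgement` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

LABELS = {"real": 0, "fake": 1}

def parse_judgement(output: str) -> Tuple[Optional[int], str]:
    """Split a raw LLM answer into (prediction P, explanation R).

    The first whole word 'real' or 'fake' (case-insensitive) decides P;
    returns (None, text) when neither word appears.
    """
    m = re.search(r"\b(real|fake)\b", output, flags=re.IGNORECASE)
    pred = LABELS[m.group(1).lower()] if m else None
    return pred, output.strip()
```

This heuristic would misread phrasings such as "not fake"; a fixed answer format in the instruction makes parsing more reliable.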

4.4 Determination Module

With the predictions $P_i$, $P_o$ and their corresponding explanations $R_i$, $R_o$, the final output is obtained by jointly considering these two perspectives.

More specifically, if the two predictions are identical (i.e., $P_i = P_o$), we directly take this shared label as the final prediction with high confidence. If the two results diverge, indicating a conflict between the Inside Judge and the Outside Judge, we further employ a determination selector that chooses between them based on both sets of predictions and explanations:

$$P_f = \mathcal{F}(Ins_d \uplus x_{test} \uplus P_i \uplus R_i \uplus P_o \uplus R_o), \tag{6}$$

where $Ins_d$ denotes the determination instruction (see Appendix A.4 for more details), and $P_f$ is the final inference result of the DAFND model.
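The agreement check and the fallback to the selector in Eq. (6) can be sketched as follows. Assumptions: labels are integers (1 = fake, 0 = real), `selector` is a hypothetical callable wrapping the LLM that returns a label, and the default instruction string is placeholder text rather than the paper's $Ins_d$.

```python
from typing import Callable

def label_name(p: int) -> str:
    return "Fake" if p == 1 else "Real"

def determine(p_i: int, r_i: str, p_o: int, r_o: str, news: str,
              selector: Callable[[str], int],
              instruction: str = "Decide which judgement is correct.") -> int:
    """Return P_f: agreement -> shared label; conflict -> ask the selector LLM (Eq. 6)."""
    if p_i == p_o:
        return p_i  # no LLM call needed when both judges agree
    # Ins_d ⊎ x_test ⊎ P_i ⊎ R_i ⊎ P_o ⊎ R_o
    prompt = (f"{instruction}\n\nNews: {news}\n\n"
              f"Inside judgement: {label_name(p_i)}. {r_i}\n\n"
              f"Outside judgement: {label_name(p_o)}. {r_o}")
    return selector(prompt)
```

Skipping the selector on agreement also saves one LLM call per article in the common case.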

5 Experiments

5.1 Experiment Setup

Datasets and Evaluation Metrics.

We conduct experiments on two datasets, PolitiFact and Gossipcop, both from the FakeNewsNet benchmark (Shu et al., 2020). PolitiFact consists of political news, while Gossipcop is sourced from an entertainment story fact-checking website. For the few-shot setting, following the strategy employed in (Jiang et al., 2022; Ma et al., 2023), we randomly select $K \in \{8, 32, 100\}$ positive and $K$ negative news articles as the training set. More statistics about the datasets are listed in Table 1.

Given that the task focuses on detecting fake news, fake news articles are regarded as positive examples (Ma et al., 2023). We further adopt the F1-score and Accuracy (ACC) as the evaluation metrics to measure classification performance.
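Because fake news is the positive class, F1 is computed on label 1. A minimal pure-Python sketch of both metrics follows; a library routine such as scikit-learn's `f1_score` with `pos_label=1` would compute the same F1.

```python
from typing import Sequence, Tuple

def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """ACC and F1 with fake news (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1
```

For reference, predicting every PolitiFact test article as fake (80 fake out of 200, per Table 1) yields ACC = 0.40 and F1 ≈ 0.5714, which coincide with the PROPANEWS figures in Table 2.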

Dataset                  PolitiFact   Gossipcop
Train  # True news       8/32/100     8/32/100
Train  # Fake news       8/32/100     8/32/100
Train  # Total news      16/64/200    16/64/200
Test   # True news       120          3,200
Test   # Fake news       80           1,060
Test   # Total news      200          4,260
Table 1: Statistics of PolitiFact and Gossipcop datasets.
Dataset     Method           ACC (K=8)  ACC (K=32)  ACC (K=100)  F1 (K=8)  F1 (K=32)  F1 (K=100)
PolitiFact ① PROPANEWS 40.00 43.50 40.00 57.14 58.30 57.14
② FakeFlow 61.00 62.50 63.50 44.29 47.55 48.95
③ MDFEND 65.50 64.00 71.50 62.30 64.36 69.84
④ PSM 70.00 72.50 79.00 49.15 52.38 65.70
⑤ Zephyr 60.00 63.50 66.50 48.72 53.50 54.42
⑥ ChatGLM-3 68.50 68.50 72.50 58.28 58.82 64.05
⑦ LLama-3 69.50 70.50 69.00 63.91 65.09 64.00
⑧ GPT-3.5 71.00 69.50 73.00 60.27 60.65 64.47
⑨ ARG 74.00 78.50 82.50 67.16 68.61 80.61
DAFND (ours) 86.50 87.50 88.50 81.40 82.80 84.40
Gossipcop ① PROPANEWS 24.88 25.40 24.88 39.85 39.97 39.85
② FakeFlow 57.89 58.26 57.28 26.60 27.66 28.18
③ MDFEND 41.27 56.08 63.73 40.20 42.06 44.52
④ PSM 77.44 78.05 78.30 41.73 41.37 54.20
⑤ Zephyr 67.21 65.85 67.23 27.05 27.43 27.67
⑥ ChatGLM-3 62.49 62.75 63.43 31.59 34.83 34.15
⑦ LLama-3 65.96 65.85 66.17 30.89 35.07 31.74
⑧ GPT-3.5 68.50 69.44 67.44 32.90 36.73 36.58
⑨ ARG 61.41 77.42 76.50 42.32 51.46 46.57
DAFND (ours) 82.80 82.80 83.30 53.50 55.00 56.50
Table 2: Experimental results of our proposed method on the PolitiFact and Gossipcop datasets. Bold font represents the optimal result. For baseline methods, we follow their publicly released code to obtain the results.

Implementation Details.

In the DAFND architecture, we use the zephyr-7b-beta model (Tunstall et al., 2023) from Hugging Face as the LLM. When running Zephyr, we adhere to the official default parameter values: the sampling temperature is 0.70, top_k is 50, and top_p is 0.95. max_new_tokens is set to 256, and do_sample is set to True.

In the Detection Module (Section 4.1), the number of keywords to extract is set to $N=5$. In the Inside Investigation (Section 4.2), we employ the DeBERTa-base model (He et al., 2021) from Transformers (Wolf et al., 2020) as the representation model. The number of retrieved positive/negative nearest neighbors is set to $k=2$.
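The reported settings can be collected into one configuration. The decoding keys below use the keyword names accepted by Hugging Face transformers' `generate()` / `GenerationConfig`, so the dict could be passed as `model.generate(**inputs, **GENERATION_KWARGS)`; the dict names and the `total_demonstrations` helper are illustrative assumptions, not the authors' code.

```python
# Decoding settings reported for zephyr-7b-beta, keyed by transformers' generate() argument names.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.70,
    "top_k": 50,
    "top_p": 0.95,
    "max_new_tokens": 256,
}

# DAFND-specific settings reported in Section 5.1 (key names are hypothetical).
DAFND_SETTINGS = {
    "num_keywords": 5,           # N in the Detection Module
    "encoder": "DeBERTa-base",   # representation model M
    "knn_per_class": 2,          # k nearest fake and k nearest real demonstrations
}

def total_demonstrations(settings: dict) -> int:
    """Inside Judge prompt size: k positives plus k negatives."""
    return 2 * settings["knn_per_class"]
```

With $k=2$ this gives 4 demonstrations, fewer than the 6 random demonstrations given to the LLM baselines.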

All experiments are conducted on a Linux server with two 3.00GHz Intel Xeon Gold 5317 CPUs and two Tesla A100 GPUs.

Benchmark Methods.

To verify the effectiveness of DAFND, we compare it with state-of-the-art few-shot fake news detection methods. According to their model architecture, they can be grouped into three categories: traditional fake news detection methods (① to ④), LLM-based methods (⑤ to ⑧), and hybrid methods (⑨).

  • PROPANEWS (Huang et al., 2023) proposes a novel framework for generating more valuable training examples, which is particularly beneficial for detecting human-written fake news. (Specifically, we combine the few-shot training set and the augmented data to train a RoBERTa-Large-based classifier, adhering to the setting in the original paper.)

  • FakeFlow (Ghanem et al., 2021) devises a model that detects fake news articles by integrating the flow of affective information.

  • MDFEND (Nan et al., 2021) incorporates the domain information through a domain gate mechanism to aggregate multiple representations extracted by a mixture of experts.

  • PSM (Ni et al., 2020) proposes to utilize Propensity Score Matching (PSM) to select deconfounded features, thereby boosting the detection of fake news.

  • Zephyr (Tunstall et al., 2023) is an advanced 7B model aligned using preference data from AI feedback.

  • ChatGLM-3 (Zeng et al., 2022; Du et al., 2022) is a series of pre-trained dialogue models, and we select the ChatGLM3-6B version for the experimental comparison.

  • LLama-3 (AI@Meta, 2024) refers to the LLM proposed by Meta. We adopt its 8B version (Meta-Llama-3-8B-Instruct) sourced from Huggingface for experiments.

  • GPT-3.5 (Ouyang et al., 2022) is an advanced LLM developed by OpenAI. We leverage the API (version: gpt-3.5-turbo-0613) to conduct in-context learning.

  • ARG (Hu et al., 2024) designs an adaptive rationale guidance network for fake news detection, which integrates insights from both LLMs and traditional detection methods.

Figure 3: Ablation experiments on the PolitiFact and Gossipcop datasets.

It is worth noting that, for the LLM-based baselines (⑤ to ⑧), we adopt the instruction prompt proposed by Hu et al. (2024) to conduct in-context learning. Besides, due to the maximum token limit, we randomly select 6 samples as demonstrations, which is more than the number of demonstrations used in DAFND (as introduced in Section 5.1, DAFND uses $k=2$ positive and $k=2$ negative samples, i.e., 4 demonstrations in total), facilitating a fair comparison. For the LLM-based baselines, if the LLM fails to make an inference or its output cannot be categorized as Real or Fake, we treat it as a wrong prediction.
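The baseline protocol above can be sketched in two small helpers: random demonstration sampling that ignores the test item, and a scoring rule that turns an unparseable answer into the wrong label. Names, the seed, and the `None`-for-unparseable convention are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

def sample_demonstrations(train: Sequence[Tuple[str, int]], n: int = 6, seed: int = 0) -> List[Tuple[str, int]]:
    """Baselines: n demonstrations drawn at random, independent of the test article."""
    return random.Random(seed).sample(list(train), n)

def score_with_fallback(gold: int, parsed: Optional[int]) -> int:
    """An unparseable answer (None) is scored as the opposite of the gold label."""
    return 1 - gold if parsed is None else parsed
```

Contrast this with DAFND's Inside Investigation, which chooses its demonstrations per test article by nearest-neighbor retrieval.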

5.2 Experimental Result

The main results, presented in Table 2, indicate that our proposed DAFND model surpasses all baselines, including traditional, LLM-based, and hybrid methods, on every metric. This underscores the effectiveness of our design and the advantage of enhancing the LLM from both inside and outside perspectives. Furthermore, several notable phenomena emerge from these results:

Firstly, for most baselines and our DAFND model, performance on the PolitiFact dataset exceeds that on the Gossipcop dataset, suggesting that Gossipcop is more difficult. PolitiFact consists of political news, while Gossipcop pertains to the entertainment domain. This disparity is reasonable, as political news typically exhibits a more organized format and content, which facilitates fake news detection. Secondly, as the number of training instances ($K$) increases, most traditional fake news detection methods (e.g., ④ PSM) and the hybrid method (⑨ ARG) show improved performance. This is expected, as more data enables better training of a supervised model, mitigating its lack of prior knowledge. An exception is ① PROPANEWS, whose performance appears relatively unaffected by $K$. As introduced in Section 5.1, unlike the other methods, PROPANEWS uses a data construction strategy to supplement the original training set, which largely offsets the impact of training data quantity. Moreover, for the LLM-based methods (⑤ to ⑧), as outlined in Section 5.1, the maximum token limit leads us to randomly select 6 demonstrations for all of them. Hence, increasing the number of training instances does not substantially benefit their in-context learning. Thirdly, although DAFND also faces the maximum token limit, the $k$NN retrieval mechanism in the Inside Investigation (Section 4.2) makes better use of additional training data, thereby achieving some improvement at higher $K$ values. Fourthly, the hybrid method (⑨ ARG), benefiting from the joint modeling of traditional models and LLMs, obtains competitive performance. Compared to it, DAFND still maintains a significant advantage, particularly in scenarios with scarcer data.
These observations further demonstrate the effectiveness of our designs from multiple perspectives.

Figure 4: The case study of the DAFND model. Specifically, (a) is from the PolitiFact dataset ($K$=100), while (b) and (c) are from the Gossipcop dataset ($K$=100).

5.3 Ablation Study

In this subsection, we conduct ablation experiments to assess the effectiveness of the different components of our model. Specifically, we remove the Determination Module and take each judge's prediction directly as the final output, which yields two ablated variants: Inside Judge and Outside Judge. The results are shown in Figure 3.
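To make the ablation concrete, the sketch below wires the modules as plain callables and shows how dropping the Determination Module turns the pipeline into the Inside Judge or Outside Judge variant. In DAFND each module is realized by prompting an LLM; here the modules are hypothetical stubs, each investigation step is folded into its judge, and `run_dafnd` and its `variant` argument are names invented for illustration.

```python
from typing import Callable, List

# Hypothetical module signatures; in DAFND each one is realized by prompting an LLM.
DetectFn = Callable[[str], List[str]]         # news -> keywords
JudgeFn = Callable[[str, List[str]], str]     # (news, keywords) -> label; investigation folded in
DetermineFn = Callable[[str, str, str], str]  # (news, inside label, outside label) -> label


def run_dafnd(news: str, detect: DetectFn, inside_judge: JudgeFn,
              outside_judge: JudgeFn, determine: DetermineFn,
              variant: str = "full") -> str:
    """Run the pipeline as the full model or as one of the two ablated variants."""
    keywords = detect(news)
    if variant == "inside":   # ablation: Determination Module removed
        return inside_judge(news, keywords)
    if variant == "outside":  # ablation: Determination Module removed
        return outside_judge(news, keywords)
    if variant == "full":
        return determine(news, inside_judge(news, keywords), outside_judge(news, keywords))
    raise ValueError(f"unknown variant: {variant!r}")


# Toy stubs standing in for the LLM-backed modules.
def detect(news: str) -> List[str]:
    return news.split()[:6]


def inside_judge(news: str, keywords: List[str]) -> str:
    return "fake"


def outside_judge(news: str, keywords: List[str]) -> str:
    return "real"


def determine(news: str, inside: str, outside: str) -> str:
    return inside  # toy rule: always side with the inside view


for v in ("inside", "outside", "full"):
    print(v, run_dafnd("Celebrity X secretly married Y", detect, inside_judge,
                       outside_judge, determine, variant=v))
```

The toy `determine` rule is only a placeholder; the actual Determination Module is the LLM prompt given in Appendix A.4.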

As Figure 3 shows, across all configurations on the PolitiFact and Gossipcop datasets, the full DAFND model clearly outperforms both of its ablated variants. This substantiates that the components of DAFND are essential and non-redundant. Notably, when training instances are relatively limited (K=8), Inside Judge is less effective than Outside Judge. As the number of training instances increases (K=32, 100), Inside Judge progressively achieves more competitive performance than Outside Judge. This can be attributed to the growing benefit that additional training data brings to the Inside Investigation and Inside Judge components. In contrast, Outside Investigation and Outside Judge rely on information retrieved online, which is unaffected by the quantity of training data.

Furthermore, since the backbone of DAFND is ⑤ Zephyr (Tunstall et al., 2023), we can jointly compare the results in Figure 3 with the “⑤ Zephyr” row in Table 2. This comparison shows that both Inside Judge and Outside Judge significantly outperform Zephyr, demonstrating the efficacy of our motivation to design DAFND from both inside and outside perspectives.

5.4 Case Study

To further illustrate the effectiveness of the different modules in our model, we conduct a case study on both the PolitiFact and Gossipcop datasets. Specifically, Figure 4 presents the input information (i.e., the target news), the ground-truth label, and the DAFND results (including the intermediate results of each module and the final result).

As depicted in Figure 4 (a), the Detection Module accurately identifies the key information in the target news. Based on the valid results from Inside Investigation and Outside Investigation, both Inside Judge and Outside Judge correctly infer “[This is fake news].” These outputs are fed into the Determination Module, which produces a final prediction consistent with the ground-truth label (i.e., [Fake]). In Figure 4 (b), Inside Judge makes the correct inference (i.e., [Fake]), while Outside Judge makes the wrong one (i.e., [Real]). Conversely, in Figure 4 (c), Inside Judge incorrectly infers “[Real]”, while Outside Judge correctly predicts “[Fake]”. In both (b) and (c), the Determination Module enables DAFND to reach the correct final decision. These cases intuitively demonstrate the role of each module in DAFND and affirm its efficacy.

More experimental analyses, such as Bad Case Analysis, can be found in Appendix B.

6 Conclusions

In this paper, we explored a well-motivated direction for few-shot fake news detection. We began by analyzing the limitations of current LLM-based detection methods and identified two primary challenges: (1) Understanding Ambiguity and (2) Information Scarcity. To address these issues, we developed the Dual-perspective Augmented Fake News Detection (DAFND) model. In DAFND, a Detection Module was designed to identify keywords in the given news. Then, we proposed an Investigation Module and a Judge Module to retrieve valuable information and generate respective predictions. More importantly, a Determination Module integrated the predictions from the inside and outside perspectives to produce the final output. Finally, extensive experiments on two publicly available datasets demonstrated the effectiveness of our proposed method. We hope our work will inspire further studies.

Limitations

Our proposed DAFND method integrates an LLM (i.e., zephyr-7b-beta, introduced in Section 5.1). Because of the large scale of LLMs, it consumes more computing resources and time than traditional baselines such as PSM (Ni et al., 2020) and FakeFlow (Ghanem et al., 2021). Moreover, LLMs contain a vast amount of knowledge, much of which may be unnecessary for fake news detection. Distilling the useful knowledge to accelerate inference remains a valuable and intriguing research direction.

Another limitation is that our current approach uses LLMs only for inference. Although we design precise prompts for in-context learning, this cannot fully exploit the capabilities of LLMs due to the inherent gap between natural language and the knowledge encoded in model parameters. In future work, we would like to explore fine-tuning techniques suited to low-resource scenarios (e.g., LoRA (Hu et al., 2021a)) to adapt LLMs to the few-shot fake news detection task.

References

  • AI@Meta. 2024. Llama 3 model card.
  • David Boissonneault and Emily Hensen. 2024. Fake news detection with large language models on the LIAR dataset.
  • Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
  • Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2022. GLM: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 320–335.
  • Yaqian Dun, Kefei Tu, Chen Chen, Chunyan Hou, and Xiaojie Yuan. 2021. KAN: Knowledge-aware attention network for fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 81–89.
  • Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making pre-trained language models better few-shot learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3816–3830.
  • Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso, and Francisco Rangel. 2021. FakeFlow: Fake news detection by modeling the flow of affective information. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 679–689.
  • Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations.
  • Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556.
  • Benjamin Horne and Sibel Adali. 2017. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the International AAAI Conference on Web and Social Media, volume 11, pages 759–766.
  • Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, and Peng Qi. 2024. Bad actor, good advisor: Exploring the role of large language models in fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 22105–22113.
  • Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021a. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
  • Linmei Hu, Tianchi Yang, Luhao Zhang, Wanjun Zhong, Duyu Tang, Chuan Shi, Nan Duan, and Ming Zhou. 2021b. Compare to the knowledge: Graph neural fake news detection with external knowledge. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 754–763.
  • Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, and Heng Ji. 2023. Faking fake news for real fake news detection: Propaganda-loaded training data generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14571–14589.
  • Gongyao Jiang, Shuang Liu, Yu Zhao, Yueheng Sun, and Meishan Zhang. 2022. Fake news detection via knowledgeable prompt learning. Information Processing & Management, 59(5):103029.
  • Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2019. Generalization through memorization: Nearest neighbor language models. In International Conference on Learning Representations.
  • Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. 2022. What makes good in-context examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022), pages 100–114.
  • Jing Ma, Chen Chen, Chunyan Hou, and Xiaojie Yuan. 2023. KAPALM: Knowledge graph enhanced language models for fake news detection. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3999–4009.
  • Qiong Nan, Juan Cao, Yongchun Zhu, Yanyan Wang, and Jintao Li. 2021. MDFEND: Multi-domain fake news detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3343–3347.
  • Bo Ni, Zhichun Guo, Jianing Li, and Meng Jiang. 2020. Improving generalizability of fake news detection methods using propensity score matching. arXiv preprint arXiv:2002.00838.
  • OpenAI. 2023. GPT-4 technical report.
  • Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
  • Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, and Marco Tulio Ribeiro. 2023. ART: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014.
  • Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2671.
  • Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2020. FakeNewsNet: A data repository with news content, social context and spatiotemporal information for studying fake news on social media. Journal on big data, 8(3).
  • Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22–36.
  • Ting Wei Teo, Hui Na Chua, Muhammed Basheer Jasser, and Richard TK Wong. 2024. Integrating large language models and machine learning for fake news detection. In 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), pages 102–107. IEEE.
  • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, et al. 2023. Zephyr: Direct distillation of LM alignment. arXiv preprint arXiv:2310.16944.
  • Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. 2024. Explainable fake news detection with large language model via defense among competing wisdom. In Proceedings of the ACM on Web Conference 2024, pages 2452–2463.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45.
  • Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, and Jonathan Berant. 2023. Answering questions by meta-reasoning over multiple chains of thought. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5942–5966.
  • Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, et al. 2022. GLM-130B: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414.
  • Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In International Conference on Machine Learning, pages 12697–12706. PMLR.

Appendix A Prompts

In this section, we illustrate the prompts utilized in our DAFND methodology, which can serve as a valuable resource for future research in this area.

A.1 Detection Prompt

As a news keyword extractor, your task is to extract the N most important keywords from a given news text. The keywords should include when, where, who, what, how and why the news happened. Please give me the six keywords only. My first suggestion request is {Target News Document}.
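A minimal sketch of filling the Detection prompt and splitting the reply into keywords follows. The template copies A.1 with `{n}` standing in for N and `{news}` for {Target News Document}; since the prompt text itself still asks for six keywords, n=6 is the consistent default. The comma/newline/numbering format assumed for the LLM reply and the helper names `build_detection_prompt` and `parse_keywords` are hypothetical.

```python
import re
from typing import List

# A.1 Detection prompt; {n} replaces N and {news} replaces {Target News Document}.
DETECTION_TEMPLATE = (
    "As a news keyword extractor, your task is to extract the {n} most important "
    "keywords from a given news text. The keywords should include when, where, who, "
    "what, how and why the news happened. Please give me the six keywords only. "
    "My first suggestion request is {news}."
)


def build_detection_prompt(news: str, n: int = 6) -> str:
    # str.format only parses braces in the template, so braces inside `news` are safe.
    return DETECTION_TEMPLATE.format(n=n, news=news)


def parse_keywords(reply: str, n: int = 6) -> List[str]:
    """Split an LLM reply into at most n keywords.

    Assumes keywords are separated by commas or newlines, optionally prefixed
    by list markers such as "1.", "2)", "-" or "*".
    """
    parts = re.split(r"[,\n]", reply)
    cleaned = [re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", p).strip() for p in parts]
    return [p for p in cleaned if p][:n]


print(parse_keywords("1. Senator X\n2. Ohio, Monday"))  # ['Senator X', 'Ohio', 'Monday']
```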

A.2 Inside Judge Prompt

I need your assistance in evaluating the authenticity of a news article. I will provide you the news article and additional information about this news. You have to answer that [This is fake news] or [This is real news] in the first sentence of your output and give your explanation about [target news].

I will give you some examples of news. Your answer after [output] should be consistent with the following examples:

[example 1]:

[input news]: [news title: {}, news text: {}, news tweet: {}]

[output]: [This is {} news]

……

[target news]:

[input news]: [news title: {}, news text: {}, news tweet: {}]

[output]:
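The A.2 template can be assembled from retrieved demonstrations and the target news roughly as sketched below. The wording is copied from A.2; the blank-line layout, the tuple layouts `(title, text, tweet)` and `(title, text, tweet, label)`, and the helper names are assumptions made for illustration.

```python
from typing import Iterable, Tuple

News = Tuple[str, str, str]       # (title, text, tweet)
Demo = Tuple[str, str, str, str]  # (title, text, tweet, label), label "fake" or "real"

INSIDE_HEADER = (
    "I need your assistance in evaluating the authenticity of a news article. "
    "I will provide you the news article and additional information about this news. "
    "You have to answer that [This is fake news] or [This is real news] in the first "
    "sentence of your output and give your explanation about [target news].\n\n"
    "I will give you some examples of news. Your answer after [output] should be "
    "consistent with the following examples:\n\n"
)


def format_news(title: str, text: str, tweet: str) -> str:
    return f"[input news]: [news title: {title}, news text: {text}, news tweet: {tweet}]"


def build_inside_prompt(demos: Iterable[Demo], target: News) -> str:
    """Header, numbered labeled demonstrations, then the unlabeled target news."""
    parts = [INSIDE_HEADER]
    for i, (title, text, tweet, label) in enumerate(demos, start=1):
        parts.append(f"[example {i}]:\n\n{format_news(title, text, tweet)}\n\n"
                     f"[output]: [This is {label} news]\n\n")
    parts.append(f"[target news]:\n\n{format_news(*target)}\n\n[output]:")
    return "".join(parts)


print(build_inside_prompt([("T1", "X1", "W1", "fake")], ("T", "X", "W"))[-60:])
```

In a full pipeline the demonstrations would come from the kNN retrieval over the training set rather than being hard-coded.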

A.3 Outside Judge Prompt

I need your assistance in evaluating the authenticity of a news article. I will provide you the news article and additional information about this news. Please analyze the following news and give your decision. The first sentence of your [Decision] must be [This is fake news] or [This is real news].

The news article is: {}.

The additional information is: {}.

[Decision]:
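The judging prompts in A.2, A.3 and A.4 all require the first sentence of the answer to be [This is fake news] or [This is real news]. A minimal parser for that convention might look as follows; the sentence-splitting rule, the case-insensitive match, and returning None (rather than guessing) when the first sentence carries no verdict are assumptions of this sketch, and `parse_verdict` is a hypothetical name.

```python
import re
from typing import Optional

# The required verdict phrase; surrounding square brackets are optional.
_VERDICT = re.compile(r"this is (fake|real) news", re.IGNORECASE)


def parse_verdict(answer: str) -> Optional[str]:
    """Return "fake" or "real" if the first sentence states a verdict, else None."""
    # Split once at the first whitespace that follows ".", "!", "?" or "]".
    first_sentence = re.split(r"(?<=[.!?\]])\s", answer.strip(), maxsplit=1)[0]
    match = _VERDICT.search(first_sentence)
    return match.group(1).lower() if match else None


print(parse_verdict("[This is fake news]. The source is unverified."))  # fake
```

A None result flags an answer that ignored the format, which can then be handled separately instead of being silently mapped to a label.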

A.4 Determination Prompt

I need your assistance in evaluating the authenticity of a news article. This news article include news title, news text and news tweet.

The news article is: news title: {}, news text: {}, news tweet: {}.

There are two different views on this news article.

Some people believe that {}, their explanation is: {}.

Others believe that {}, their explanation is: {}.

Please judge their opinion and give your decision. The first sentence after [Explanation] must be [This is fake news] or [This is real news], and then give your explanation.

[Explanation]:
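Because A.4 uses positional {} placeholders, Python's `str.format` can fill it directly. The sketch below assumes each view is passed as a (verdict, explanation) pair, for example taken from the two judges' outputs, and that the inside view fills the first slot; both choices and the name `build_determination_prompt` are hypothetical.

```python
from typing import Tuple

View = Tuple[str, str]  # (verdict sentence, explanation)

# A.4 Determination prompt, copied verbatim; the {} slots are filled in order.
DETERMINATION_TEMPLATE = (
    "I need your assistance in evaluating the authenticity of a news article. "
    "This news article include news title, news text and news tweet.\n\n"
    "The news article is: news title: {}, news text: {}, news tweet: {}.\n\n"
    "There are two different views on this news article.\n\n"
    "Some people believe that {}, their explanation is: {}.\n\n"
    "Others believe that {}, their explanation is: {}.\n\n"
    "Please judge their opinion and give your decision. The first sentence after "
    "[Explanation] must be [This is fake news] or [This is real news], and then "
    "give your explanation.\n\n"
    "[Explanation]:"
)


def build_determination_prompt(title: str, text: str, tweet: str,
                               inside: View, outside: View) -> str:
    return DETERMINATION_TEMPLATE.format(title, text, tweet, *inside, *outside)


print(build_determination_prompt("T", "X", "W",
                                 ("this is fake news", "no source is cited"),
                                 ("this is real news", "a report matches it"))[:80])
```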

Appendix B Bad Case Analysis

In this section, we illustrate the bad cases that DAFND struggles with, aiming to analyze its shortcomings and possible directions for improvement.

Figure 5: A bad case of DAFND on the Gossipcop dataset (K=100).

As illustrated in Figure 5, one particular failure occurs when the few-shot training set contains no relevant samples for retrieval. Consequently, the Inside Investigation and Inside Judge modules cannot work effectively. Meanwhile, although Outside Investigation retrieves evidence supporting the given news, Outside Judge still classifies it as fake, claiming that further confirmation is necessary. This bad case reveals two potential directions for improving our model:

For one thing, as introduced in Section 1, DAFND retrieves similar demonstrations from the training set to address the Understanding Ambiguity problem. This relies on the assumption that valuable samples can be found in the training set, which does not always hold; when it fails, errors such as the one in Figure 5 occur. Therefore, detecting such circumstances and mitigating their impact is a crucial direction for improving the DAFND design.
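One possible way to detect this situation, which is not part of DAFND, is to flag a target news item whose best kNN match falls below a similarity threshold; a downstream step could then, for instance, rely less on the Inside view. The sketch assumes precomputed embeddings and cosine similarity, and the 0.5 threshold and the function names are arbitrary, illustrative choices.

```python
import math
from typing import Iterable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def has_relevant_demonstration(query: Sequence[float],
                               pool: Iterable[Sequence[float]],
                               threshold: float = 0.5) -> bool:
    """True if at least one training embedding reaches the similarity threshold.

    An empty pool, or one with only dissimilar items, returns False, signalling
    that retrieved demonstrations may be uninformative for this news item.
    """
    return any(cosine(query, vec) >= threshold for vec in pool)


print(has_relevant_demonstration([1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]]))  # False
```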

For another, as discussed in the Limitations section, we only employ the LLM for inference, which cannot fully exploit its capabilities. In Figure 5, although Outside Investigation retrieves valuable information, Outside Judge still makes a wrong prediction. Despite their strong reasoning ability, such general-purpose LLMs are not specialized in the news domain and are not sufficiently familiar with its specific expression characteristics. Therefore, we would like to adopt fine-tuning techniques to adapt LLMs to news corpora, which we believe could benefit DAFND.