Sina at FigNews 2024:
Multilingual Datasets Annotated with Bias and Propaganda

Lina Duaibes (Birzeit University), Areej Jaber (Palestine Technical University-Khadoorie), Mustafa Jarrar (Birzeit University)
Ahmad Qadi (7amleh Center), Mais Qandeel (Örebro University)
Abstract

The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection. This article presents a multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the war from October 7, 2023 to January 31, 2024. The corpus comprises posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts per language. The annotation process involved 10 graduate students specializing in Law. Inter-Annotator Agreement (IAA) was used to evaluate the annotations of the corpus, with an average IAA of 80.8% for bias and 70.15% for propaganda annotations. Our team was ranked among the best-performing teams in both the Bias and Propaganda subtasks. The corpus is open-source and available at https://sina.birzeit.edu/fada


1 Introduction

Since October 7, social media has been flooded with posts, articles, images, and videos related to the Israeli War on Gaza. Such posts are often marked by hate, bias, and fake news, either in favor of or against one of the parties, while other posts remain neutral. "Framing the Israeli War on Gaza" is a shared task on news media narratives Zaghouani et al. (2024), which is part of the 2nd ArabicNLP conference. The task aims to create a multilingual corpus that unravels the layers of bias and propaganda within news articles in various languages.

Shared tasks and datathons are crucial in the NLP community for fostering collaboration and advancing research in specific areas. Previous efforts, such as SemEval-2020 Task 11 Martino et al. (2020) and TSHP-17 Rashkin et al. (2017), have provided valuable resources for propaganda detection in news articles. The dual focus of FigNews on bias and propaganda is a novel approach that addresses the evolving nature of misinformation on social media platforms. Detecting propaganda on social media is crucial Darwish et al. (2021), as propaganda can polarize public sentiment, foster violent extremism and hate speech, and eventually erode democracies and diminish trust in democratic procedures Abuaiadah et al. (2017). Notably, only a few corpora addressing these issues have been built recently. Hamad et al. (2023) established a Hebrew dataset of 15,881 tweets for detecting offensive language, manually annotated with four labels: hate, abusive, violence, and pornographic. Their work focused on detecting hate speech in Hebrew tweets and was implemented in SinaTools Hammouda et al. (2024). Additionally, the WojoodNER Shared Task 2024 offered a new NER dataset related to the Israeli War on Gaza, called WojoodGaza Jarrar et al. (2024). Other notable corpora include TSHP-17 (Rashkin et al., 2017), QProp (Barrón-Cedeno et al., 2019), and PTC (Da San Martino et al., 2019); TSHP-17 and QProp are document-level corpora, while PTC is a sentence-level corpus. While SemEval-2020 Task 11 (Martino et al., 2020) shares its objective with FigNews (Zaghouani et al., 2024), the two differ in their data sources and focus areas.

This paper describes our participation in the FigNews shared task. Our contributions are:

  • An annotated corpus of 12K Facebook posts for bias and propaganda in 5 languages.

  • Annotation guidelines ensuring consistency and accuracy.

Remark: The corpus presented in this article does not cover the genocide, ethnic cleansing, or starvation events, as these mostly happened after the corpus was collected.

The article is organized as follows: Section 2 describes the methodology, Section 3 presents our team composition and training, Section 4 presents our participation and results, Section 5 analyzes some errors, and Section 6 concludes the paper.

2 Annotation Methodology

The objective of the task is to address the complex landscape of social media discourse related to the Israeli War on Gaza 2023-2024. The task organizers provided participants with 15k posts from verified Facebook accounts, selected from the period between October 6, 2023, and January 31, 2024, using "Gaza" as a query keyword across 5 languages: Arabic, Hebrew, English, French, and Hindi. The dataset consists of 15 batches, each containing 1,000 posts.

Bias Propaganda
Cohen’s kappa Weighted F1 Cohen’s kappa Weighted F1
Annotator pair All Binary All Binary All Binary All Binary
(1, 6) 0.57 0.57 0.79 0.79 0.76 0.85 0.85 0.98
(1, 4) 0.76 0.77 0.53 0.56 0.33 0.25 0.58 0.93
(1, 8) 0.29 0.28 0.64 0.65 0.11 1 0.28 1
(1, 2) 0.64 0.64 0.82 0.85 0.72 0.62 0.8 0.91
(3, 10) 0.8 0.78 0.86 0.9 0.67 1 0.77 1
(3, 4) 0.51 0.55 0.75 0.79 0.81 1 0.87 1
(3, 6) 0.89 0.93 0.96 0.98 0.37 0.78 0.59 0.97
(3, 8) 0.97 1 0.98 1 0.79 0.96 0.85 0.98
(9, 10) 0.3 0.38 0.59 0.77 0.07 0.44 0.32 0.83
(9, 2) 0.79 0.81 0.87 0.92 0.79 0.86 0.785 0.93
(9, 4) 0.58 0.72 0.79 0.9 0.82 0.98 0.75 0.95
(9, 6) -0.11 -0.09 0.54 0.59 0.18 0.57 0.48 0.85
(7, 8) 0.94 0.97 0.98 0.98 0.93 1 0.95 1
(7, 10) 0.47 0.49 0.74 0.76 0.93 1 0.95 1
(7, 2) 1 1 1 1 0.85 0.93 0.91 0.97
(7, 4) 1 1 1 1 0.81 0.91 0.89 0.98
(5, 8) 0.51 0.63 0.72 0.83 0.15 0 0.49 0.93
(5, 10) 0.87 0.85 0.95 0.95 0.92 1 0.95 1
(5, 2) 0.52 0.55 0.77 0.85 0.46 0.54 0.49 0.8
(5, 6) 0.39 0.45 0.65 0.75 0.05 0 0.34 0.91
Average 0.808 0.8515 0.623 0.6535 0.7015 0.9475 0.5725 0.733
Table 1: IAA for bias and propaganda annotations.

2.1 Annotation Guidelines

Our understanding of "bias" is based on the work of the United Nations Committee on the Elimination of Racial Discrimination and the European Commission against Racism and Intolerance (European External Action Service, n.d.). Based on these UN and EU accounts, we define the notions ‘bias’ and ‘propaganda’ as follows:

Bias: an inclination or prejudice for or against a particular person or group, often in a way considered to be unfair. In other words, it is an unreasonable preference or dislike that prompts someone to behave in a discriminatory way, often based on unfair judgment. Such bias is typically based on prohibited grounds of discrimination such as race, religion, language, nationality, ethnicity, social background, gender, and others.

Classifications of Bias: We adopted the same classes provided in the Shared Task: (1) Biased against Palestine, (2) Biased against Israel, (3) Biased against others, (4) Biased against both Israel and Palestine, (5) Not Applicable, (6) Unclear, and (7) Unbiased. We also introduced a new feature called "Type of Bias", which can be: (a) Explicit (تحيز صريح), if the bias is obvious and evident in the post; (b) Implicit (تحيز ضمني), if the bias is clear but not directly evident in the post; or (c) Vague (تحيز مبهم), in case of indirect and ambiguous bias. This feature is methodologically important because it encourages the annotators to reflect more carefully during classification. For example, a post that contains biased content expressed only indirectly can be classified as implicit.

Propaganda: misleading ideas or statements that can distort the truth or omit facts to promote a specific political or social agenda. Such content is typically published by media outlets. Propaganda can take forms such as exaggeration, minimization, spreading doubt, name-calling, labeling, or intentional vagueness; all of these forms share the intention of spreading false information and obscuring facts.

Classifications of Propaganda: We adopted the four classes provided in the Shared Task: (i) Propaganda, (ii) Not propaganda, (iii) Not Applicable, and (iv) Unclear.

Additionally, we added a new column that classifies propaganda into three types: (1) Propaganda must be deleted: if it contains evidently harmful content that poses risks to the safety and security of individuals or groups; (2) Propaganda may be deleted: if it cannot easily be judged as propaganda because the judgment depends on the specific context; and (3) Propaganda not to be deleted: if it is not clear-cut and lacks harmful consequences, and therefore does not warrant deletion.
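The label inventory described above can be captured as a small typed record. The sketch below is only an illustration under our own assumptions: the class and field names (`PostAnnotation`, `from_strings`, and so on) are hypothetical and are not part of the shared-task release. The enums list exactly the classes and types named in the guidelines; cross-field rules (for instance, whether every biased post must carry a bias type) are deliberately not enforced, because the guidelines do not state them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BiasClass(Enum):          # the seven classes adopted from the Shared Task
    AGAINST_PALESTINE = "Biased against Palestine"
    AGAINST_ISRAEL = "Biased against Israel"
    AGAINST_OTHERS = "Biased against others"
    AGAINST_BOTH = "Biased against both Israel and Palestine"
    NOT_APPLICABLE = "Not Applicable"
    UNCLEAR = "Unclear"
    UNBIASED = "Unbiased"


class BiasType(Enum):           # the added "Type of Bias" feature
    EXPLICIT = "Explicit"       # obvious and evident in the post
    IMPLICIT = "Implicit"       # clear but not directly evident
    VAGUE = "Vague"             # indirect and ambiguous


class PropagandaClass(Enum):    # the four classes adopted from the Shared Task
    PROPAGANDA = "Propaganda"
    NOT_PROPAGANDA = "Not propaganda"
    NOT_APPLICABLE = "Not Applicable"
    UNCLEAR = "Unclear"


class PropagandaType(Enum):     # the added deletion-oriented column
    MUST_DELETE = "Propaganda must be deleted"
    MAY_DELETE = "Propaganda may be deleted"
    NOT_DELETE = "Propaganda not to be deleted"


@dataclass(frozen=True)
class PostAnnotation:
    """One annotator's labels for one post (hypothetical record layout)."""
    post_id: str
    bias_class: BiasClass
    propaganda_class: PropagandaClass
    bias_type: Optional[BiasType] = None
    propaganda_type: Optional[PropagandaType] = None

    @classmethod
    def from_strings(cls, post_id: str, bias: str, propaganda: str,
                     bias_type: Optional[str] = None,
                     propaganda_type: Optional[str] = None) -> "PostAnnotation":
        # Enum(value) raises ValueError for any label outside the guidelines.
        return cls(
            post_id,
            BiasClass(bias),
            PropagandaClass(propaganda),
            BiasType(bias_type) if bias_type else None,
            PropagandaType(propaganda_type) if propaganda_type else None,
        )
```

Parsing through the enums means a misspelled label is rejected at load time rather than silently becoming a new class.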

Remark: Since the data consists of Facebook posts, some posts contain quoted content (e.g., an unbiased post quoting biased content). The guidelines establish that a post should be classified as biased or as propaganda based on the post itself, not on the content it quotes.

An example of this rule for quoted content is the following post: “Hamas and Islamic Jihad spare no effort to exploit religious institutions for terrorist purposes,” the IDF said in a statement. This post is annotated as unbiased because it is a direct quote and does not include any additional commentary or interpretation.

2.2 Inter-Annotator Agreement (IAA)

To evaluate the quality of our annotations, we used the F1-score and Cohen’s kappa Cohen (1968) to compute the agreement between the annotators. The results are shown in Table 1.
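Both agreement measures can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' code, and it rests on two assumptions: we use unweighted Cohen's kappa (the labels are nominal, and the paper does not say whether weights were applied), and we read the "F1_score_weighted" column as a support-weighted F1 in which one annotator of the pair is treated as the reference. The function names are our own.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Unweighted Cohen's kappa for two annotators labelling the same posts."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(count_a[lab] * count_b[lab] for lab in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def weighted_f1(reference: list[str], other: list[str]) -> float:
    """Per-label F1 averaged with weights equal to each label's support in `reference`."""
    support = Counter(reference)
    total = 0.0
    for lab in set(reference) | set(other):
        tp = sum(r == lab and o == lab for r, o in zip(reference, other))
        predicted = sum(o == lab for o in other)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support[lab] if support[lab] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += support[lab] * f1
    return total / len(reference)
```

Note that kappa is symmetric in the two annotators while weighted F1 is not, since the weights come from the reference side. In scikit-learn the same quantities are available as `cohen_kappa_score` and `f1_score(..., average="weighted")`.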

The task organizers allocated 100 posts (10%) from each batch for IAA, including 20 posts randomly selected from each language. Overall, we annotated 12,000 posts, resulting in an IAA dataset of 1,200 posts. These were distributed among our 10 annotators following this scheme: (1) each annotator received 240 posts, (2) each post was annotated by two different annotators, and (3) the 240 posts assigned to each annotator were shared with four other annotators. Consequently, each pair of annotators had 60 posts in common.
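The counts in this scheme fit together as follows: 12 batches of 100 IAA posts give 1,200 posts; with two annotations per post, each of the 10 annotators handles 2 × 1,200 / 10 = 240 posts, i.e. 60 posts with each of 4 partners, for 20 pairs in total. The sketch below checks this arithmetic with one concrete pairing, in which every odd-numbered annotator is paired with four of the five even-numbered annotators. That pairing is our reading of the annotator pairs listed in Table 1 and should be treated as illustrative; the `SKIP` table is a hypothetical name.

```python
from collections import Counter
from itertools import product

POSTS_PER_PAIR = 60
BATCHES, IAA_PER_BATCH = 12, 100

# Hypothetical pairing: each odd-numbered annotator works with four of the
# five even-numbered annotators, skipping the one listed here.
SKIP = {1: 10, 3: 2, 5: 4, 7: 6, 9: 8}
pairs = [
    (odd, even)
    for odd, even in product((1, 3, 5, 7, 9), (2, 4, 6, 8, 10))
    if SKIP[odd] != even
]

# Each IAA post belongs to exactly one pair of annotators.
total_iaa_posts = len(pairs) * POSTS_PER_PAIR

# Count how many partners each annotator has.
partners = Counter()
for a, b in pairs:
    partners[a] += 1
    partners[b] += 1

# An annotator shares 60 posts with each partner.
posts_per_annotator = {a: n * POSTS_PER_PAIR for a, n in partners.items()}

print(len(pairs), total_iaa_posts, BATCHES * IAA_PER_BATCH)  # → 20 1200 1200
print(sorted(set(partners.values())), sorted(set(posts_per_annotator.values())))  # → [4] [240]
```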

All vs. Binary IAA: To evaluate whether a (dis)agreement was dominated by a certain class, we mapped all labels into binary categories: Bias vs. NotBias (with the remaining classes grouped under NotBias), and Propaganda vs. NotPropaganda (with the remaining classes grouped under NotPropaganda). Table 1 shows no class dominance, since the All and Binary evaluations are close to each other.
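A minimal sketch of the binary mapping, under one assumption we make explicit: all four "Biased against ..." classes collapse to Bias, and Unbiased, Not Applicable, and Unclear collapse to NotBias (likewise for propaganda). For brevity the example compares raw observed agreement; kappa or weighted F1 would be computed in the same way on the mapped labels. The helper names are hypothetical.

```python
BIASED_LABELS = {
    "Biased against Palestine",
    "Biased against Israel",
    "Biased against others",
    "Biased against both Israel and Palestine",
}


def to_binary_bias(label: str) -> str:
    """Bias vs. NotBias; Unbiased, Not Applicable and Unclear become NotBias."""
    return "Bias" if label in BIASED_LABELS else "NotBias"


def to_binary_propaganda(label: str) -> str:
    """Propaganda vs. NotPropaganda; the three other classes become NotPropaganda."""
    return "Propaganda" if label == "Propaganda" else "NotPropaganda"


def observed_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of posts on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


ann_1 = ["Biased against Palestine", "Unbiased", "Unclear"]
ann_2 = ["Biased against others", "Unbiased", "Not Applicable"]
all_classes = observed_agreement(ann_1, ann_2)
binary = observed_agreement([to_binary_bias(x) for x in ann_1],
                            [to_binary_bias(x) for x in ann_2])
```

In this toy example the two annotators agree on only one of three posts under the full label set, but on all three after binarization, which is the kind of gap the All vs. Binary comparison is meant to expose.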

Looking at all Cohen’s kappa scores in Table 1, the average is 0.808 for bias, which is a "very good" agreement, and 0.7015 for propaganda, which is a "good" agreement overall. Agreement on propaganda was harder to reach, but the scores improve when propaganda is treated as a binary label.
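The paper does not name the scale behind the labels "very good" and "good". One common scale that produces them is Altman's (1991): poor (below 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), good (0.61 to 0.80), and very good (0.81 to 1.00). The sketch below assumes that scale and treats each upper bound as inclusive, which is our own convention for values falling between the published bands.

```python
def altman_band(kappa: float) -> str:
    """Verbal label for a kappa value on Altman's scale (upper bounds inclusive)."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


print(altman_band(0.808), "/", altman_band(0.7015))  # → very good / good
```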

3 Team Composition and Training

Team composition: We assembled a team of 10 Master’s students specializing in Law at Birzeit University, comprising 7 females and 3 males. All team members are native Arabic speakers with a good command of English.

Training phase: We began by selecting 200 posts to train all students in annotation. After training, each student was assigned 1,200 posts for annotation.

Ensuring consistency: We held three workshops to discuss the guidelines, address challenges, and resolve disparities. In the first workshop, an expert reviewed the annotations and added comments for the annotators to address. In the second workshop, the annotators met with the expert to discuss his comments on the posts. In the final workshop, after comparing their annotations with the expert’s, they discussed the points of agreement and disagreement with him.

Subtask | Track | 1st Place | 2nd Place | 3rd Place
Bias | Guidelines | NLPColab | Eagles | Narrative Navigators
Bias | IAA Quality | NLPColab | JusticeLeague | Sina
Bias | Quantity | DRAGON | NLPColab | Sina
Bias | Consistency | The Lexicon Ladies | NLPColab | Narrative Navigators
Propaganda | Guidelines | NLPColab | Bias Bluff Busters | Sina
Propaganda | IAA Quality | NLPColab | Sina | The CyberEquity Lab
Propaganda | Quantity | NLPColab | Sina | The CyberEquity Lab
Propaganda | Consistency | NLPColab | Bias Bluff Busters | Sahara Pioneers/The CyberEquity Lab

Table 2: FIGNEWS 2024 shared task results.

3.1 Annotation process

Annotation Phase: The dataset consisted of 12 batches, comprising 10,800 posts from the Main sheet and 1,200 posts from the IAA sheet. The annotation was carried out in two phases:

  1. Phase One: We distributed Batch01 and Batch02, each with 180 posts, among team members. To ensure consistency with the guidelines, an expert reviewed all student annotations for these batches and provided feedback.

  2. Phase Two: We assigned each annotator 450 posts from two different batches. This step allowed us to complete the annotation of all 12 batches (i.e., 12k posts).

Setting quality standards: After the annotation process was complete, each pair of annotators who had annotated the same data held meetings to review the selected posts they disagreed on. They discussed their differences and, if they reached an agreement, changed the label accordingly; if they could not agree, they kept their original labels.
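The batch arithmetic of the Annotation Phase can be checked directly. This sketch assumes that each 1,000-post batch contributes 900 posts to the Main sheet and 100 to the IAA sheet, that Phase One covered only the Main-sheet posts of Batch01 and Batch02, and that Phase Two split the remaining ten batches evenly, which is one reading of "450 posts from two different batches" (450 from each). All constant names are ours.

```python
BATCHES = 12
POSTS_PER_BATCH = 1_000
IAA_PER_BATCH = 100      # 10% of each batch goes to the IAA sheet
ANNOTATORS = 10

main_per_batch = POSTS_PER_BATCH - IAA_PER_BATCH   # posts on the Main sheet
main_total = BATCHES * main_per_batch
iaa_total = BATCHES * IAA_PER_BATCH

# Phase One (assumption): Main-sheet posts of Batch01 and Batch02, split evenly.
phase_one_each = 2 * main_per_batch // ANNOTATORS

# Phase Two (assumption): the other ten batches split evenly, so each
# annotator takes 450 posts from each of two batches.
phase_two_each = (BATCHES - 2) * main_per_batch // ANNOTATORS

print(main_total, iaa_total, phase_one_each, phase_two_each)  # → 10800 1200 180 900
```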

4 Task Participation and Results

4.1 Results

Table 2 displays the final results provided by the shared task organizers. Our Sina team ranked third in the IAA Quality and Quantity tracks of the Bias subtask and second in the same two tracks of the Propaganda subtask, in addition to third place in the Propaganda Guidelines track.

Table 3 and Table 4 show the distribution of the bias classes and the types of bias across languages, respectively. Table 3 shows that about 27% of the posts are biased against Palestine and 63% are unbiased. French contributed the largest number of posts biased against Palestine (807). Table 4 gives more statistics on the types of bias: Hebrew has the largest number of posts annotated as Explicit bias (563).

For propaganda, Table 5 shows the distribution of propaganda classes across languages: 31% of the posts (3,333) are annotated as "Propaganda" and 66% (7,084) as "Not Propaganda". French contributed the largest number of propaganda posts (809). Table 6 shows the distribution of the types of propaganda across languages; French also has the most posts classified as "Propaganda must be deleted" (348).

Class Ar En He Fr Hi Total
Biased Against Palestine 466 514 595 807 534 2916
Biased Against Israel 94 79 23 19 70 285
Biased against Both 6 7 11 6 14 44
Biased against others 42 28 53 39 49 211
Unbiased 1371 1486 1369 1212 1386 6824
Not applicable 49 7 17 20 25 118
Unclear 132 39 92 57 82 402
Total 2160 2160 2160 2160 2160 10800
Table 3: Distribution of bias classes across languages
Type of Bias Ar En He Fr Hi Total
Explicit (تحيز صريح) 394 336 563 412 388 2093
Implicit (تحيز ضمني) 199 217 265 236 269 1186
Vague (تحيز مبهم) 36 37 59 52 27 211
Table 4: Types of Bias
Class Ar En He Fr Hi Total
Propaganda 524 679 648 809 673 3333
Not propaganda 1484 1443 1447 1297 1413 7084
Not applicable 48 11 17 16 24 116
Unclear 104 27 48 38 50 267
Total 2160 2160 2160 2160 2160 10800
Table 5: Distribution of propaganda subtask classes across languages
Class Ar En He Fr Hi Total
Propaganda Must be deleted 192 191 266 348 277 1274
Propaganda May be deleted 524 488 382 461 396 2059
Propaganda not to be deleted 451 422 565 648 436 2522
Table 6: Types of Propaganda classes.
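The headline figures quoted in the results can be recomputed from the tables. The sketch below copies the relevant per-language rows from Tables 3 to 6 and derives each row's share of the 10,800 Main-sheet posts (meaningful for the class rows of Tables 3 and 5) and the language contributing the most posts. The dictionary and function names are our own.

```python
LANGS = ("Ar", "En", "He", "Fr", "Hi")
TOTAL_POSTS = 10_800  # 5 languages x 2,160 posts

# Per-language counts copied from Tables 3-6 (order: Ar, En, He, Fr, Hi).
ROWS = {
    "Biased Against Palestine": (466, 514, 595, 807, 534),    # Table 3
    "Unbiased": (1371, 1486, 1369, 1212, 1386),              # Table 3
    "Explicit bias": (394, 336, 563, 412, 388),               # Table 4
    "Propaganda": (524, 679, 648, 809, 673),                  # Table 5
    "Not propaganda": (1484, 1443, 1447, 1297, 1413),         # Table 5
    "Propaganda must be deleted": (192, 191, 266, 348, 277),  # Table 6
}


def share(counts: tuple[int, ...]) -> float:
    """Percentage of all annotated posts that fall in this row."""
    return 100 * sum(counts) / TOTAL_POSTS


def top_language(counts: tuple[int, ...]) -> str:
    """Language that contributes the most posts to this row."""
    return LANGS[counts.index(max(counts))]


for name, counts in ROWS.items():
    print(f"{name}: total={sum(counts)}, share={share(counts):.0f}%, "
          f"top={top_language(counts)}")
```

Rounded, the shares reproduce the 27%, 63%, 31%, and 66% quoted in the text, and the top-language column matches the French and Hebrew observations.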

5 Error Analysis and Discussion

Despite training and supervision, errors may arise from subjective interpretation, ambiguous guidelines, or complex content. We examined the errors and noted the following:

  1. False positives in bias annotation: Annotators sometimes marked neutral content as biased. For instance, the post: "Israel launched attacks on Syria on Nov 10 in response to a drone strike on Eilat. The IDF claimed it attacked an organization responsible for the drone. Watch for more details." This news excerpt is informative and not biased.

  2. Misclassification of propaganda: Some content was wrongly labeled as "must be deleted" propaganda despite lacking direct harmful implications. For example: "BREAKING: Israeli forces are causing massive destruction in Gaza, in response to a terrorist attack by Hamas. Image source: Middle East Eye post." While this is propaganda, it should not be classified as "must be deleted."

6 Conclusion

This article presents our contribution to FigNews 2024, where we annotated a multilingual corpus of 12,000 Facebook posts for bias and propaganda across five languages. We extended the annotation guidelines for better consistency and accuracy, providing a foundation for future work on detecting bias in social media. Our plans include expanding the corpus to cover more critical events of the war and leveraging neural and large language models to automatically detect bias and propaganda in social media posts.

Ethical Considerations

Given the sensitive nature of the topics and media narratives related to the Israeli War on Gaza, our annotators, who are lawyers, have undergone extensive training to ensure careful and fair judgments. They meticulously review both the Arabic and English translations to avoid any bias that might arise from machine translation.

Limitations

We recognize that our annotation process has limitations, stemming from the subjective nature of identifying bias and propaganda in social media posts and from the sensitivity of the datasets involved.

Acknowledgments

We would like to acknowledge the contributions of Nejira Softic during the formulation of the guidelines. We would also like to thank the Master’s students specializing in Law and IT at Birzeit University for their help in annotating the datasets, especially Maram Shour, Belal Abu Zaina, Bayan Abu Alawi, Zainah Abughosh, Aya Al Dimasy, Doaa Abozena, Waad Alsheikh, Qassam Abu Hakmeh, Aseel Mustafa, Basel Awwad, Omar To’Mallah, and Dyala Fakhouri, as well as Prof. Reem Al-Botmeh for her support during the course. We also thank Palestine Technical University - Kadoorie for its support.

References

  • Abuaiadah et al. (2017) Diab Abuaiadah, Dileep Rajendran, and Mustafa Jarrar. 2017. Clustering Arabic Tweets for Sentiment Analysis. In Proceedings of the 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, pages 499–506. IEEE Computer Society.
  • Barrón-Cedeno et al. (2019) Alberto Barrón-Cedeno, Israa Jaradat, Giovanni Da San Martino, and Preslav Nakov. 2019. Proppy: Organizing the news based on their propagandistic content. Information Processing & Management, 56(5):1849–1864.
  • Cohen (1968) Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4):213.
  • Da San Martino et al. (2019) Giovanni Da San Martino, Yu Seunghak, Alberto Barrón-Cedeno, Rostislav Petrov, Preslav Nakov, et al. 2019. Fine-grained analysis of propaganda in news article. In Proceedings of the 2019 EMNLP-IJCNLP Conference, pages 5636–5646. ACL.
  • Darwish et al. (2021) Kareem Darwish, Nizar Habash, Mourad Abbas, Hend Al-Khalifa, Huseein T. Al-Natsheh, Houda Bouamor, Karim Bouzoubaa, Violetta Cavalli-Sforza, Samhaa R. El-Beltagy, Wassim El-Hajj, Mustafa Jarrar, and Hamdy Mubarak. 2021. A Panoramic survey of Natural Language Processing in the Arab Worlds. Commun. ACM, 64(4):72–81.
  • European External Action Service (n.d.) European External Action Service. n.d. Human rights guidelines on freedom of expression online and offline. https://www.eeas.europa.eu/sites/default/files/11_hr_guidelines_external_action_en.pdf.
  • Hamad et al. (2023) Nagham Hamad, Mustafa Jarrar, Mohammed Khalilia, and Nadim Nashif. 2023. Offensive Hebrew Corpus and Detection using BERT. In Proceedings of the 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA). IEEE.
  • Hammouda et al. (2024) Tymaa Hammouda, Mustafa Jarrar, and Mohammed Khalilia. 2024. SinaTools: Open Source Toolkit for Arabic Natural Language Understanding. In Proceedings of the 2024 AI in Computational Linguistics (ACLING 2024), Procedia Computer Science, Dubai. ELSEVIER.
  • Jarrar et al. (2024) Mustafa Jarrar, Nagham Hamad, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, and Muhammad Abdul-Mageed. 2024. WojoodNER 2024: The Second Arabic Named Entity Recognition Shared Task. In Proceedings of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024), Bangkok, Thailand. Association for Computational Linguistics.
  • Martino et al. (2020) G. Martino, Alberto Barrón-Cedeno, Henning Wachsmuth, Rostislav Petrov, and Preslav Nakov. 2020. Semeval-2020 task 11: Detection of propaganda techniques in news articles. arXiv preprint arXiv:2009.02696.
  • Rashkin et al. (2017) Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2931–2937. Association for Computational Linguistics.
  • Zaghouani et al. (2024) Wajdi Zaghouani, Mustafa Jarrar, Nizar Habash, Houda Bouamor, Imed Zitouni, Mona Diab, Samhaa R. El-Beltagy, and Muhammed AbuOdeh, editors. 2024. The FIGNEWS Shared Task on News Media Narratives. Association for Computational Linguistics, Bangkok, Thailand.