¹¹institutetext: Coburg University of Applied Sciences, Coburg, DE ²²institutetext: University of Sheffield, Sheffield, UK
(Contact: ²²email: [email protected])

BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy^†^†thanks: The authors gratefully acknowledge the funding provided by the Free State of Bavaria under its “Hitech Agenda Bavaria”. All views are the authors’ and do not necessarily reflect the views of any funders or affiliated institutions. We thank Michael Reiche for feedback and discussions.

Tim Menzner 11 Jochen L. Leidner 1122

Abstract

The increasing consumption of news online in the 21st century coincided with increased publication of disinformation, biased reporting, hate speech and other unwanted Web content.
We describe BiasScanner, an application that aims to strengthen democracy by supporting news consumers with scrutinizing news articles they are reading online. BiasScanner contains a server-side pre-trained large language model to identify biased sentences of news articles and a front-end Web browser plug-in. At the time of writing, BiasScanner can identify and classify more than two dozen types of media bias at the sentence level, making it the most fine-grained model and only deployed application (automatic system in use) of its kind. It was implemented in a light-weight and privacy-respecting manner, and in addition to highlighting likely biased sentence it also provides explanations for each classification decision as well as a summary analysis for each news article.

While prior research has addressed news bias detection, we are not aware of any work that resulted in a deployed browser plug-in (c.f. also biasscanner.org for a Web demo).

information access systems

Keywords:

news bias identification media bias classification content quality news analytics media monitoring Web applications natural language processing information retrieval

1 Introduction

Democracy faces an existential threat when most citizens get their news from online platforms focused on controversy rather than balanced reporting. Such behavior increases advertising revenue, contributing to media bias and the spread of fake news [20, 15, 25, 14, 31].

To combat this trend, we introduced BiasScanner, a practical tool to help readers assess online news regarding instances of biased reporting, which we describe here. It highlights biased sentences, offers detailed analysis reports, and assigns bias scores. Users can also donate bias reports for research. BiasScanner makes use of advanced neural transformer models, such as OpenAI’s GPT 3.5 for efficient and effective bias detection. We prioritize user privacy by not storing personal(ly identifiable) information or news stories without explicit consent.

2 Related Work

Foundational Language Models. The neural transformer model was first described in [30]. Google BERT [11] and OpenAI’s GPT-3 [5] GPT-4 (and their application ChatGPT [24]) and Meta’s Llama [29] have been early foundational models that have introduced a paradigm shift in NLP by demonstrating how large, pre-trained language models can dramatically reduce the development time of NLP systems by using large quantities of un-annotated text to train general-purpose “foundational” models.

News Bias. Groeling [14] presents a survey of the literature covering partisan bias. Conrad et al. [8] focused on content mining to measure credibility of authors on the web. The topic of bias in mass media was dealt with in detail by [20] and [25]. Hamborg et al. [16] provided an interdisciplinary literature review to suggest methods how bias could be bias detection could be automated.
Bias Detection. Media bias datasets with different focus where released by [2],[17], and [27, 26] After early pioneering work on bias from economics [15], Arapakis et al. [2] labeled 561 articles along 14 quality dimensions including subjectivity. Horne et al. [17] released a larger dataset annotated for political partisanship bias, but without grouping articles by event, which makes apples-to-apples comparison harder; Chen et al. [6] addressed this issue by resorting to another corpus sampled from the website allsides.com, which includes human labels by U.S. political orientation (on the ordinal scale $\{LL,L,C,R,RR\}$ ); they also present an ML model to flip the orientation to the oppositite one. Yano, Resnik and Smith [33] also on the liberal-conservative axis, manually annotating sentence-level partisanship bias.
MBIB, the first media bias identification benchmark, was introduced by Wessel et al. [32], who evaluated Transformer techniques on detecting nine different types of bias across 22 selected datasets. Baumer et al. focused on detecting framing language. [3] Chen et al. [7] demonstrated that incorporating second-order information, such as the probability distributions of the frequency, positions, and sequential order of sentence-level bias, can enhance the effectiveness of article-level bias detection, especially in cases where relying solely on individual words or sentences is insufficient. Spinde et al. published a dataset containing biased sentences and evaluated detection techniques on it.
[27, 26]

Web Apps & Mobile Apps. Hamborg et al. [16] presented Newsanalyze, a system that highlights sentiment target entities colored by polarity. In contrast, we perform sentence classification targeting bias and sub-type of bias (sentiment $=$ affective state $\neq$ bias (Although there can and typically is a connection, bias is more general e.g. under-reporting is not sentiment-related at all). Da San Martino et al [10] developed Prta, a tool highlighting propaganda techniques in news articles. While propaganda and news bias are related (as visible in the overlap of propaganda techniques and bias types), new bias is a broader phenomena, also including unintentional subjective reporting.
Other Related Work. Conrad, Leidner and Schilder characterize signals for credibility in the context of credibility for professionals [9]. Bhuiyan et al. [4] compare crowdsourced and expert assessment criteria for credibility on statements about climate change. Allen and co-workers [1] studied the Ghanem et al. [13] analyze an interesting way to distinguish between real/credible news and fake news by looking at the distribution of affective words within the document.

To the best of our knowledge, BiasScanner is the first system for news bias detection and bias sub-type classification based on a neural transformer architecture published in the scientific literature and deployed/release to the general public as a free browser plug-in.

3 System

This section describes BiasScanner, our system, which is also deployed on the World Wide Web at https://biasscanner.org. This address also contains a separate Web demo where users can experiment with our model before installing the Web browser plug-in.

3.1 Architecture

Architecture. We designed BiasScanner with ease and convenience of use and respect for the user’s privacy in mind. A frond-end application deals with the user interface and communicates with our server, which provides a bias classification service, and which shields the originating IP address of the user when invoking OpenAI GPT – current model: a gpt-3.5-turbo-16k fine-tuned on articles constructed from the BABE dataset [28] with information about bias type and strength added using GPT-4 – via a REST API on a US server, but without any transfer of PII data. Our server layer also deals with payment authentication for the transformer model use to hide this aspect from users, as we believe dealing with cumbersome API keys would exclude some users. The nature of our architecture also permits easy switching of the model working behind the scenes (we are considering switching to an Open Source Model long-term) without disruption for users.

Refer to caption — Figure 1: BiasScanner System Architecture

We designed BiasScanner in a user-friendly way and with privacy protection in mind. The overall architecture is shown in Figure 1. Our front-end application, a Web browser plug-in, handles the user interface and connects to our server, which offers bias classification, currently in turn calling OpenAI via a US-based REST API as its large language model (LLM) server, with user IP address protection and no PII data transfer. Additionally, our server manages payment authentication to simplify the user experience; we aim to avoid the hassle of dealing with API keys, ensuring inclusiveness for all users.

Implementation. We implemented BiasScanner as a Web Application on our site, where users can type or copy in text to get an analysis and as browser plug-in for Firefox, Chrome/Chromium and other browsers using JavaScript. When using the plug-in, the relevant article text is extracted from the HTML of a web page by utilizing Mozilla’s readability library, which also serves as base of the Firefox reader view [23].

3.2 User Interface

User Interface.

Figure 2 shows the graphical user interface of BiasScanner (Web plug-in version).

The prompt used for instructing the language model was developed iteratively and aims to provide consistent and high-quality output by considering best practices, like a clear definition of every searched-for bias type and by including an example for the desired JSON output format. The answer given by the model is then post-processed and filtered to prevent potential errors before being used to highlight biased sentences directly on the site. A more detailed report including the type of bias, a short explanation and a score indicating the strength of the bias, is also available for the user to view. This bias report concludes by providing a general assessment of the article’s bias(es).

It calculates a score by normalizing the sum of two components: the ratio of biased sentences to total sentences in the article and the average bias score across all biased sentences in the article. The prompt for instructing the language model was developed in several iterations to ensure consistent and high-quality output. It includes a clear definition of each searched-for bias type and an example for the desired JSON output format. The model’s response is post-processed and filtered to prevent errors before highlighting biased sentences on the site. Users can access a detailed report that includes bias type, explanation, and a bias strength score. This report also provides a general assessment of the article’s bias, and a overall score, calculated by normalizing the ratio of biased sentences to total sentences and the average bias score across all biased sentences.

Currently Supported Types of Bias. In general, we define media bias as the tendency to, consciously or unconsciously, report a news story in a way that supports a pre-existing narrative instead of providing unprejudiced coverage of an issue. Our implementation explicitly searches for 27 different types of Bias, namely Ad Hominem Bias, Ambiguous Attribution Bias, Anecdotal Evidence Bias, Causal Misunderstanding Bias, Cherry Picking Bias, Circular Reasoning Bias, Discriminatory Bias, Emotional Sensationalism Bias, External Validation Bias, False Balance Bias, False Dichotomy Bias, Faulty Analogy Bias, Generalization Bias, Insinuative Questioning Bias, Intergroup Bias, Mud Praise Bias, Opinionated Bias, Political Bias, Projection Bias, Shifting Benchmark Bias, Source Selection Bias, Speculation Bias, Straw Man Bias, Unsubstantiated Claims Bias, Whataboutism Bias and Word Choice Bias:

4 Evaluation

Quantiative Evaluation. While a detailed evaluation is beyond the scope of this system paper, we presented detailed quantitative and qualitative evaluations for the English language in [21] and [22]. Table 1 shows some quality numbers from [22], which were representative as of June 2024 (for BiasScanner as of release 1.0.0 from July 2024; ongoing development may lead to different scores going forward.) F1-score is high at 76%, and our fine-tuned model’s quality dominates GPT-4 on all metrics except for precision (73% versus the latter’s 85%, at the time of writing).

Table 1: Evaluation Results on the BABE dataset for BiasScanner, GPT-3.5-turbo-1106 with prompt only and GPT-4-turbo-0125. Best results are highlighted in bold.

Model	TP	FP	FN	TN	F1-Score	Recall	Precision	Accuracy
BiasScanner	576	214	154	524	0.758	0.790	0.729	0.749
GPT-3.5 (Zero shot)	384	205	346	533	0.582	0.526	0.651	0.624
GPT-4.0 (Zero shot)	393	69	337	669	0.659	0.538	0.850	0.723
Baseline (Random)	362	374	368	364	0.494	0.496	0.492	0.495

Qualitative Evaluation. The achieved quality level is satisfying for practical use of the browser plug-in; a common error is the mis-classification of neutral reporting sentences with embedded radical quotes as “biased”; we believe embedded quotes ought to be removed before judging a sentence, which we will address in future work. We are particularly encouraged by the quality of our generated explanations, the evaluation of which is left for future work.

Beyond English. At the time of writing, BiasScanner can deal with news in English through our fine-tuned model, and also with other languages via said model’s transfer capabilities; in future work we want to fine-tune models for additional specific languages and evaluate them, as well as compare their performance with our existing model’s transfer abilities.

5 Limitations and Ethical Concerns

BiasScanner may not identify all instance of biases, and while we do not claim it does, the users may wrongly believe otherwise, consciously or unconsciously, after getting used to it. It can also not recognize all types of bias: notably, underreporting bias and other types that need across across several articles, are beyond its scope, as it only analyzes one individual news story at a time; we leave news coverage comparison for future work. It should also be noted that bias detection is always, to an extent, a subjective matter. Often a sentence might be considered biased by one person while another considers it to still be objective, therefore no classification will probably ever satisfy everyone at once.

Our current back-end implementation still depends on an underlying proprietary foundational model; in future work, we plan to become independent and port to an open model, even if this may mean a slight reduction of accuracy, as this may limit the ability to manipulate the system’s behavior from the outside.

6 Summary, Conclusions/Limitations and Future Work

We introduced BiasScanner, a new system for enhancing online news consumption by highlighting biased individual sentences in news articles, by offering news story analysis within Web browsers. We have successfully realized our design goals, including user privacy, rapid implementation and accurate bias classification.

BiasScanner may not identify all biases, as to date it focuses on individual news stories and does not compare across articles.

To date, BiasScanner has mainly been tested with English articles, introducing a development bias. Sending plain text to a server for security is required, but it is done anonymously. The system has been released as experimental browser extension available free of charge for Firefox trough the Mozilla plug-in marketplace[18] (Available on Desktop and Android). Future Releases for Chrome and Safari are planned. It can also be installed from GitHub [19].

We are also already using BiasScanner in the classroom for the teaching of critical reading and engaging students with the topic of media manipulation and its effects on a democracy (in Summer Semester 2024, the second author used it to support his course Media Manipulation, Propaganda and Fake News at Coburg University of Applied Sciences in Germany).

In future work, we aim to support open-source language models [29] to reduce cost and decrease reliance on commercial model vendors. We intent to support languages other than English, and we plan to expand the tool’s capabilities for multi-dimensional content analysis, including hate speech detection, readability scoring, fake news detection/credibility assessment and identifying inappropriate content for children [12]. We also welcome collaborations with other research teams and contributions to our effort from the open source community.

References

[1] Allen, J., et al.: Scaling up fact-checking using the wisdom of crowds. Science Advances 7 (2021)
[2] Arapakis, I., et al.: Linguistic benchmarks of online news article quality. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. pp. 1893–1902. ACL, Berlin, Germany (2016)
[3] Baumer, E., et al.: Testing and comparing computational approaches for identifying the language of framing in political news. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1472–1482. ACL, Denver, CO, USA (2015)
[4] Bhuiyan, M.M., et al.: Investigating differences in crowdsourced news credibility assessment: Raters, tasks, and expert criteria. Proc. ACM Hum.-Comput. Interact. 4(CSCW2) (2020)
[5] Brown, T., et al.: Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 1877–1901. Curran (2020)
[6] Chen, W.F., et al.: Learning to flip the bias of news headlines. In: Proceedings of the 11th International Conference on Natural Language Generation. pp. 79–88. ACL, Tilburg University, The Netherlands (2018)
[7] Chen, W.F., et al.: Detecting media bias in news articles using gaussian bias distributions (2020)
[8] Conboy, M.: The Language of the News. Routledge, London, UK (2007)
[9] Conrad, J.G., Leidner, J.L., Schilder, F.: Professional credibility: authority on the web. In: Tanaka, K., Matsuyama, T., Lim, E.P., Jatowt, A. (eds.) Proceedings of the 2nd ACM Workshop on Information Credibility on the Web, WICOW 2008, Napa Valley, California, USA, October 30, 2008. pp. 85–88. ACM (2008)
[10] Da San Martino, G., et al.: Prta: A system to support the analysis of propaganda techniques in the news. In: Celikyilmaz, A., Wen, T.H. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 287–293. Association for Computational Linguistics, Online (2020)
[11] Devlin, J., et al.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. ACL, Minneapolis, MN, USA (2019)
[12] Fuhr, N., et al.: An information nutritional label for online documents. SIGIR Forum 51(3), 46–66 (2018)
[13] Ghanem, B., et al.: FakeFlow: Fake news detection by modeling the flow of affective information. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. pp. 679–689. Association for Computational Linguistics, Online (2021)
[14] Groeling, T.: Media bias by the numbers: Challenges and opportunities in the empirical study of partisan news. Annu. Rev. Polit. Sci. 16, 129–151 (2013)
[15] Groseclose, T., et al.: A measure of media bias. The Quarterly Journal of Economics 120(4), 1191–1237 (2005)
[16] Hamborg, F., et al.: Newsalyze: Enabling news consumers to understand media bias. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. pp. 455–456. JCDL 2020, ACM, New York, NY, USA (2020)
[17] Horne, B.D., et al.: Sampling the news producers: A large news and feature data set for the study of the complex media landscape (2018)
[18] Information Access Research Group (ed.): Bias scanner (2024), https://addons.mozilla.org/de/firefox/addon/bias-scanner/, available for Firefox for Android
[19] Information Access Research Group (ed.): BiasScanner source code repository (2024), https://github.com/Information-Access-Research-Group-IARG/biasscanner, online/open source and open access, GitHub.com a Microsoft Corporation company, accessed 2024-06-26
[20] Lee, M.A., et al.: Unreliable Sources. A Guide to Detecting Bias in News Media. Carol Publishing, New York, NY, USA (1990)
[21] Menzner, T., Leidner, J.L.: Experiments in news bias detection with pre-trained neural transformers. In: Goharian, N., Tonellotto, N., He, Y., Lipani, A., McDonald, G., Macdonald, C., Ounis, I. (eds.) Advances in Information Retrieval. Lecture Notes in Computer Science (LNCS 14611), vol. IV, pp. 270–284. Springer Nature, Cham, Switzerland (2024)
[22] Menzner, T., Leidner, J.L.: Improved models for media bias detection and subcategorization. In: Proceedings of the 29th International Conference on Natural Language & Information Systems, 25-27 June 2024, University of Turin, Italy (NLDB 2024). Lecture Notes in Computer Science (LNCS), Springer Nature, Cham, Switzerland (2024)
[23] Mozilla: Mozilla Readability. https://github.com/mozilla/readability (2023), gitHub repository
[24] Roumeliotis, K.I., et al.: ChatGPT and Open-AI models: A preliminary review. Future Internet 15(6) (2023)
[25] Sloan, W.D., Mackay, J.B. (eds.): Media Bias: Finding It, Fixing It. McFarland & Company, Jefferson, NC, USA (2007)
[26] Spinde, T., et al.: Media bias in German news articles: A combined approach. In: Koprinska, I., Kamp, M., Appice, A., Loglisci, C., Antonie, L., Zimmermann, A., Guidotti, R., Özgöbek, Ö., Ribeiro, R.P., Gavaldà, R., Gama, J., Adilova, L., Krishnamurthy, Y., Ferreira, P.M., Malerba, D., Medeiros, I., Ceci, M., Manco, G., Masciari, E., Ras, Z.W., Christen, P., Ntoutsi, E., Schubert, E., Zimek, A., Monreale, A., Biecek, P., Rinzivillo, S., Kille, B., Lommatzsch, A., Gulla, J.A. (eds.) ECML PKDD 2020 Workshops. pp. 581–590. Springer, Cham, Switzerland (2020)
[27] Spinde, T., et al.: Automated identification of bias inducing words in news articles using linguistic and context-oriented features. Information Processing & Management 58(3), 102505 (2021)
[28] Spinde, T., et al.: Neural media bias detection using distant supervision with BABE - bias annotations by experts. In: Findings of the Association for Computational Linguistics: EMNLP 2021. pp. 1166–1177. Association for Computational Linguistics, Punta Cana, Dominican Republic (2021)
[29] Touvron, H., et al.: Llama: Open and efficient foundation language models (2023), unpublished manuscript, Cornell University ArXiv pre-print server
[30] Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)
[31] Vosoughi, S., et al.: The spread of true and false news online. Science 359(6380), 1146–1151 (2018)
[32] Wessel, M., et al.: Introducing MBIB – the first media bias identification benchmark task and dataset collection. In: Proceedings of 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2023, ACM, New York, NY, USA (2023), iSBN 978-1-4503-9408-6/23/07
[33] Yano, T., et al.: Shedding (a thousand points of) light on biased language. In: Proceedings of the NAACL-HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. pp. 152–158. ACL (2018)