Characterization of Political Polarized Users Attacked by Language Toxicity on Twitter

Wentao Xu
Department of Science and Technology of Communication
University of Science and Technology of China
Abstract

Understanding the dynamics of language toxicity on social media is important for investigating the propagation of misinformation and the development of echo chambers in political scenarios such as U.S. presidential elections. Recent research has used large-scale data to investigate these dynamics across social media platforms. However, research on toxicity dynamics remains limited. This study aims to provide a first exploration of the potential language toxicity flow among Left, Right and Center users. Specifically, we examine whether Left users were more likely to be attacked by language toxicity. In this study, more than 500M Twitter posts were examined. Left users were found to receive much more toxic replies than Right and Center users.

1 Introduction

Social media has become an indispensable element of contemporary social life [1]. While social media platforms bring people freedom of communication, studies have identified language toxicity across platforms such as Twitter (now X) [2, 3], Facebook [4, 5], Reddit [6], YouTube [7], Telegram [8], and Whisper [9]. These studies have identified various forms of toxic content, including violence, obscenity, threats, insults, and abusive language. Toxic language in online social networks is more prevalent between users with no connection than between mutual friends, and mildly offensive terms are used more frequently to express hostility between these two groups [10]. The nature and extent of toxic language can vary by platform. For instance, Reddit has been found to contain a higher frequency of posts with insults, identity attacks, threats of violence, and sexual harassment [11]. Additionally, while most studies on offensive language detection have focused on English, toxic content has also been identified in other languages, such as Greek and Indonesian [12, 13]. The presence of toxic language on Twitter can also have significant negative impacts on individuals and communities, even affecting mental health [14].

In political scenarios, online conversations during U.S. presidential elections can exhibit toxicity [15]. While social media platforms are praised for enhancing democratic discussions, the presence of social bots can distort political discourse, potentially influencing public opinion and election integrity negatively [16, 17]. The behaviour of social bots aggravates the propagation of toxic content. The toxicity in online political talk is often linked to incivility, challenging the perception that it is beneficial for elections. Moreover, the study of online chatter surrounding elections is crucial for ensuring evidence-based political discourse and free and fair elections [18]. Therefore, while online conversations can provide a platform for political discussions, the toxicity and manipulation through bots underscore the importance of monitoring and studying these interactions to safeguard the election process.

However, the political process is severely affected by polarization. Polarization is prevalent on Twitter, as the platform serves as a significant space for political discourse, which can influence public opinion and democratic processes. Studies have shown that Twitter can both facilitate cross-ideological exchanges and contribute to the clustering of users around shared political views, potentially reinforcing partisan loyalties and contributing to polarization [19].

The impact of Twitter on political polarization is also significant in fragmented political systems, where the platform’s role in shaping communication among political entities can affect collaboration between parties and the overall political landscape [20]. Interestingly, while some research supports the “echo chambers” view, suggesting that social media platforms like Twitter foster political polarization by creating fragmented and niche-oriented spaces for like-minded individuals, other studies highlight the presence of cross-cutting interactions that could mitigate this effect [21].

Moreover, the influence of social media on political polarization has been demonstrated through simulations, indicating that the type of political views presented in social media can shape the political orientation of a population [22]. The investigation into political polarization on Twitter is crucial due to the platform’s ability to shape political communication and influence public opinion. While there is evidence of both echo chambers and cross-ideological interactions, the overall effect of Twitter on political polarization varies and is a subject of ongoing scholarly debate.

Language on Twitter can potentially exacerbate political polarization [23]. Toxic language can be bred in echo chambers on Twitter, leading to more extreme and toxic language and reinforcing divisions [21]. Moreover, language on Twitter can also be affected by traditional media, such as broadcast media language, which contributes to toxic online interactions [24]. Additionally, Twitter is vulnerable to manipulation by malicious actors who use polarized and toxic language to sow discord. This can significantly distort the political landscape and influence public opinion [25].

The most recent extensive research on language toxicity shows that toxicity does not always increase as online discussions progress, suggesting that more rounds of conversation may not lead to higher toxicity [26]. However, the detailed dynamics of these toxic, politically polarized conversations are still unclear. Here we focus on the replies to politically polarized users on Twitter. In most cases, any social networking service (SNS) account can freely engage with another one through text, which could lead to further negative and harmful online social behaviours, such as rounds of toxic replying. [27] discovered that anti-vaxxers are more aggressive in replying by analyzing toxic replies to English and Japanese tweets. [28] found through network analysis that ideological extremity is more associated with conservatives than with liberals. [29] identified the diffusion patterns of toxic replies on Twitter based on news outlets' content shared on the platform. These studies helped in understanding the mechanism of replies on Twitter, but the language toxicity patterns of politically polarized Twitter replies were not investigated. This study examined the correlation between political polarization and the language toxicity of Right, Center, and Left replies.

2 Data & Methods

2.1 Data

It is well known that the COVID-19 pandemic is a worldwide healthcare crisis, during which political polarization was intensified [30, 31, 32]. Such a catastrophic global situation provides a time window to examine the association between political polarization and online language toxicity.

In this study, 542,212,429 English tweets were collected from February 20, 2020 to May 30, 2022 by querying COVID-19-related keywords (“corona virus”, “coronavirus”, “covid19”, “2019-nCoV”, “SARS-CoV-2”, “wuhanpneumonia”) using the Twitter Search API. A total of 25,370,268 replies to English tweets were used for this study.

2.2 User annotation

A politically-leaning URL domain list of news websites was then obtained by request from AllSides (www.allsides.com) for academic research purposes. The list contains 160 Left and Lean Left URLs, 98 Right and Lean Right URLs, and 180 Center URLs. Based on the list, each reply was labelled as Right if the domain of its Twitter URL object was identified in the Right or Lean Right domain list; the other replies were labelled as Left and Center accordingly.
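The labelling rule above can be sketched as a domain lookup. This is a minimal illustration, not the study's actual pipeline: the domain sets are hypothetical placeholders rather than the AllSides list, and the helper names `extract_domain` and `label_reply` are assumptions introduced here.

```python
from urllib.parse import urlparse

# Hypothetical placeholder domains; the real AllSides list is not reproduced here.
# Lean Left / Lean Right domains are assumed to be merged into Left / Right.
LEANING_DOMAINS = {
    "Left": {"left-example.com"},
    "Center": {"center-example.org"},
    "Right": {"right-example.net"},
}


def extract_domain(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""  # hostname is None when no scheme/host is present
    return host[4:] if host.startswith("www.") else host


def label_reply(urls):
    """Return the set of leanings matched by the URLs attached to one reply."""
    labels = set()
    for url in urls:
        domain = extract_domain(url)
        for leaning, domains in LEANING_DOMAINS.items():
            if domain in domains:
                labels.add(leaning)
    return labels
```

A reply without URLs, or with URLs outside the list, yields an empty set and would simply be left unlabelled under this sketch.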

To examine the degree to which a user engages with labelled replies, we categorized users according to their replies’ domain labels. For example, the Right user category includes users whose reply URL objects contain Right domains exclusively. Some replies do not contain any URLs. Note that this study only looked at replies that met two criteria:

Meanwhile, the study further considered the frequency with which each user was replied to in each politically-leaning category. For example, if Left domains occurred in a replied-to user’s reply URL objects three times, without any Center or Right domains occurring, this user was considered a three-time-replied-to user in the Left category and was called a three-time-replied-to Left user. As a result, a user who was replied to more frequently is regarded in this study as more engaged with a politically-leaning domain category. In this way, Twitter users were annotated with the Left, Right and Center categories.
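The exclusive-category rule and the replied-to count can be sketched as follows. The input format (pairs of replied-to user and reply label) and the function name `categorize_users` are assumptions made for illustration; the source does not specify the data layout.

```python
from collections import Counter, defaultdict


def categorize_users(labelled_replies):
    """Assign each replied-to user a category and a replied-times count.

    labelled_replies: iterable of (replied_to_user, leaning) pairs, one per
    labelled reply (hypothetical input format).
    Returns {user: (category, replied_times)} for users whose labelled replies
    all fall in exactly one leaning; mixed users are dropped.
    """
    counts = defaultdict(Counter)
    for user, leaning in labelled_replies:
        counts[user][leaning] += 1

    categories = {}
    for user, per_leaning in counts.items():
        if len(per_leaning) == 1:  # exclusivity: only one leaning ever occurred
            (leaning, times), = per_leaning.items()
            categories[user] = (leaning, times)
    return categories
```

For instance, a user with three Left-labelled replies and nothing else maps to `("Left", 3)`, matching the three-time-replied-to Left user described above.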

2.3 Toxicity Calculation

The Perspective API (https://www.perspectiveapi.com/) is considered suitable for toxicity calculation due to its machine learning-based approach to detecting and moderating toxic content on social media platforms [33, 34, 35, 36, 27, 7, 28]. It has been adopted for content moderation, monitoring, and research purposes. It aligns well with human ratings of toxicity [26] and disrespectfulness, especially for highly toxic comments [37], indicating that the Perspective API measures language toxicity robustly.

For each text input into the Perspective API, a probability score within $[0,1]$ is calculated. The higher the score, the more toxic the input text. Some research uses a threshold to classify texts as “toxic” or “nontoxic”. This strategy was not adopted here, because the goal is to characterise the toxicity of all users. To measure the toxicity of each user, the reply texts for each user were aggregated and then sent to the Perspective API. Since each category of users has various statistical indicators of toxicity, the analysis reports the maximum and median toxicity scores of the Left, Right and Center categories.
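As a rough sketch of the scoring step, the snippet below builds a request body for the Perspective API's `comments:analyze` method and reads the TOXICITY summary score from a response. No network call is made; the sample response is hand-written with a made-up value, and the helper names are assumptions. How the study aggregated reply texts before scoring is not specified, so the sketch scores a single text.

```python
import json

# Endpoint of the Comment Analyzer method; an API key is appended as ?key=... when calling it.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text):
    """Build the JSON body asking for the TOXICITY attribute of one English text."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response):
    """Extract the summary TOXICITY probability (a float in [0, 1]) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# Hand-written response in the expected shape; the value 0.42 is illustrative only.
sample_response = json.loads(
    '{"attributeScores": {"TOXICITY": {"summaryScore": '
    '{"value": 0.42, "type": "PROBABILITY"}}}}'
)
score = parse_toxicity(sample_response)
```

Keeping the raw probability, rather than thresholding it into toxic and nontoxic, mirrors the choice described above.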

3 Results

3.1 The Left received much more toxic replies.

An overall negative correlation between maximum toxicity and replied times was identified in this study (Figure 1).

Figure 1: The maximum toxicities of the replies received by replied-to users in each politically-leaning category. The X-axis, replied times, is in log scale. The Y-axis indicates the maximum toxicity of the replies received by each category of users. Red indicates users engaging exclusively with Right domains; green indicates users engaging exclusively with Center domains; blue indicates users engaging exclusively with Left domains. Arrows indicate the top toxic replied-to users and the Left category outliers. Replies to the Left category users were significantly more toxic than those to the Right and Center categories ($p<0.005$ by Mann-Whitney U test with a Bonferroni correction).

The maximum toxicity is the highest language toxicity value within a category at a specific number of replied times. Figure 1 illustrates the maximum toxicities of each category at different replied times, indicating that more-replied-to users were less likely to receive replies with high toxicity. The replied-to users of the Right and Center categories shared a similar maximum toxicity distribution (Kolmogorov-Smirnov test, $p>0.05$), while the Left category showed a different distribution ($p<0.05$). However, the statistical difference does not change the overall trend of the three categories.
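The grouping behind this definition can be sketched in a few lines. The tuple layout `(category, replied_times, toxicity)` and the function name `summarize` are assumptions for illustration, and the scores below are made up; the same grouping also yields the median used later.

```python
from collections import defaultdict
from statistics import median


def summarize(users):
    """Group per-user toxicity scores by (category, replied_times).

    users: iterable of (category, replied_times, toxicity) tuples
    (hypothetical layout). Returns, for each group, the maximum and
    the median toxicity.
    """
    groups = defaultdict(list)
    for category, replied_times, toxicity in users:
        groups[(category, replied_times)].append(toxicity)
    return {
        key: {"max": max(scores), "median": median(scores)}
        for key, scores in groups.items()
    }


# Made-up scores for illustration only.
example = [
    ("Left", 3, 0.1),
    ("Left", 3, 0.3),
    ("Left", 3, 0.8),
    ("Right", 5, 0.2),
]
summary = summarize(example)
```

Plotting the `max` values against `replied_times` on a log-scaled X-axis would give a figure of the same kind as Figure 1.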

In general, more frequently replied-to users had lower maximum toxicities, regardless of user category. The Left category differs from the others, possibly due to the higher toxicity values of several outliers. For instance, some Left category outliers (indicated by arrows in Figure 1) had larger toxicities, and some of them even exceeded 0.8. The outliers could have been targeted by top toxic repliers. By contrast, the maximum toxicities of the Right category users were less than 0.4 when they were replied to more than approximately 1,000 times. The maximum toxicity is compared between categories in Figure 2.

Figure 2: Boxplots represent the maximum toxicity of the replies for each user category. The median for each category is shown in each bar.

This reveals that the maximum toxicities of the Left category users are significantly higher than those of the other two categories (Mann-Whitney U test, $p<0.005$).
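For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney U test with a Bonferroni adjustment. It uses the normal approximation without tie or continuity corrections, which is an assumption of this sketch and may differ from the exact procedure used in the study; in practice a statistics library would normally be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three categories there are three pairwise comparisons (Left vs. Right, Left vs. Center, Right vs. Center), so each raw p-value would be multiplied by three before being compared with the significance level.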

3.2 The Left and Center outliers received much more toxic replies.

The median can be used to represent the central tendency of a dataset. In contrast to the maximum scenario, the level of median toxicity did not exhibit a negative correlation with replied times.

Most of the median toxicity values were concentrated between approximately 0.05 and 0.4. This overall tendency showed that the toxicity of replies was less aggressive, but fluctuated as the replied times increased. Specifically, for the Right category users, the median toxicities were below 0.5, whereas the outlier values for Left and Center users exceeded 0.7. No statistically significant difference was identified across the Left, Right, and Center categories, suggesting that the three categories shared a similar distribution of median toxicity (Figure 3), and no category stood out with significantly different median toxicity (Figure 4).

Figure 3: The median toxicities of the replies received by replied-to users in each politically-leaning category. The three categories shared a similar distribution for median toxicity. The X-axis, replied times, is in log scale. The Y-axis indicates the median toxicity of the replies received by each category of users. Red indicates users engaging exclusively with Right domains; green indicates users engaging exclusively with Center domains; blue indicates users engaging exclusively with Left domains. The Left and Center outlier users (indicated by arrows) received much more toxic replies.

Figure 4: Boxplots represent the median toxicity of the replies for each user category. The median for each category is shown in each bar. The overall reply toxicities were similar across the three categories, although the Left and Center outliers received much more toxic content.

4 Discussion

This study shows that Left users could receive more toxic replies than Right and Center users. This pattern of toxicity propagation is important for understanding misinformation propagation and echo chamber development, as toxicity in online interactions can lead to a decrease in user activity, ultimately impacting the collaborative nature of platforms [38]. Previous research confirmed that the left group was more distant from the neutral group than the right group [27]. However, this study found that Left users were much closer to Right users than to Center users in terms of maximum toxicities. This “toxicity distance” might suggest that Right and Left users were sending toxicity to each other, but that Left users received much more. Although there was no significant difference in median language toxicity across the replied-to users of the Left, Right, and Center categories, the replied-to users targeted by toxic repliers in each category cannot be neglected, especially the Left users.

Precautions for protecting users from language toxicity attacks, especially during political discussions such as U.S. presidential elections, should be carefully considered. When users engage with the Left, it is suggested that they pay attention to toxic comments and replies, which might further pollute the SNS ecosystem and make users more emotional. Future work will investigate the interaction and engagement dynamics of the Left, Right and Center categories. In addition, more intelligent tools need to be proposed to combat toxic language and keep the SNS ecosystem healthier. This study has implications for other platforms, such as Facebook and Reddit.

References

  • [1] Robert E. Wilson, Samuel D. Gosling, and Lindsay T. Graham. A review of facebook research in the social sciences. Perspectives on Psychological Science, 7(3):203–220, 2012. PMID: 26168459.
  • [2] Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Athena Vakali. Measuring #gamergate: A tale of hate, sexism, and bullying. In Proceedings of the 26th International Conference on World Wide Web Companion, WWW ’17 Companion, page 1285–1290, Republic and Canton of Geneva, CHE, 2017. International World Wide Web Conferences Steering Committee.
  • [3] Joshua Guberman, Carol Schmitz, and Libby Hemphill. Quantifying toxicity and verbal violence on twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, CSCW ’16 Companion, page 277–280, New York, NY, USA, 2016. Association for Computing Machinery.
  • [4] Ariadna Matamoros-Fernández and Johan Farkas. Racism, hate speech, and social media: A systematic review and critique. Television & New Media, 22(2):205–224, 2021.
  • [5] Paula Fortuna and Sérgio Nunes. A survey on automatic detection of hate speech in text. ACM Comput. Surv., 51(4), jul 2018.
  • [6] Deepak Kumar, Jeff Hancock, Kurt Thomas, and Zakir Durumeric. Understanding the behaviors of toxic accounts on reddit. In Proceedings of the ACM Web Conference 2023, WWW ’23, page 2797–2807, New York, NY, USA, 2023. Association for Computing Machinery.
  • [7] Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An, and Kazutoshi Sasahara. The impact of toxic trolling comments on anti-vaccine youtube videos. Scientific Reports, 14(1), March 2024.
  • [8] Maximilian Wich, Adrian Gorniak, Tobias Eder, Daniel Bartmann, Burak Enes Çakici, and Georg Groh. Introducing an abusive language classification framework for telegram to investigate the german hater community. Proceedings of the International AAAI Conference on Web and Social Media, 16(1):1133–1144, May 2022.
  • [9] Leandro Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and Ingmar Weber. Analyzing the targets of hate in online social media. Proceedings of the International AAAI Conference on Web and Social Media, 10(1):687–690, Aug. 2021.
  • [10] Bahar Radfar, Karthik Shivaram, and Aron Culotta. Characterizing variation in toxic language by social context. Proceedings of the International AAAI Conference on Web and Social Media, 14(1):959–963, May 2020.
  • [11] Deepak Kumar, Jeff Hancock, Kurt Thomas, and Zakir Durumeric. Understanding the behaviors of toxic accounts on reddit. In Proceedings of the ACM Web Conference 2023, WWW ’23. ACM, April 2023.
  • [12] Zesis Pitenis, Marcos Zampieri, and Tharindu Ranasinghe. Offensive language identification in Greek. In Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5113–5119, Marseille, France, May 2020. European Language Resources Association.
  • [13] Muhammad Okky Ibrohim, Erryan Sazany, and Indra Budi. Identify abusive and offensive language in indonesian twitter using deep learning approach. Journal of Physics: Conference Series, 1196(1):012041, mar 2019.
  • [14] Katja Rost, Lea Stahel, and Bruno S. Frey. Digital social norm enforcement: Online firestorms in social media. PLOS ONE, 11(6):e0155923, June 2016.
  • [15] Tiago Ventura, Kevin Munger, Katherine McCabe, and Keng-Chi Chang. Connective effervescence and streaming chat during political debates. 1, Apr. 2021.
  • [16] Patrícia Rossini. More than just shouting? distinguishing interpersonal-directed and elite-directed incivility in online political talk. Social Media + Society, 7(2):20563051211008827, 2021.
  • [17] Alessandro Bessi and Emilio Ferrara. Social bots distort the 2016 u.s. presidential election online discussion. First Monday, November 2016.
  • [18] Emily Chen, Ashok Deb, and Emilio Ferrara. #election2020: the first public twitter dataset on the 2020 us presidential election. Journal of Computational Social Science, 5(1):1–18, April 2021.
  • [19] Anatoliy Gruzd and Jeffrey Roy. Investigating political polarization on twitter: A canadian perspective. Policy & Internet, 6(1):28–45, 2014.
  • [20] Anindita Borah and Sanasam Ranbir Singh. Investigating political polarization in india through the lens of twitter. Social Network Analysis and Mining, 12(1), July 2022.
  • [21] Sounman Hong and Sun Hyoung Kim. Political polarization on twitter: Implications for the use of social media in digital governments. Government Information Quarterly, 33(4):777–782, 2016.
  • [22] Shubh Goyal and Mukul Goyal. Impact of social/traditional media on political polarization. Journal of Student Research, 12(2), May 2023.
  • [23] Yihong Zhang, Masumi Shirakawa, and Takahiro Hara. An automatic method for understanding political polarization through social media. In Han Qiu, Cheng Zhang, Zongming Fei, Meikang Qiu, and Sun-Yuan Kung, editors, Knowledge Science, Engineering and Management, pages 52–63, Cham, 2021. Springer International Publishing.
  • [24] Xiaohan Ding, Michael Horning, and Eugenia H. Rho. Same words, different meanings: Semantic polarization in broadcast media language forecasts polarity in online public discourse. Proceedings of the International AAAI Conference on Web and Social Media, 17(1):161–172, Jun. 2023.
  • [25] Almog Simchon, William J Brady, and Jay J Van Bavel. Troll and divide: the language of online polarization. PNAS Nexus, 1(1):pgac019, 03 2022.
  • [26] Michele Avalle, Niccolò Di Marco, Gabriele Etta, Emanuele Sangiorgio, Shayan Alipour, Anita Bonetti, Lorenzo Alvisi, Antonio Scala, Andrea Baronchelli, Matteo Cinelli, and Walter Quattrociocchi. Persistent interaction patterns across social media platforms and over time. Nature, 628(8008):582–589, March 2024.
  • [27] Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, and Kazutoshi Sasahara. Aggressive behaviour of anti-vaxxers and their toxic replies in english and japanese. Humanities and Social Sciences Communications, 9(229), 2022.
  • [28] Mohsen Mosleh and David G Rand. Measuring exposure to misinformation from political elites on twitter. Nature Communications, 13(7144), 2022.
  • [29] Martin Saveski, Brandon Roy, and Deb Roy. The structure of toxic conversations on twitter. In Proceedings of the Web Conference 2021, WWW ’21, page 1086–1097, New York, NY, USA, 2021. Association for Computing Machinery.
  • [30] Joel M. Levin, Leigh A. Bukowski, Julia A. Minson, and Jeremy M. Kahn. The political polarization of covid-19 treatments among physicians and laypeople in the united states. Proceedings of the National Academy of Sciences, 120(7):e2216179120, 2023.
  • [31] Sebastian Jungkunz. Political polarization during the covid-19 pandemic. Frontiers in Political Science, 3, 2021.
  • [32] Alexandra Flores, Jennifer C. Cole, Stephan Dickert, Kimin Eom, Gabriela M. Jiga-Boy, Tehila Kogut, Riley Loria, Marcus Mayorga, Eric J. Pedersen, Beatriz Pereira, Enrico Rubaltelli, David K. Sherman, Paul Slovic, Daniel Västfjäll, and Leaf Van Boven. Politicians polarize and experts depolarize public support for covid-19 management policies across countries. Proceedings of the National Academy of Sciences, 119(3):e2117543119, 2022.
  • [33] Bernhard Rieder and Yarden Skop. The fabrics of machine moderation: Studying the technical, normative, and organizational structure of perspective api. Big Data & Society, 8(2):20539517211046181, 2021.
  • [34] Luiza Pozzobon, Beyza Ermis, Patrick Lewis, and Sara Hooker. On the challenges of using black-box APIs for toxicity evaluation in research. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7595–7609, Singapore, December 2023. Association for Computational Linguistics.
  • [35] Ashish Sharma, Inna W. Lin, Adam S. Miner, David C. Atkins, and Tim Althoff. Human–ai collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nature Machine Intelligence, 5(1):46–57, January 2023.
  • [36] Patrick Schramowski, Cigdem Turan, Nico Andersen, Constantin A. Rothkopf, and Kristian Kersting. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3):258–268, March 2022.
  • [37] Lucas Rosenblatt, Lorena Piedras, and Julia Wilkins. Critical perspectives: A benchmark revealing pitfalls in PerspectiveAPI. In Laura Biester, Dorottya Demszky, Zhijing Jin, Mrinmaya Sachan, Joel Tetreault, Steven Wilson, Lu Xiao, and Jieyu Zhao, editors, Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI), pages 15–24, Abu Dhabi, United Arab Emirates (Hybrid), December 2022. Association for Computational Linguistics.
  • [38] Hind Almerekhi, Haewoon Kwak, Joni Salminen, and Bernard J. Jansen. Are these comments triggering? predicting triggers of toxicity in online discussions. In Proceedings of The Web Conference 2020, WWW ’20, page 3033–3040, New York, NY, USA, 2020. Association for Computing Machinery.