Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use

Hang Jiang (ORCID 0009-0000-9523-136X), Doug Beeferman (ORCID 0009-0005-5879-5744), William Brannon (ORCID 0000-0002-1435-8535), Andrew Heyward, and Deb Roy (ORCID 0000-0002-2780-4768)
MIT Center for Constructive Communication & MIT Media Lab, Cambridge, MA, USA

Abstract.

Words often carry different meanings for people from diverse backgrounds. Today’s era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views. The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model. These summaries highlight how the terms are used distinctively by Republicans and Democrats. Additionally, the Bridging Dictionary offers an interactive interface that lets users explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides. We present a use case for journalists and emphasize the importance of human agency and trust in further enhancing this tool.¹ The deployed version of Bridging Dictionary is available at https://dictionary.ccc-mit.org/.

¹ HJ and DB are both co-first authors. HJ led the evaluation and writing, and DB led the tool development.

Keywords: natural language processing, text analysis, political science, journalism

Conference: The 27th ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing (CSCW); November 9-13, 2024; San Jose, Costa Rica

CCS Concepts: Human-centered computing → Empirical studies in HCI; Human-centered computing → Web-based interaction; Human-centered computing → Visualization toolkits; Computing methodologies → Natural language processing

1. Introduction

Polarization is a significant feature of the political landscape in the United States (Poole and Rosenthal, 1984; McCarty et al., 2016; Heltzel and Laurin, 2020). Previous research has shown that Republicans and Democrats often use and interpret words differently, even when speaking the same language (Li et al., 2017; R. KhudaBukhsh et al., 2021). This linguistic divide poses considerable challenges to public discourse, particularly in journalism, where using words without understanding their varying connotations across political communities can have serious consequences. Journalists face the added difficulty of manually reading and editing news content, a process that is not only time-consuming but also error-prone because the connotations of words online evolve constantly. To address this problem, we introduce the Bridging Dictionary (BD), a tool designed to automatically identify controversial terms across the political divide and to summarize their different usages. We not only provide a useful resource for the academic community, journalists, and wider audiences but also highlight the importance of considering human agency and trust in developing human-AI systems.

2. Related Work

Polarized language use in NLP

Researchers in political science and Natural Language Processing (NLP) have found partisan differences in language understanding (Li et al., 2017). R. KhudaBukhsh et al. (2021) used modern machine-translation methods to show that the Republican and Democratic communities use English words differently. For instance, partisans convey different connotations when they use “undocumented workers” or “illegal aliens” to discuss the same group of people. To address this issue, Webson et al. (2020) developed an NLP method to mitigate the political bias of text representations, showing that it improves the viewpoint diversity of document rankings. Recently, NLP researchers have also proposed novel methods to quantify and reduce the political bias of language models (Liu et al., 2021, 2022). However, previous work tends to focus on measuring and debiasing NLP models rather than on helping humans write less biased content. Our work fills this gap by leveraging NLP to help humans understand language bias and to support their writing and editing through an interactive tool.

NLP for qualitative analysis and sensemaking

NLP has been used to develop computational tools for qualitative analysis and sensemaking (Crowston et al., 2012; Guetterman et al., 2018). Among these tools, topic models are particularly prevalent for text analysis across various domains (Yan et al., 2013; Hu et al., 2019; Isoaho et al., 2021). However, traditional NLP models often lack world knowledge, resulting in limited insights and interpretability (Barde and Bainwad, 2017). Modern large language models (LLMs) enable users to interact with unstructured data through simple queries and to extract more nuanced and interpretable insights. Recent studies have explored the application of LLMs to automatically extract and summarize valuable information from texts (Chew et al., 2023; Xiao et al., 2023; Gero et al., 2024; Overney et al., 2024). Despite these advances, little research has focused on how LLMs can assist journalists by providing qualitative insights for writing and editing. This study aims to bridge this gap by introducing an interactive tool that summarizes the varying usage of terms across political divides, thereby guiding journalists in their word choices in news writing.

3. System Overview

Figure 1(a): The front page of the interactive Bridging Dictionary demo. The user can type any term they are interested in.

The Bridging Dictionary (BD) comprises two main components: (1) a paper dictionary and (2) an interactive demo. As illustrated in Figure 1, the paper dictionary presents 796 representative terms in a print-ready format, supplemented with summaries generated by an LLM. The interactive demo, on the other hand, enables users to explore in greater detail how a given term is used within the Republican and Democratic communities. BD uses gpt-3.5-turbo, a widely recognized LLM, via OpenAI’s API to generate these summaries. The term usages are sampled from a Twitter dataset (Jiang et al., 2022), which includes 4.7 million partisan-generated tweets (amounting to 100 million tokens) from each side during the 2020 American election. Users can customize both the dataset and which of OpenAI’s available LLMs is used.

3.1. Paper Dictionary

We generate a static printable document called “Bridging Dictionary: Paper Edition” that comprises 796 terms with LLM-generated summaries. These terms are curated algorithmically: we identify words and multi-word phrases that (1) occur sufficiently often within both partisan communities and (2) have significant differences between the two communities, either in sentiment score or in usage frequency. The parameters for these operations (i.e., the thresholds for “sufficient” and “significant”) are adjusted manually by an editor.
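The curation rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TermStats` record, the `select_terms` function, and every threshold value are hypothetical, and measuring the usage difference as a ratio of per-community rates is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    """Per-term counts and mean sentiment for each community (hypothetical record)."""
    term: str
    rep_count: int        # Republican tweets containing the term
    dem_count: int        # Democratic tweets containing the term
    rep_sentiment: float  # mean sentiment of matching Republican tweets
    dem_sentiment: float  # mean sentiment of matching Democratic tweets


def select_terms(stats, rep_total, dem_total,
                 min_count=50, min_sentiment_gap=0.2, min_freq_ratio=2.0):
    """Keep terms that are frequent on both sides and differ in sentiment or usage.

    All thresholds are placeholders; the paper says an editor tunes them by hand.
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    selected = []
    for s in stats:
        # Condition (1): the term occurs sufficiently often in both communities.
        if s.rep_count < min_count or s.dem_count < min_count:
            continue
        # Condition (2): a large gap in sentiment or in relative frequency.
        rep_rate = s.rep_count / rep_total
        dem_rate = s.dem_count / dem_total
        freq_ratio = max(rep_rate, dem_rate) / min(rep_rate, dem_rate)
        sentiment_gap = abs(s.rep_sentiment - s.dem_sentiment)
        if sentiment_gap >= min_sentiment_gap or freq_ratio >= min_freq_ratio:
            selected.append(s.term)
    return selected
```

Exposing the thresholds as keyword arguments mirrors the manual tuning step: an editor could loosen or tighten `min_count`, `min_sentiment_gap`, and `min_freq_ratio` and inspect the resulting term list.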

3.2. Interactive Demo

The interactive demo is a web-based generative dictionary providing greater detail and flexibility than the print version. It is implemented with the Streamlit framework (Streamlit, 2021). Whenever a user types a phrase, the system provides several functions for exploring how the two communities (Republicans and Democrats) use the term, along with alternative suggestions and visualizations of the input data.

Statistics. This section offers an overview of tweets containing the specified term from each community. The column “Matches per Thousand Tweets” indicates the frequency of tweets that include the term (e.g., “climate change”) per 1,000 tweets within a community. The accompanying pie chart illustrates the proportion of term usage between the two partisan communities. Sentiment scores represent the average sentiment of a community’s tweets that matched the term, with higher scores indicating more positive sentiment. Colored text highlights the comparative scores between the two communities, helping users interpret the data easily.
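The quantities in the Statistics section can be computed along these lines. This is a sketch under stated assumptions: the functions `term_statistics` and `usage_share` are hypothetical names, matching is done by case-insensitive substring search, each tweet is assumed to arrive with a precomputed sentiment score, and the pie-chart share is assumed to be based on per-thousand rates.

```python
def term_statistics(tweets, term):
    """Summarize one community's use of a term.

    tweets: list of (text, sentiment_score) pairs for a single community.
    Returns the match count, matches per 1,000 tweets, and mean sentiment.
    """
    needle = term.lower()
    matched_scores = [score for text, score in tweets if needle in text.lower()]
    per_thousand = 1000 * len(matched_scores) / len(tweets) if tweets else 0.0
    mean_sentiment = (
        sum(matched_scores) / len(matched_scores) if matched_scores else None
    )
    return {
        "matches": len(matched_scores),
        "per_thousand": per_thousand,
        "mean_sentiment": mean_sentiment,
    }


def usage_share(rep_stats, dem_stats):
    """Split term usage between the two communities (assumed pie-chart input)."""
    total = rep_stats["per_thousand"] + dem_stats["per_thousand"]
    if total == 0:
        return 0.0, 0.0
    return rep_stats["per_thousand"] / total, dem_stats["per_thousand"] / total
```

Normalizing by community size before computing the share keeps the comparison fair when the two corpora differ in size; with equally sized corpora, as in the dataset described above, raw counts would give the same split.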

Summary. This section features LLM-generated summaries that explain how the term is used on each side of the political divide and propose alternative terms. The generation follows a standard retrieval-augmented generation (RAG) procedure, as outlined by Gao et al. (2023). First, the system randomly samples up to 50 tweets containing the term from a specific community. It then builds a prompt from these tweets and asks the LLM to produce a summary with a simple query. Importantly, the LLM is not told which community the tweets come from and relies only on the sampled tweets for its summarization.
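The retrieval and prompt-construction steps can be sketched as below. This is an illustration only: `build_summary_prompt` is a hypothetical helper, the prompt wording is invented for the example and is not the prompt used by the tool, and the call to the LLM itself is omitted.

```python
import random


def build_summary_prompt(term, tweets, max_tweets=50, seed=None):
    """Sample matching tweets from one community and build a summary prompt.

    tweets: list of tweet texts from a single community. The community label
    is deliberately never passed in, so it cannot leak into the prompt.
    Returns the prompt and the sampled tweets (for display as sources).
    """
    rng = random.Random(seed)
    needle = term.lower()
    # Retrieval step: keep only tweets that mention the term.
    matching = [t for t in tweets if needle in t.lower()]
    # Randomly sample up to max_tweets of them.
    sample = rng.sample(matching, min(max_tweets, len(matching)))
    bullet_list = "\n".join(f"- {t}" for t in sample)
    # Hypothetical prompt wording, shown only to illustrate the structure.
    prompt = (
        f'The following tweets all mention the term "{term}".\n'
        f"{bullet_list}\n\n"
        f'In two or three sentences, summarize how these authors use "{term}" '
        "and suggest alternative terms."
    )
    return prompt, sample
```

Returning the sample alongside the prompt lets the interface show exactly which tweets informed a summary, which is the role the sample list plays in the demo. The Definition section could reuse the same helper with a different final instruction.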

Definition. This section generates a dictionary-style definition of a term from each community’s perspective. This follows the same RAG process as the Summary section but asks the model for a definition instead of a summary.

Topic scatterplot. This section presents a two-dimensional interactive scatterplot that organizes individual tweets by topic, placing tweets on similar topics close together. Users can explore specific tweets by hovering over the corresponding points, enabling them to review the sampled tweets used for summary and definition generation. The process follows a well-established pipeline (e.g., BERTopic (Grootendorst, 2022)) that transforms raw text into a scatterplot. It computes an embedding for each tweet using a sentence embedding model and then projects these embeddings into two dimensions. By default, the points are clustered and color-coded based on the outcome of a clustering algorithm applied to these embeddings. For projection, we employ UMAP (McInnes et al., 2018), and for clustering, we use HDBSCAN (McInnes et al., 2017).

Sample list. This section lists the sampled tweets from each group, allowing users to read the information source.

4. Discussion and Evaluation

After deploying the Bridging Dictionary, we interviewed a professional journalist from Frontline at PBS and received positive feedback. According to the interview, the statistics and summary features substantially aid in understanding the political divide in language use. Additionally, the topic scatterplot and sample list features help substantiate the LLM-generated content and assist journalists in making informed word choices. Two promising directions emerged from the interview: (1) strengthening the connection between LLM-generated content and its information sources by considering human agency and trust in human-AI interaction, and (2) broadening the range of information sources beyond Twitter and regularly updating the dataset to reflect how language evolves across platforms and over time. We intend to further develop the tool based on these suggestions, conduct a more comprehensive field study involving more professional journalists and other users, and assess the tool’s impact on writing and editing.

References

Barde and Bainwad (2017) Bhagyashree Vyankatrao Barde and Anant Madhavrao Bainwad. 2017. An overview of topic modeling methods and tools. In 2017 International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE, 745–750.
Chew et al. (2023) Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, and Annice Kim. 2023. LLM-assisted content analysis: Using large language models to support deductive coding. arXiv preprint arXiv:2306.14924 (2023).
Crowston et al. (2012) Kevin Crowston, Eileen E Allen, and Robert Heckman. 2012. Using natural language processing technology for qualitative data analysis. International Journal of Social Research Methodology 15, 6 (2012), 523–543.
Gao et al. (2023) Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. ArXiv abs/2312.10997 (2023). https://api.semanticscholar.org/CorpusID:266359151
Gero et al. (2024) Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K Kummerfeld, and Elena L Glassman. 2024. Supporting Sensemaking of Large Language Model Outputs at Scale. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–21.
Grootendorst (2022) Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022).
Guetterman et al. (2018) Timothy C Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scruggs, and VG Vinod Vydiswaran. 2018. Augmenting qualitative text analysis with natural language processing: methodological study. Journal of medical Internet research 20, 6 (2018), e231.
Heltzel and Laurin (2020) Gordon Heltzel and Kristin Laurin. 2020. Polarization in America: Two possible futures. Current opinion in behavioral sciences 34 (2020), 179–184.
Hu et al. (2019) Nan Hu, Ting Zhang, Baojun Gao, and Indranil Bose. 2019. What do hotel customers complain about? Text analysis using structural topic model. Tourism Management 72 (2019), 417–426.
Isoaho et al. (2021) Karoliina Isoaho, Daria Gritsenko, and Eetu Mäkelä. 2021. Topic modeling and text analysis for qualitative policy research. Policy Studies Journal 49, 1 (2021), 300–324.
Jiang et al. (2022) Hang Jiang, Doug Beeferman, Brandon Roy, and Deb Roy. 2022. CommunityLM: Probing Partisan Worldviews from Language Models. In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 6818–6826. https://aclanthology.org/2022.coling-1.593
Li et al. (2017) Ping Li, Benjamin Schloss, and D Jake Follmer. 2017. Speaking two “Languages” in America: A semantic space analysis of how presidential candidates and their supporters represent abstract political concepts differently. Behavior research methods 49, 5 (2017), 1668–1685.
Liu et al. (2022) Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, and Soroush Vosoughi. 2022. Quantifying and alleviating political bias in language models. Artificial Intelligence 304 (2022), 103654.
Liu et al. (2021) Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, and Soroush Vosoughi. 2021. Mitigating political bias in language models through reinforced calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 14857–14866.
McCarty et al. (2016) Nolan McCarty, Keith T Poole, and Howard Rosenthal. 2016. Polarized America: The dance of ideology and unequal riches. MIT Press.
McInnes et al. (2017) Leland McInnes, John Healy, Steve Astels, et al. 2017. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2, 11 (2017), 205.
McInnes et al. (2018) Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
Overney et al. (2024) Cassandra Overney, Belén Saldías, Dimitra Dimitrakopoulou, and Deb Roy. 2024. SenseMate: An Accessible and Beginner-Friendly Human-AI Platform for Qualitative Data Analysis. In Proceedings of IUI ’24. ACM, Greenville, 922–939. https://doi.org/10/gtt392
Poole and Rosenthal (1984) Keith T Poole and Howard Rosenthal. 1984. The polarization of American politics. The journal of politics 46, 4 (1984), 1061–1079.
R. KhudaBukhsh et al. (2021) Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, and Tom Mitchell. 2021. We Don’t Speak the Same Language: Interpreting Polarization through Machine Translation. Proceedings of the AAAI Conference on Artificial Intelligence 35, 17 (May 2021), 14893–14901. https://ojs.aaai.org/index.php/AAAI/article/view/17748
Streamlit (2021) Streamlit. 2021. Streamlit – A faster way to build and share data apps. https://streamlit.io/
Webson et al. (2020) Albert Webson, Zhizhong Chen, Carsten Eickhoff, and Ellie Pavlick. 2020. Are “Undocumented Workers” the Same as “Illegal Aliens”? Disentangling Denotation and Connotation in Vector Spaces. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 4090–4105. https://doi.org/10.18653/v1/2020.emnlp-main.335
Xiao et al. (2023) Ziang Xiao, Xingdi Yuan, Q Vera Liao, Rania Abdelghani, and Pierre-Yves Oudeyer. 2023. Supporting qualitative analysis with large language models: Combining codebook with GPT-3 for deductive coding. In Companion proceedings of the 28th international conference on intelligent user interfaces. 75–78.
Yan et al. (2013) Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2013. A biterm topic model for short texts. In Proceedings of the 22nd international conference on World Wide Web. 1445–1456.