A Novel Lexicon for the Moral Foundation of Liberty

Oscar Araque
Universidad Politécnica de Madrid
Madrid, Spain
[email protected]
Lorenzo Gatti
University of Twente
Twente, The Netherlands
[email protected]
Sergio Consoli
European Commission, Joint Research Centre (DG JRC)
Ispra, Italy
[email protected]
Kyriaki Kalimeri
ISI Foundation
Turin, Italy
[email protected]

Abstract

The moral value of liberty is a central concept in our inference system when it comes to taking a stance towards controversial social issues such as vaccine hesitancy, climate change, or the right to abortion. Here, we propose a novel Liberty lexicon evaluated on more than 3,000 manually annotated instances in both in-domain and out-of-domain scenarios. As a result of this evaluation, we produce a combined lexicon that constitutes the main outcome of this work. This final lexicon incorporates information from an ensemble of lexicons that have been generated using word embedding similarity (WE) and compositional semantics (CS). Our key contributions include enriching the liberty annotations, developing a robust liberty lexicon for broader application, and revealing the complexity of expressions related to liberty across different platforms. Through the evaluation, we show that the difficulty of the task calls for approaches that combine knowledge, in an effort to improve the representations of learning systems.

Keywords lexical resources, moral foundation theory, natural language processing

1 Introduction

Moral values are fundamental to our decision-making process, especially regarding controversial social issues. When taking a stance, for instance, on global warming or vaccine adherence, we consult - consciously or unconsciously - our moral system of values. The Moral Foundations Theory (MFT) was created precisely to explain morality across cultures Haidt and Joseph (2004), proposing five foundations, namely care, fairness, loyalty, authority and sanctity. In a much later revision, the theory was enhanced with a new sixth dimension: liberty (Haidt, 2012). The “Liberty/Oppression” foundation is about people’s reactance and resentment towards those who dominate them and restrict their liberty. Moral notions captured by the Liberty foundation include freedom of choice and individual responsibility of actions, which repeatedly emerged as fundamental decision-making drivers of crucial prosocial behaviors such as vaccine adherence (Amin et al., 2017; Beiró et al., 2023; Zhang et al., 2023), and cooperation during crisis (Mejova et al., 2023).

Recent works focus on the automatic detection of moral values in text, employing annotated lexicons, either for unsupervised detection or as features in a learning system (Mooijman et al., 2018; Rezapour et al., 2021; Kennedy et al., 2021; Preniqi et al., 2021; Mejova et al., 2023; Mokhberian et al., 2020; Zhang et al., 2023). Given that the liberty foundation was added to the MFT at a later stage, there were initially no linguistic resources for it. Preliminary approaches to liberty assessment were based on purely data-driven lexical characterization Araque et al. (2021, 2022), but lacked a solid evaluation against a benchmarked ground-truth. This is precisely the gap we address in this study.

We gathered data from various platforms to cover multiple aspects of the expression of the liberty foundation. In particular, we included (i) the Wikipedia (https://www.wikipedia.org) and Conservapedia (https://www.conservapedia.com) projects, encyclopedia projects of general content but diverse viewpoints, (ii) the r/Libertarian and r/Conservative communities on Reddit.com, forums of political discussion of general interest, (iii) the Black Lives Matter (BLM) and Election (Elect.) datasets from the Moral Foundations Twitter Corpus (MFTC) Hoover et al. (2020), collections of tweets discussing racial discrimination and the US Presidential Elections of 2016, respectively, as well as (iv) posts and comments from META’s Pages regarding the vaccination debate (Vaccine). The first two scenarios (Wikipedia vs Conservapedia and Libertarian vs Conservative on Reddit) act as “natural experiments” expressing the viewpoints of communities with diverse opinions and stances on the liberty foundation described by the MFT framework. To ensure a robust ground-truth, we obtained manual annotations of the expression of the moral foundation of liberty in the BLM, Elect., and META’s vaccination-related posts and comments.

We generated two lexicons per dataset, employing two complementary approaches: word embedding similarity (WE) (Turney and Littman, 2003) and compositional semantics (CS) (Liang et al., 2013). The first automatically extracts a set of seed words using frequency shifts, and then compares new words’ embeddings to the seed words’ embeddings to determine their alignment with the foundation’s principles. The second method assumes that each word expresses the side of the foundation more frequently present in the documents where the word appears. We explore a lexicon aggregation approach based on overlapping terms that combines the benefits of the two methods. We also propose a combined representation approach which takes into consideration the individual lexical resources while accounting for overfitting issues. Finally, we evaluate the lexicons obtained per dataset in both in-domain and out-of-domain experimental setups.

We contribute to the state of the art in moral foundation recognition in the following ways. We expand the benchmark dataset of the Moral Foundations Theory Corpus (MFTC) by providing valuable manual annotations on the liberty moral foundation and rendering them available to the scientific community. Our work developed a refined and versatile liberty lexicon capable of effectively generalizing over previously unseen domains; the final liberty lexicon as well as the intermediate ones will also be available online. Our research sheds light on the nuanced variations in the expression of liberty across different domains, providing valuable insights into how this critical moral foundation can manifest differently within diverse contexts. These contributions collectively enhance our understanding of moral analysis and pave the way for more accurate and comprehensive evaluations.

2 Related Works

The Moral Foundations Dictionary (MFD) Graham et al. (2009) is a collection of lemmas and associated moral traits, assembled by experts and typically used together with the Linguistic Inquiry and Word Count (LIWC) software Tausczik and Pennebaker (2010) to estimate moral traits and investigate differences in moral concerns between different cultural groups. Garten et al. (2018) proposed the Distributed Dictionary Representations (DDR) method based on psychological dictionaries and semantic similarity to quantify the presence of moral sentiment around a given topic. Later on, the authors extended the method, incorporating demographic embeddings into the language representations (Garten et al., 2019).

In an attempt to address several of the limitations of the MFD, Araque et al. (2020) proposed a data-driven generated lexicon, the MoralStrength, which expanded the original MFD employing the WordNet synsets and crowdsourced annotations. Different from the MFD, where each foundation is considered a bipolar of “virtue” and “vice”, MoralStrength treats each foundation as a continuum, assigning a numeric value of moral valence to each lemma that indicates the weight with which the lemma is expressing the specific value. Hopp et al. (2021) developed the extended Moral Foundations Dictionary (eMFD), a lexicon which expands the MFD based on crowdsourced annotations. Each lemma in eMFD is assigned a continuously weighted vector that expresses the probability that the lemma belongs to any of the five moral foundations.

Notably, none of the above lexicons included the liberty moral foundation. A first attempt to derive a lexicon for assessing the presence of liberty in text was presented by Araque et al. (2021). They considered pairs of Wikipedia pages and their Conservapedia counterparts as natural expressions of the liberty-oppression divide. They created a series of word embeddings which were then compared through cosine similarity to a set of seed words defined by experts to generate a lexicon. Their design comes with the obvious conceptual limitations of considering the Wikipedia project as expressing a strongly libertarian position and initiating the embeddings with a list of manually selected seed words from expert annotators. More recently, Araque et al. (2022) proposed a liberty lexicon generation approach based on aligning documents from online news sources with different worldviews. The resulting LibertyMFD was later employed by Araque et al. (2023) to fine-tune the approach proposed by Consoli et al. (2022) for analysing how Spanish news covers the female (un)employment topic in terms of sentiment and moral values, as well as how this sentiment evolves over time.

Although pioneering, their approach suffered from the lack of a solid ground-truth on which to evaluate the generated lexicons. Due to the lack of a ground-truth dataset, the lexicon evaluation was based on the assumption that news from different political orientations would express opposite notions with respect to the liberty moral foundation. Instead, here, we evaluate the lexicons against solid manual annotations for the liberty foundation of the benchmark Moral Foundations Twitter Corpus datasets (BLM and Elect.) which we render available to the scientific community.

3 Data Collection

Moral Foundations Twitter Corpus (MFTC)

The Moral Foundations Twitter Corpus is a corpus consisting of seven independent datasets (35k tweets in total), manually annotated for the original five moral foundations Hoover et al. (2020), but not the liberty foundation.

The Black Lives Matter Twitter Corpus (BLM) van der Veen (2022) and the Elections Corpus (Elect.) Davidson et al. (2017) are the two largest datasets in this collection, and we manually annotated them as per the moral foundation of Liberty (we did not annotate the entire MFTC corpus due to funding limitations), relying on a popular tool for crowdsourcing and human validation, i.e., Amazon Mechanical Turk (AMT hereafter) provided by the Amazon SageMaker Ground Truth service. The Black Lives Matter Twitter Corpus focuses on tweets specifically regarding the Black Lives Matter movement, and it contains 4,352 tweets. The Election corpus relates to the US 2016 Presidential election, and consists of 4,370 tweets.

To ensure coherence, we followed the same procedure and annotation scheme that Hoover et al. (2017) used for the MFTC annotation. Moreover, inspired by the annotation approach of the MoralStrength lexicon Araque et al. (2020), we added the notion of “strength”, which indicates the degree to which each lemma expresses the liberty moral foundation in addition to its presence and polarity.

We assigned each tweet to nine independent annotators and asked them to rate the extent to which each tweet expressed a “Liberal” or “Oppressive” moral value on a scale from 1 to 9. The score magnitude represents the intensity of the Liberty/Oppression expressed in a tweet, as perceived by the annotator: a score close to 9 indicates that the sentence expresses a highly oppressive connotation, while a score close to 1 is associated with a very libertarian connotation. Should the sentence be associated with neither an oppressive nor a libertarian connotation, the annotator could assign a neutral score. The intercoder agreement score provided by AMT is 92% (annotations will be available upon acceptance).

Dataset	Original: label	Original: instances (%)	Original: total instances	Balanced: label	Balanced: instances (%)	Balanced: total instances
Black Lives Matter (BLM)	Liberty	54%	4,340	Liberty/Oppression	50%	1,600
	Neutral	18%		Neutral	50%	
	Oppression	28%				
Election	Liberty	56%	4,366	Liberty/Oppression	50%	1,532
	Neutral	18%		Neutral	50%	
	Oppression	26%				
Reddit	Libertarian	51%	100,000	Libertarian	51%	100,000
	Conservative	49%		Conservative	49%	
Wikipedia+Conservapedia	Libertarian	50%	57,078	Libertarian	50%	57,078
	Conservative	50%		Conservative	50%	
Vaccination	Liberty/Oppression	89%	1,576	Liberty/Oppression	50%	356
	Neutral	11%		Neutral	50%	

Table 1: Overview of the datasets used in this work. Generation of the lexicons is performed on the Original version of the datasets while training of the regression models is performed on the Balanced version of the datasets.

Reddit.

Reddit is increasingly becoming a reliable data source in computational studies (Proferes et al., 2021). Aiming to profile the language of libertarian and conservative users, we extracted textual content from the r/Libertarian and r/Conservative communities, which are self-proclaimed networks of libertarian and conservative ideas, respectively. Initially, we considered posts and comments published between August 2008 and April 2021, obtaining overall 1,127,005 documents. From these, we have filtered empty and other unusable content, and undersampled the rest to obtain a final amount of 100,000 instances.

Wikipedia+Conservapedia.

We use the dataset described in Araque et al. (2021), based on page alignment between Wikipedia and Conservapedia according to their title (henceforth the WikiCon dataset). More than 37,000 articles between Wikipedia and Conservapedia have been aligned, of which approximately 28,000 pages had identical titles, and the remaining were aligned based on redirect pages. The entire corpus contains 106 million tokens and 558,000 unique words. The dataset has been filtered using page categories related to politics, while a length ratio filter has been applied between Wikipedia and Conservapedia documents to improve dataset quality. This ratio compares the number of words in a Wikipedia document to the number of words in the corresponding Conservapedia document, and excludes the pairs with a ratio higher than 10, resulting in 57,078 documents split equally between Conservapedia and Wikipedia (28,539 each).

Vaccination.

Finally, we use a dataset on vaccinations, which comprises anonymous posts and comments from about 200 Facebook Pages, collected through the Facebook API from January 2012 to June 2019 Prado et al. (2022). The total number of comments and posts from both sides of the vaccine debate amounts to 607,105. The creators of the dataset randomly selected approximately 1,500 comments and manually annotated the presence of the liberty moral foundation in the snippet, indicating also the polarity of the foundation as “virtue” (liberty) or “vice” (oppression). A summary of the datasets used in this work is presented in Table 1.

4 Methods & Evaluation

4.1 Data Preprocessing

Basic preprocessing was performed for all datasets, consisting of the following steps: stop word removal, token normalization, punctuation filtering, and removal of short words (i.e., terms with fewer than three letters). Additionally, since the original datasets have slightly different annotation schemes, as seen in Table 1, we aligned them by creating a binarised and balanced version of each dataset, aggregating the labels accordingly. The binarisation process aggregates the Liberty and Oppression labels, thus creating a dataset where the annotation is either “expresses liberty/oppression” or “doesn’t express this moral foundation”. The classes are then balanced by randomly undersampling the most populated class to match the population of the smaller class.
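The binarisation and balancing step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `binarise_and_balance`, the label strings, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict

def binarise_and_balance(docs, seed=0):
    """docs: list of (text, label) pairs, with label in
    {"liberty", "oppression", "neutral"} (hypothetical label names)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in docs:
        # Liberty and Oppression are merged into a single class.
        binary = "neutral" if label == "neutral" else "liberty/oppression"
        by_class[binary].append((text, binary))
    # Randomly undersample every class to the size of the smallest one.
    n = min(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced

toy = [("a", "liberty"), ("b", "oppression"), ("c", "liberty"),
       ("d", "neutral"), ("e", "oppression")]
balanced = binarise_and_balance(toy)
```

On the toy input, four documents fall in the merged class and one is neutral, so the balanced set keeps one document per class.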

4.2 Lexicon Generation

Word Embedding Similarity

Based on the approach proposed by Turney and Littman (2003), our first strategy for generating lexicons relies on word embedding similarity between the vectors of the positive and negative instances of a dataset’s documents. Hence, the method relies on a set of seed words that accurately represent the domains we aim to differentiate. However, arbitrary selection of seed words can bias the output, since variations in the seed word list lead to differences in the final lexicon. To overcome this issue, we obtained the set of seed words in a data-driven way by estimating the frequency shifts (Gallagher et al., 2021) of the lemmas between the positive and negative documents, as done in Araque et al. (2022). This approach helps us to avoid the limitations of arbitrarily selecting the seed words. Thus, we consider the relative frequency of a word $w$ in a set of documents $D$ :

p_{w}^{(D)}=\frac{f_{w}^{(D)}}{\sum_{w^{\prime}\in W^{(D)}}f_{w^{\prime}}^{(D)}}

(1)

where $w^{\prime}\in W^{(D)}$ are the words in vocabulary set $W^{(D)}$ except for $w$ . We compute the frequency shift with relation to the relative frequency per word $w$ between two different sets of documents as:

\delta p_{w}=p_{w}^{(2)}-p_{w}^{(1)}

(2)
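Equations (1) and (2) can be sketched in a few lines of Python. The function names and the toy documents are hypothetical, and the denominator here sums the counts of the whole vocabulary of each document set, which is an assumption of this sketch.

```python
from collections import Counter

def relative_frequencies(docs):
    """Relative frequency p_w of every token in a list of tokenised documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def frequency_shifts(docs_1, docs_2):
    """delta p_w = p_w^(2) - p_w^(1) for every word seen in either set."""
    p1 = relative_frequencies(docs_1)
    p2 = relative_frequencies(docs_2)
    return {w: p2.get(w, 0.0) - p1.get(w, 0.0) for w in set(p1) | set(p2)}

# Toy data: set 1 and set 2 stand in for the two sides of a dataset.
docs_1 = [["tax", "law", "order"], ["law", "order"]]
docs_2 = [["freedom", "choice", "law"], ["freedom", "tax"]]
shifts = frequency_shifts(docs_1, docs_2)
```

Words with a large positive shift are characteristic of the second set and words with a large negative shift are characteristic of the first; a word that is equally frequent in both gets a shift of zero.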

The seed word lists are generated based on prominent differences in word frequency shifts. We apply a minimum frequency threshold of 100 to filter out less common lemmas. We then use the word2vec algorithm (Le and Mikolov, 2014) to compute the vector for each word, using the standard parameter setting and a vector dimension of 300. The lexicon is generated by estimating the cosine similarity between the word vectors obtained using the emerging seed words. To compute the moral polarity of a word $w_{i}$ from the documents, we use the sets of seed words for the “oppressive” orientation ( $S_{C}$ ) and the “liberty” direction ( $S_{L}$ ), and estimate the polarity based on the cosine similarity:

\sum_{w_{j}\in S_{L}}\text{sim}(w_{i},w_{j})-\sum_{w_{k}\in S_{C}}\text{sim}(w_{i},w_{k})

(3)

where sim represents the cosine similarity as estimated by the word embedding model. The obtained polarity is positive if $w_{i}$ is more related to the liberty seed words ( $S_{L}$ ) and negative if it is more related to the oppressive seed words ( $S_{C}$ ). For the rest of the paper, we refer to this model as the WE model and we generate one lexicon for each dataset (except Vaccine, which is only used for testing).
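As an illustration of Equation (3), the sketch below scores a word against two seed sets using cosine similarity. The two-dimensional vectors and the seed words are toy values chosen for the example, not learned embeddings or the seed lists used in this work.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def polarity(word, vectors, liberty_seeds, oppressive_seeds):
    """Sum of similarities to the liberty seeds (S_L) minus the sum of
    similarities to the oppressive seeds (S_C)."""
    w = vectors[word]
    pos = sum(cosine(w, vectors[s]) for s in liberty_seeds if s in vectors)
    neg = sum(cosine(w, vectors[s]) for s in oppressive_seeds if s in vectors)
    return pos - neg

# Toy embeddings (assumption): real vectors would come from word2vec.
vectors = {
    "freedom": [1.0, 0.0], "choice": [0.9, 0.1],
    "ban": [0.0, 1.0], "mandate": [0.1, 0.9],
    "rights": [0.8, 0.2],
}
score = polarity("rights", vectors, ["freedom", "choice"], ["ban", "mandate"])
```

In this toy space "rights" lies close to the liberty seeds, so its score is positive.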

Compositional Semantics

The second approach involves using the Compositional Semantics (CS) method (Liang et al., 2013), previously used to generate emotion lexicons Staiano and Guerini (2014); Araque et al. (2019). The CS method applies a projection of moral values from a document to its words. The underlying assumption is that each word is associated with the moral value present in the documents where the word appears more frequently.

To generate a word-by-moral association matrix ( $M_{WM}$ ), we first create a document-by-moral matrix $M_{DM}$ , which shows the distribution of the liberty foundation across the training dataset. We then generate a word-by-document matrix $M_{WD}$ , which indicates the number of occurrences for each word in the vocabulary within a given document, normalized by the total number of words per document. To obtain the word-by-moral matrix, we perform a multiplication using the following expression:

M_{WM}=M_{WD}\cdot M_{DM}

(4)

Using this approach, words and their corresponding value of liberty can be merged by calculating the product of the weight of a word and the weight of the moral value in each document. The resulting scores are then normalised (column-wise), over-representation issues are addressed, and each lemma is scaled (row-wise) to sum up to one. Previous validation of lexicons has shown that this normalisation approach is suitable Araque et al. (2019, 2022). This approach is referred to as the CS model and we generate one lexicon for each dataset except Vaccine.
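A minimal sketch of the CS pipeline around Equation (4) is given below. The exact column-wise normalisation and the treatment of over-represented words are not specified here, so the sketch divides each column by its sum and skips the over-representation step; both choices, like the function name and toy data, are assumptions.

```python
def cs_lexicon(docs, doc_scores):
    """docs: list of non-empty tokenised documents.
    doc_scores: one row per document (M_DM), e.g. [liberty, oppression] weights."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs, n_cols = len(docs), len(doc_scores[0])
    # M_WD: occurrences of each word per document, normalised by document length.
    m_wd = {w: [doc.count(w) / len(doc) for doc in docs] for w in vocab}
    # M_WM = M_WD . M_DM
    m_wm = {
        w: [sum(m_wd[w][i] * doc_scores[i][j] for i in range(n_docs))
            for j in range(n_cols)]
        for w in vocab
    }
    # Column-wise normalisation (assumed here: divide by the column sum).
    col_sums = [sum(m_wm[w][j] for w in vocab) or 1.0 for j in range(n_cols)]
    normed = {w: [m_wm[w][j] / col_sums[j] for j in range(n_cols)] for w in vocab}
    # Row-wise scaling so that the scores of each word sum to one.
    lexicon = {}
    for w, row in normed.items():
        total = sum(row) or 1.0
        lexicon[w] = [v / total for v in row]
    return lexicon

docs = [["freedom", "choice", "freedom"], ["ban", "mandate"], ["freedom", "ban"]]
doc_scores = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy [liberty, oppression] weights
lexicon = cs_lexicon(docs, doc_scores)
```

Each word ends up with a distribution over the moral columns; in the toy data "freedom" leans towards the first column and "ban" towards the second.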

Overlap Lexicon

The domain-specific lexicons express the liberty dimension in a way that depends on the topics of the dataset from which they are generated. However, we are interested in deriving a general, higher-level representation, so that the final users of the resource have a unified and domain-independent resource. To this end, we synthesise a unified resource by merging the individual lexicons. This approach (i) augments the coverage of the consolidated lexicon and (ii) discards uncommon tokens and their annotations. The basic process to obtain such a lexicon starts by defining a unified vocabulary as the union of the vocabularies of all individual lexicons. This union can be controlled with a selection parameter expressed as a percentage value. That is, with a selection parameter of 50%, a word is included in the unified vocabulary only if it appears in at least 50% of all considered lexicons.

Then, we align the numeric assignment each token has in the individual lexicons. We estimate the average score of these assignments and incorporate it into the unified resource if the volume of annotations satisfies the threshold stipulated by the chosen proportion (selection parameter). We denote the obtained lexicon as Lexicon Overlap (avg.); the lexicon will be released upon acceptance.
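The overlap procedure can be sketched as below, assuming each individual lexicon is a mapping from token to a single numeric score. The function name `overlap_lexicon` and the integer `selection_pct` argument are hypothetical; the percentage is kept as an integer to avoid floating-point rounding when computing the minimum number of lexicons.

```python
def overlap_lexicon(lexicons, selection_pct=50):
    """Average the scores of tokens that appear in at least
    selection_pct percent of the given lexicons."""
    # Ceiling of selection_pct * len(lexicons) / 100, in integer arithmetic.
    min_lexicons = max(1, (selection_pct * len(lexicons) + 99) // 100)
    collected = {}
    for lex in lexicons:
        for token, score in lex.items():
            collected.setdefault(token, []).append(score)
    return {
        token: sum(scores) / len(scores)
        for token, scores in collected.items()
        if len(scores) >= min_lexicons
    }

# Toy lexicons (assumption): four sources with partially shared vocabularies.
lexicons = [
    {"freedom": 0.8, "ban": -0.6},
    {"freedom": 0.6, "tax": -0.1},
    {"ban": -0.4, "freedom": 0.7},
    {"vote": 0.2},
]
merged = overlap_lexicon(lexicons, selection_pct=50)
```

With four lexicons and a 50% selection parameter, a token must occur in at least two of them, so "tax" and "vote" are dropped while "freedom" and "ban" receive their average scores.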

4.3 Evaluation

To evaluate the performance of the generated resources we designed a wide array of supervised classification tasks.

In-domain evaluation.

We analyze the in-domain performance of our lexicons by testing them on a left-out set of the datasets they are generated from (consisting of 20% of the original data). Note that, depending on the dataset, the task is slightly different due to the different type of annotations: (i) for the BLM and Elect. datasets, the classifier should predict whether the document expresses notions of liberty/oppression or is neutral; (ii) for the Reddit and WikiCon datasets, it should predict whether a document expresses the libertarian or the conservative point of view.

To avoid overfitting, each model is trained on the training set of the respective balanced dataset (see Table 1), leaving the test set for evaluation.

Out-of-domain evaluation.

We perform a series of out-of-domain experiments, testing how well the lexicons can generalize to different domains. In particular, we measure the performance of: (i) the lexicons generated from Reddit and WikiCon used on the BLM and Election datasets; (ii) the lexicons generated from BLM and Elect. used on the Reddit and WikiCon datasets; (iii) the lexicons generated from BLM and Elect., trained on BLM/Elect. and tested on the Vaccine dataset.

Here the different annotation schemes and domains could pose a bigger challenge for the classifier. In the first two cases we can expect to see the impact of the different vocabulary, but the train/test split still comes from the same dataset (in other words, the model has to learn a task using sub-optimal features, but with “coherent” data for training and testing). The Vaccine dataset is instead used to test the out-of-domain performance of the lexicons when the annotation scheme of the evaluation dataset (i.e., the presence or absence of the liberty foundation) is coherent with the annotation scheme of the datasets from which the lexicons are generated (BLM and Elect.), but no ideal training data is available (as we are training on an annotated dataset, BLM or Elect., that is different from the Vaccine test dataset).

For all experiments we utilized logistic regression (Alpaydin, 2020). Each document is represented by a vector of the same length as the lexicon vocabulary: for those tokens in the document that are present in the lexicon, the vector contains the respective polarity score, and zero otherwise. Since this type of representation dramatically simplifies the linguistic information present in the document, we enhance the classification design with two more experiments. We extend each vector representation with “statistical summary” functions, namely the average, maximum, median, variance, and peak-to-peak score of the lexicon values of that document. This offers the learning models a more complete view of the text.
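The document representation described above can be sketched as follows, assuming a lexicon that maps each token to one polarity score. The function name `featurise`, the use of the population variance, and the all-zero summary for documents without any lexicon token are assumptions of this sketch.

```python
from statistics import mean, median, pvariance

def featurise(tokens, lexicon, vocab):
    """Lexicon-sized vector of polarity scores followed by five summary statistics."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    matched = []
    for tok in tokens:
        if tok in index:
            vec[index[tok]] = lexicon[tok]  # polarity score of a lexicon token
            matched.append(lexicon[tok])
    if matched:
        summary = [
            mean(matched), max(matched), median(matched),
            pvariance(matched),             # population variance (assumption)
            max(matched) - min(matched),    # peak-to-peak
        ]
    else:
        summary = [0.0] * 5                 # no lexicon token in the document
    return vec + summary

lexicon = {"ban": -0.5, "freedom": 0.7, "tax": -0.1}
vocab = sorted(lexicon)
features = featurise(["freedom", "ban", "unknown"], lexicon, vocab)
```

The resulting vector has one entry per lexicon token plus five summary entries, and can be passed to any standard classifier such as logistic regression.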

Figure 1: Diagram of aggregation through combined representation for an example case with 4 individual lexicons.

Combined lexicons.

To test whether it is possible to obtain a more “general purpose model”, we evaluate two ways of combining the information coming from the different lexicons: (i) the lexicon overlap described in Section 4.2, which averages the values for words that appear in multiple lexicons; (ii) the combined representation, which is not a lexicon, but a method of learning a unified representation by taking into account all available lexicons.

The advantage of the first method is its simplicity, and that it results in an interpretable lexicon. On the other hand, the combined representation allows a learning model to observe simultaneously all information contained in the individual lexicons (including words not shared among them); the model might then be able to exploit existing interactions among them.

While the strength of this second approach is that it provides a comprehensive representation obtained through all individual lexicons, overfitting may occur given the large dimensionality of the representation. To avoid this issue, we include a dimensionality reduction mechanism in the learning model so that the size of the feature vector can be reduced. Our approach is based on the Singular Value Decomposition technique (SVD) (Halko et al., 2011) for transforming the representation of a single lexicon into a continuous vector, which is then input to a machine learning algorithm, in our case a logistic regressor. Figure 1 illustrates the hierarchical structure of the proposed model.
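One possible reading of this pipeline is sketched below with NumPy: each lexicon-specific feature matrix is reduced with a truncated SVD and the reduced blocks are concatenated before classification. The sketch uses an exact SVD instead of the randomized algorithm of Halko et al. (2011), random toy matrices instead of real lexicon features, and a hypothetical `rank` parameter; the classifier itself is omitted.

```python
import numpy as np

def svd_reduce(X, rank):
    """Project an (n_docs x vocab_size) matrix onto its top right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    r = min(rank, vt.shape[0])
    return X @ vt[:r].T

def combined_representation(blocks, rank=2):
    """Reduce each per-lexicon block separately, then concatenate column-wise."""
    return np.hstack([svd_reduce(X, rank) for X in blocks])

# Toy example (assumption): 6 documents and 4 lexicons of different vocabulary sizes.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(6, v)) for v in (5, 8, 3, 10)]
Z = combined_representation(blocks, rank=2)
```

Here `Z` has one row per document and `rank` columns per lexicon. In a real experiment the projection would be fitted on the training split only and then applied to the test split.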

Since these two approaches take into account all generated lexicons, we can consider the results of the lexicon overlap and combined representation methods as (i) an in-domain evaluation when applied to the BLM, Elect., Reddit or WikiCon datasets (since they are used to generate the individual lexicons), and (ii) an out-of-domain evaluation when applied to the Vaccine dataset (since this is only used as a test set).

Baselines.

As baseline models, we train two classifiers using a unigram representation that includes a frequency-selected vocabulary of sizes 1,000 and 10,000 tokens respectively. The two sizes are comparable to the size of the obtained lexicons (see Table 2).

Lexicon ranking.

To assess the general quality of the lexicons and obtain an overall ranking of their performance, we have performed the Friedman statistical test over all the evaluation results (Araque et al., 2017; Demšar, 2006). In the Friedman test a lower ranking implies a better result for a certain method in comparison to the rest. In case of ties, these are resolved by averaging the obtained ranks. The Friedman test has been performed with $\alpha=0.05$ , rejecting the null hypothesis. We report the macro-averaged F-score as well as the Friedman rank for the overall evaluation of each resource.
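The average ranks that underlie the Friedman test can be computed as in the sketch below, where ties share the mean of the positions they occupy. The method names and scores are toy values, not results from this work, and the test statistic itself is not computed.

```python
def average_ranks(scores):
    """scores: {method: [score per experiment]}, higher is better.
    Returns the mean rank of each method (lower is better)."""
    methods = list(scores)
    n_exp = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for j in range(n_exp):
        ordered = sorted(methods, key=lambda m: -scores[m][j])
        i = 0
        while i < len(ordered):
            k = i
            # Extend the group while the next method has the same score (a tie).
            while k + 1 < len(ordered) and scores[ordered[k + 1]][j] == scores[ordered[i]][j]:
                k += 1
            shared = (i + 1 + k + 1) / 2  # mean of the 1-based positions i+1 .. k+1
            for m in ordered[i:k + 1]:
                totals[m] += shared
            i = k + 1
    return {m: totals[m] / n_exp for m in methods}

toy_scores = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.8, 0.9], "C": [0.7, 0.6, 0.8]}
ranks = average_ranks(toy_scores)
```

In the second toy experiment methods A and B tie for first place, so both receive rank 1.5.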

Lexicon source	Tok. count CS	Tok. count WE
BLM	724	6,764
Elect.	1,994	8,777
Reddit	10,881	63,965
WikiCon	61,859	62,564
Lexicon overlap	22,391

Table 2: Number of tokens per lexicon generated for each method.

5 Results & Discussion

5.1 Lexicon Generation

Table 2 shows the vocabulary size for all the generated lexicons. As previously mentioned, we generated one lexicon from each of the datasets (see Table 1) using the two proposed methods, WE and CS, except for the Vaccine dataset, which is used only for the out-of-domain evaluation. Finally, using the overlapping approach, we obtained a representation that combines the shared tokens from the individual lexicons, using their average scores related to liberty. For the lexicons generated with the Compositional Semantics method, we applied a frequency cut-off of 10 for Reddit and WikiCon. Due to the limited number of annotated instances in the BLM and Election datasets, we set frequency cut-offs of 6 and 3, respectively. These cut-off variations have been experimentally validated on the training data, and are in line with the literature Araque et al. (2019). For the lexicons generated with the WE method, the same frequency cut-off has been applied to the Reddit and WikiCon lexicons, while we did not apply any cut-off for the BLM and Election lexicons generated this way, to increase their vocabulary size. Generally, we have observed that, for both methods, the resulting vocabulary size depends on the number of annotated instances in the training data.

For the overlapping lexicons, and due to space limitations, we report the data for the lexicon generated with a selection (cut-off) parameter of 40%, which resulted in the best combination in the supervised evaluation. To select this value, we evaluated the selection parameter in the range [10%, 20%, …, 100%] on the train sets of the considered datasets using 10-fold cross-validation. This selection explains the more limited number of tokens with respect to the aggregation of all the lexicons’ tokens.

5.2 Lexicon Evaluation

Features	BLM	Elect.	Reddit	WikiCon	Vaccine (BLM)	Vaccine (Elect.)	Friedman rank
Unigram (1,000)	50.90	50.12	66.80	83.84	52.11	43.66	9.1
Unigram (10,000)	51.84	51.81	68.39	88.10	51.17	49.70	8.8
BLM (CS)	51.53	48.38	61.99	81.43	47.12	33.33	8.6
Elect. (CS)	51.21	50.52	63.59	84.89	41.44	35.67	10.8
Reddit (CS)	52.20	49.78	69.01	89.96	48.82	52.39	3.8
WikiCon (CS)	50.24	49.65	68.12	90.34	52.63	42.10	5.8
BLM (WE)	46.20	53.23	65.37	85.56	52.06	57.78	7.8
Elect. (WE)	49.76	47.50	66.84	88.68	49.15	58.15	5.6
Reddit (WE)	51.19	55.32	64.32	88.50	53.72	54.85	8.0
WikiCon (WE)	50.67	53.58	65.84	88.64	51.86	54.09	6.8
Lexicon Overlap	52.40	50.63	67.54	89.29	52.44	42.91	3.5
Combined Repr.	54.14	53.84	68.12	90.26	54.42	53.81	3.0

Table 3: Unified F1-macro scores and Friedman ranks. Each model is trained on features extracted with the lexicon reported in each row, with training and testing done on the dataset reported in the column name. The “Vaccine (BLM)” and “Vaccine (Elect.)” columns report the results of training on BLM/Elect. and testing on the balanced vaccine dataset, with features extracted from the lexicon of each row. The Friedman rank orders the models from best to worst over all experiments (lower is better). In bold we indicate the lexicon that provides the most discriminatory features for each scenario.

Table 3 reports the results of the evaluation. As described (see Sect. 4.3), for each dataset, we extract linguistic features employing each of the generated lexicons and employ those to train a logistic regression model per dataset. Then we employ the obtained model to infer the liberty moral class of the respective test set.

In-domain evaluation

We expected that models trained on features derived from lexicons generated on the respective data source would outperform the rest. However, this is not the case in most scenarios. The exceptions are the models trained on the Reddit and WikiCon train sets with features extracted from the Reddit and WikiCon lexicons, respectively, when employed to distinguish between notions of liberty or oppression.

Looking at these results, it can be seen that generally the learning models trained on the CS lexicon features improve over the unigram baselines, showing that these lexicons capture useful representations. In particular, CS lexicons are consistently on-par or above the baseline, when the lexicon and the dataset are coherent (e.g. lexicon features generated from the BLM dataset, trained and tested to predict the BLM dataset annotations).

In contrast, the overlap lexicon approach, combining information from different lexicons, shows a fairly consistent performance across all datasets. These results indicate that the difficulty of the task calls for combined knowledge, since enriching the representations with linguistic information from different contexts and writing styles improves the recognition of the liberty moral value in text.

We obtain further confirmation of this hypothesis from the Friedman test: when considering all evaluation combinations, the Combined Representation ranks as the best approach, followed by the lexicon overlap. This is to be expected, as this method makes use of all lexicons simultaneously and learns internal representations that can be exploited by a machine learning model.
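The ranking underlying a Friedman-style comparison can be sketched as follows: within each experiment the methods are ranked by score (rank 1 is best, ties share the average rank), and the ranks are then averaged over all experiments. The function name `average_ranks` and the scores below are invented for illustration and are not values from Table 3.

```python
def average_ranks(scores):
    """Return the mean rank of each method over all experiments.

    `scores` maps a method name to a list with one score per experiment
    (higher is better). Rank 1 is the best method in an experiment and
    tied methods share the average of the positions they occupy.
    """
    methods = list(scores)
    n_experiments = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for j in range(n_experiments):
        ordered = sorted(methods, key=lambda m: scores[m][j], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend k to cover every method tied with the one at position i.
            k = i
            while k + 1 < len(ordered) and scores[ordered[k + 1]][j] == scores[ordered[i]][j]:
                k += 1
            shared_rank = (i + 1 + k + 1) / 2  # mean of ranks i+1 .. k+1
            for m in ordered[i:k + 1]:
                totals[m] += shared_rank
            i = k + 1
    return {m: totals[m] / n_experiments for m in methods}


# Toy F1 scores for three hypothetical methods over three experiments.
toy_scores = {
    "baseline": [0.60, 0.55, 0.58],
    "single_lexicon": [0.62, 0.54, 0.58],
    "combined": [0.65, 0.60, 0.61],
}
ranks = average_ranks(toy_scores)
```

A lower mean rank indicates a method that performs well consistently across experiments, rather than one that excels in a single scenario.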

Out-of-domain evaluation

At the level of individual lexicons, the Reddit one generated with the Compositional Semantics method achieves very good performance overall (being the third best performing lexicon in the Friedman rank) also when used on non-Reddit datasets. This may be an effect of the larger number of tokens and general quality of the original dataset, which probably includes a richer vocabulary for a variety of topics discussed by libertarians and conservatives. Besides, this observation offers the insight that, even though the annotations of the used lexicon are not completely aligned (e.g., using the Reddit lexicon for predicting Liberty/Oppression, while the Reddit dataset from which it is generated captures the Libertarian/Conservative divide), the knowledge captured by the lexicon can aid in the classification task. A possible explanation is that the lexicons cover the whole gamut of association strengths, thus capturing a balanced view of language and not just words strongly correlated with the liberty foundation; this could help the classifier learn the threshold between documents expressing this foundation (which will have more words with “extreme” values) and those which do not (probably consisting of mostly “neutral” words).

Regarding the “stricter” out-of-domain evaluation, the right side of Table 3 reports the results obtained when models fed with features extracted by the BLM, Elect., WikiCon, and Reddit lexicons (either CS or WE) are trained on the BLM and Elect. training datasets and tested on the Vaccine dataset. We notice that, again, the models trained on the Overlapping Lexicon consistently perform well, while feature extraction based on individual lexicons led to models that did not consistently outperform the baseline.

This experiment offers interesting insights into the adaptability and generalization capabilities of the proposed lexicons. Although the absolute best performance is obtained with the Elect. (WE) lexicon when the model is trained on the Election dataset, this is not a generalisable finding: the same lexicon with a model trained on the BLM data fails to outperform the baseline. Interestingly, the combined representation approach always ranks high, supporting the idea that combining knowledge from different base lexicons does improve the understanding of the liberty foundation.
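As an illustration of what combining knowledge from several base lexicons can look like, the sketch below concatenates simple per-lexicon statistics into a single vector. This is only one plausible construction, assumed here for clarity; the function name `combined_representation`, the lexicon names, and the scores are hypothetical, and the combined representation actually evaluated is the one defined in the method section.

```python
def combined_representation(tokens, lexicons):
    """Concatenate per-lexicon (mean score, coverage) pairs into one vector.

    `lexicons` maps a lexicon name to a {lemma: score} dictionary. Using two
    statistics per lexicon is an assumption of this sketch.
    """
    vector = []
    for name in sorted(lexicons):  # fixed order keeps feature positions stable
        lexicon = lexicons[name]
        hits = [lexicon[t] for t in tokens if t in lexicon]
        mean_score = sum(hits) / len(hits) if hits else 0.0
        coverage = len(hits) / len(tokens) if tokens else 0.0
        vector.extend([mean_score, coverage])
    return vector


# Hypothetical lexicons and scores, for illustration only.
toy_lexicons = {
    "blm_cs": {"freedom": 0.8, "tyranny": -0.9},
    "reddit_we": {"freedom": 0.7, "choice": 0.5},
}
vec = combined_representation(["freedom", "of", "choice"], toy_lexicons)
```

With this layout a classifier sees, side by side, how each lexicon scores the same document, so a word missing from one lexicon can still contribute through another.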

Domain Specific Insights

A recurrent pattern is that the overlapping lexicon outperforms the other lexicons in both in-domain and out-of-domain experiments. To gain more insight into the effect of the social context on the moral nuances a specific lemma may have, we employed the TOMEA approach proposed by Liscio et al. (2023). According to TOMEA, the overlapping lexicon differs the most from the BLM lexicons generated by the CS and the WE methods, with scores of 0.17 and 0.13, respectively. Among the most distant words are “fake”, “lawmaker”, “elected”, “supporters”, “antifa”, “openly”, “tweets”, “sympathizer”, “tyranny”, “globalist”, and “dead”, which are scored as more oppressive in the individual BLM lexicons than in the other lexicons. Such domain-specific nuances may be important when analysing a specific argument, but they can introduce bias in the models when analysing broader subjects. In this respect, TOMEA is a valuable method to gain insights and to foster the transparency and accountability of the findings.

6 Conclusion

The aim of this study was to provide a lexical resource for the moral foundation of liberty able to generalise across various domains. The “Liberty/Oppression” foundation expresses people’s inclinations towards autonomy and their resistance to dominion.

Our contributions to the current state of the art are manifold. Firstly, we provide an enriched version of the MFTC annotations for the BLM and ELECT datasets with respect to the liberty foundation. Further, we generated a series of lexicons with two complementary approaches and thoroughly evaluated them in a series of both in- and out-of-domain experimental scenarios. Aside from the individual lexicons, we also built a version combining the information from both approaches, which is the resource we propose as the final Liberty Moral Lexicon. This resource showed solid generalisation potential. Moreover, we designed a combined representation that exploits information from all generated lexicons, thus offering a classifier a more comprehensive representation. As seen in the experimental evaluation, the generated resources capture relevant knowledge that can be leveraged to assess liberty in texts. These insights are supported by the Friedman test, which offers a ranking of the methods. Finally, by employing the TOMEA method, we provide insights into the dynamics of linguistic variability according to the context in which a word is used.

References

Alpaydin [2020] Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
Amin et al. [2017] Avnika B. Amin, Robert A. Bednarczyk, Cara E. Ray, Kala J. Melchiori, Jesse Graham, Jeffrey R. Huntsinger, and Saad B. Omer. Association of moral values with vaccine hesitancy. Nature Human Behaviour, 2017. doi: 10.1038/s41562-017-0256-5.
Araque et al. [2019] O. Araque, L. Gatti, J. Staiano, and M. Guerini. Depechemood++: a bilingual emotion lexicon built through simple yet powerful techniques. IEEE Transactions on Affective Computing, pages 1–1, 2019. doi: 10.1109/TAFFC.2019.2934444.
Araque et al. [2017] Oscar Araque, Ignacio Corcuera-Platas, J. Fernando Sánchez-Rada, and Carlos A. Iglesias. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications, 77:236–246, 2017. ISSN 0957-4174. doi: https://doi.org/10.1016/j.eswa.2017.02.002. URL https://www.sciencedirect.com/science/article/pii/S0957417417300751.
Araque et al. [2020] Oscar Araque, Lorenzo Gatti, and Kyriaki Kalimeri. Moralstrength: Exploiting a moral lexicon and embedding similarity for moral foundations prediction. Knowledge-based systems, 191, 2020.
Araque et al. [2021] Oscar Araque, Lorenzo Gatti, and Kyriaki Kalimeri. The language of liberty: A preliminary study. In Companion Proceedings of the Web Conference 2021, WWW ’21, pages 623–626, 2021. ISBN 9781450383134. doi: 10.1145/3442442.3452351. URL https://doi.org/10.1145/3442442.3452351.
Araque et al. [2022] Oscar Araque, Lorenzo Gatti, and Kyriaki Kalimeri. LibertyMFD: A Lexicon to Assess the Moral Foundation of Liberty. In Proceedings of the 2022 ACM Conference on Information Technology for Social Good, GoodIT ’22, pages 154–160, 2022. ISBN 9781450392846. doi: 10.1145/3524458.3547264. URL https://doi.org/10.1145/3524458.3547264.
Araque et al. [2023] Oscar Araque, Luca Barbaglia, Francesco Berlingieri, Marco Colagrossi, Sergio Consoli, Lorenzo Gatti, Caterina Mauri, and Kyriaki Kalimeri. Beyond the headlines: Understanding sentiments and morals impacting female employment in spain. In Workshop Proceedings of the 17th International AAAI Conference on Web and Social Media (Workshop on Data for the Wellbeing of Most Vulnerable), 2023. doi: 10.36190/2023.03.
Beiró et al. [2023] Mariano Gastón Beiró, Jacopo D’Ignazi, Victoria Perez Bustos, María Florencia Prado, and Kyriaki Kalimeri. Moral narratives around the vaccination debate on facebook. In Proceedings of the ACM Web Conference 2023, pages 4134–4141, 2023.
Consoli et al. [2022] Sergio Consoli, Luca Barbaglia, and Sebastiano Manzan. Fine-grained, aspect-based sentiment analysis on economic and financial lexicon. Knowledge-Based Systems, 247:108781, 2022. doi: 10.1016/j.knosys.2022.108781.
Davidson et al. [2017] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. In Proceedings of the 11th International Conference on Web and Social Media, ICWSM 2017, pages 512–515, 2017.
Demšar [2006] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(Jan):1–30, 2006.
Gallagher et al. [2021] Ryan J Gallagher, Morgan R Frank, Lewis Mitchell, Aaron J Schwartz, Andrew J Reagan, Christopher M Danforth, and Peter Sheridan Dodds. Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts. EPJ Data Science, 10(1):4, 2021. doi: 10.1140/epjds/s13688-021-00260-3.
Garten et al. [2018] Justin Garten, Joe Hoover, Kate M Johnson, Reihane Boghrati, Carol Iskiwitch, and Morteza Dehghani. Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis. Behavior Research Methods, 50(1):344–361, 2018.
Garten et al. [2019] Justin Garten, Brendan Kennedy, Joe Hoover, Kenji Sagae, and Morteza Dehghani. Incorporating demographic embeddings into language understanding. Cognitive science, 43(1), 2019.
Graham et al. [2009] Jesse Graham, Jonathan Haidt, and Brian A Nosek. Liberals and conservatives rely on different sets of moral foundations. Journal of personality and social psychology, 96(5):1029, 2009. doi: 10.1037/a0015141.
Haidt [2012] Jonathan Haidt. The righteous mind: Why good people are divided by politics and religion. Vintage, 2012.
Haidt and Joseph [2004] Jonathan Haidt and Craig Joseph. Intuitive ethics: How innately prepared intuitions generate culturally variable virtues. Daedalus, 133(4):55–66, 2004.
Halko et al. [2011] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011. doi: 10.1137/090771806.
Hoover et al. [2020] Joe Hoover, Gwenyth Portillo-Wightman, Leigh Yeh, Shreya Havaldar, Aida Mostafazadeh Davani, Ying Lin, Brendan Kennedy, Mohammad Atari, Zahra Kamel, Madelyn Mendlen, et al. Moral foundations twitter corpus: A collection of 35k tweets annotated for moral sentiment. Social Psychological and Personality Science, 11(8):1057–1071, 2020.
Hoover et al. [2017] Joseph Hoover, Kate Johnson-Grey, Morteza Dehghani, and Jesse Graham. Moral values coding guide. PsyArXiv, 2017.
Hopp et al. [2021] Frederic R Hopp, Jacob T Fisher, Devin Cornell, Richard Huskey, and René Weber. The extended moral foundations dictionary (emfd): Development and applications of a crowd-sourced approach to extracting moral intuitions from text. Behavior research methods, 53(1):232–246, 2021.
Kennedy et al. [2021] Brendan Kennedy, Mohammad Atari, Aida Mostafazadeh Davani, Joe Hoover, Ali Omrani, Jesse Graham, and Morteza Dehghani. Moral concerns are differentially observable in language. Cognition, 212:104696, 2021.
Le and Mikolov [2014] Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196, 2014.
Liang et al. [2013] Percy Liang, Michael I Jordan, and Dan Klein. Learning dependency-based compositional semantics. Computational Linguistics, 39(2):389–446, 2013.
Liscio et al. [2023] Enrico Liscio, Oscar Araque, Lorenzo Gatti, Ionut Constantinescu, Catholijn Jonker, Kyriaki Kalimeri, and Pradeep Kumar Murukannaiah. What does a text classifier learn about morality? an explainable method for cross-domain comparison of moral rhetoric. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14113–14132, July 2023. doi: 10.18653/v1/2023.acl-long.789. URL https://aclanthology.org/2023.acl-long.789.
Mejova et al. [2023] Yelena Mejova, Kyriaki Kalimeri, and Gianmarco De Francisci Morales. Authority without care: Moral values behind the mask mandate response. In Proceedings of the International AAAI Conference on Web and Social Media, volume 17, pages 614–625, 2023.
Mokhberian et al. [2020] Negar Mokhberian, Andrés Abeliuk, Patrick Cummings, and Kristina Lerman. Moral framing and ideological bias of news. In International Conference on Social Informatics, pages 206–219. Springer, 2020.
Mooijman et al. [2018] Marlon Mooijman, Joe Hoover, Ying Lin, Heng Ji, and Morteza Dehghani. Moralization in social networks and the emergence of violence during protests. Nature Human Behaviour, 2(6):389–396, 2018.
Prado et al. [2022] Maria Florencia Prado, Victoria Perez Bustos, Kyriaki Kalimeri, and Mariano G. Beiro. Narratives around the vaccination discourse on the facebook platform. arXiv preprint, 2022.
Preniqi et al. [2021] Vjosa Preniqi, Kyriaki Kalimeri, and Charalampos Saitis. Modelling moral traits with music listening preferences and demographics. arXiv preprint arXiv:2107.00349, 2021.
Proferes et al. [2021] Nicholas Proferes, Naiyan Jones, Sarah Gilbert, Casey Fiesler, and Michael Zimmer. Studying reddit: A systematic overview of disciplines, approaches, methods, and ethics. Social Media + Society, 7(2):20563051211019004, 2021. doi: 10.1177/20563051211019004.
Rezapour et al. [2021] Rezvaneh Rezapour, Ly Dinh, and Jana Diesner. Incorporating the measurement of moral foundations theory into analyzing stances on controversial topics. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media, pages 177–188, 2021.
Staiano and Guerini [2014] Jacopo Staiano and Marco Guerini. Depeche mood: a lexicon for emotion analysis from crowd annotated news. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 427–433, Baltimore, Maryland, June 2014. Association for Computational Linguistics. doi: 10.3115/v1/P14-2070. URL https://aclanthology.org/P14-2070.
Tausczik and Pennebaker [2010] Yla R Tausczik and James W Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of language and social psychology, 29(1):24–54, 2010.
Turney and Littman [2003] Peter D Turney and Michael L Littman. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4):315–346, 2003.
van der Veen [2022] A. Maurits van der Veen. Blmtwitter: The black lives matter (blm) twitter corpus. SocArXiv, pages 1–11, 2022. doi: 10.31235/osf.io/kna9s.
Zhang et al. [2023] Weiyu Zhang, Rong Wang, and Haodong Liu. Moral expressions, sources, and frames: Examining covid-19 vaccination posts by facebook public pages. Computers in Human Behavior, 138:107479, 2023. ISSN 0747-5632. doi: https://doi.org/10.1016/j.chb.2022.107479. URL https://www.sciencedirect.com/science/article/pii/S0747563222002990.