How Similar Are Elected Politicians and Their Constituents? Quantitative Evidence From Online Social Networks

Waleed Iqbal1, Gareth Tyson1, 2, Ignacio Castro1
Abstract

How similar are politicians to those who vote for them? This is a critical question at the heart of democratic representation, and it is particularly relevant at times when political dissatisfaction and populism are on the rise. To answer this question, we compare the online discourse of elected politicians and their constituents. We collect a two-and-a-half-year (September 2020 – February 2023) constituency-level dataset for the USA and UK that includes: (i) the Twitter timelines (5.6 Million tweets) of elected political representatives (595 UK Members of Parliament and 433 USA Representatives), and (ii) the Nextdoor posts (21.8 Million posts) of the constituencies (98.4% of USA and 91.5% of UK constituencies). We find that elected politicians tend to be equally similar to their constituents in terms of content and style regardless of whether a constituency elects a right or left-wing politician. The size of the electoral victory and the level of income of a constituency show a nuanced picture. The narrower the electoral victory, the more similar the style and the more dissimilar the content. The lower the income of a constituency, the more similar the content. In terms of style, poorer constituencies tend to have a more similar sentiment and more dissimilar psychological text traits (i.e. measured with LIWC categories).

1 Introduction

Dissatisfaction with democracy is reaching an all-time high (Foa et al. 2020) and populism is on the rise (Martelli and Jaffrelot 2023; Pew Research Center 2022). Recent survey data shows that voters do not feel well represented by their elected politicians (Pew Research Center 2024). Jones (Jones 2020) argues that left-wing politicians in the UK are increasingly dissimilar to the lower income voters that they claim to represent. Piketty (Piketty 2018) argues that left-wing parties are increasingly failing to represent their traditional lower income voters across western countries.

This disconnect between political representatives and their constituents seems to be at the heart of the growing political distrust. Voters worldwide feel that politicians should be more like the voters they represent (Pew Research Center 2023). Evidence from the UK shows that the perception that political representatives have little common ground with voters is indeed a key driver of political distrust (Valgarðsson et al. 2021). In this paper we look at this precise issue. We study the similarities between political representatives and their constituents by comparing the online discourse of elected representatives and the voters in their respective constituencies. We are particularly interested in analyzing whether the degree of similarity varies depending on factors such as political ideology, income, and the margin of victory in the most recent elections.

To examine this, we collect two large datasets with the online discourse of elected politicians and their constituents, and map the elected politicians to their constituents (i.e. who elected them) for most constituencies in the USA and UK. Specifically, we collect (i) Twitter timelines of elected politicians, and (ii) Nextdoor posts from their respective constituents. Nextdoor is a location-based social network where users interact within closed social networks of neighbors that have validated their home addresses, i.e. a constituent posting about a local issue does in fact reside in that constituency. We leverage this to map users to their respective constituency. Our Nextdoor dataset includes 21.8 Million posts from 190,706 neighborhoods (433 (98.4%) constituencies) in the United States (USA) and 21,046 neighborhoods (595 (91.5%) constituencies) in the United Kingdom (UK) between September 2020 and February 2023. These constituencies cover 98.6% of the population in the USA and 94.6% of the population in the UK.

We also collect Twitter timelines of Members of Parliament (MPs) from the UK’s House of Commons and USA Representatives (USAReps) from the USA’s House of Representatives. This includes 5.6 Million tweets from 433 USAReps’ account timelines out of 440 USAReps and 595 UK MPs’ account timelines out of 650 UK MPs.

Using this data, we conduct a constituency-level study: for each constituency, we compare the online discourse of the elected politician (Twitter) with the online discourse of the constituents (Nextdoor) who elected the politician. We compare the online discourse both in terms of content and style by looking at the semantic similarity of the content and psychological traits in the text (LIWC categories) and text sentiment. We address the following Research Questions (RQs):

  • RQ1: Are right-wing politicians more similar to their constituencies than left-wing politicians?

  • RQ2: Are politicians in more disputed constituencies more similar to their constituents?

  • RQ3: Are politicians in poorer constituencies more similar to their constituents than in richer ones?

Our main findings include:

  • We find that elected politicians are frequently equally similar in terms of content and style. This is the case regardless of whether they are a right or left-wing politician. The level of similarity is relatively low but higher than when comparing constituents with elected politicians of other constituencies: we compare right-wing constituencies with left-wing politicians (and vice versa) and find that the similarity is substantially lower.

  • The size of the victory in the elections shows a nuanced picture. We find that narrower electoral victories are associated with a more similar style (both in terms of LIWC categories and sentiment). We find the opposite for content: the larger the victory the more similar the content tends to be.

  • Income is also related with varying similarities. Constituencies with lower income tend to have more similar content. In terms of style, poorer constituencies tend to have a more similar sentiment and the opposite is true for the psychological markers (i.e. LIWC categories).

2 Data and Methodology

2.1 Datasets

Nextdoor primer.  Nextdoor is a location-based social network with over 305,000 registered neighborhoods in 11 countries and over 69 million users (Nextdoor 2023). Nextdoor divides geographical areas into neighborhoods and assigns users to the neighborhood where they reside. To ensure that a user is a neighbor of a particular neighborhood, new users validate their home addresses (e.g. via regular “snail” mail).

For each neighborhood, Nextdoor creates a dedicated forum where users post and interact (e.g. reply and react to each other’s posts). Users exclusively interact with their neighbors, i.e. the users of the neighborhood they are associated with. As a result, the data from a neighborhood exclusively includes the posts of the users who have validated their location in that geographical area. We refer to the specific areas into which Nextdoor divides a region as neighborhoods.

Nextdoor Data Collection.  We collect 21,845,284 Nextdoor posts from 212,644 neighborhoods between September 2020 and February 2023 using our custom web scrapers. Our data includes all neighborhoods of all USA voting states (17,397,380 posts from 190,761 USA neighborhoods; we do not include neighborhoods in American Samoa, the U.S. Virgin Islands, Guam, the Northern Mariana Islands, and Puerto Rico) and the UK (4,447,906 posts, 21,883 neighborhoods). We discard a constituency when it does not have a political representative (2 vacant USA constituencies with 51 neighborhoods and 3024 Nextdoor posts, where representatives passed away) or when the representative has no Twitter account (55 UK constituencies with 837 neighborhoods and 9098 posts).

Our final dataset includes 21,833,162 posts from 211,752 neighborhoods (Table 1). Our data has almost 10 times more posts and triples the number of neighborhoods in previous work (Iqbal et al. 2023) which only included the 10 largest UK cities (15.8% neighborhoods, 7.96% posts), 33.7% USA neighborhoods (12.6% posts) in the USA, and 19 months less (November 2020–September 2021).

To gather information about Nextdoor neighborhoods and geolocate them, we employ the methodology in (Iqbal et al. 2023). Similarly, we map each neighborhood to the available lowest geographical granularity in official statistical data: the ZIP code in the USA and the Lower Layer Super Output Area (LSOA) in the UK.

Constituency Data.  We map neighborhoods into the official political constituencies according to the data from the USA House of Representatives (USA House of Representatives 2023) and the UK House of Commons (UK House of Commons 2023). We refer to the representative of a constituency as its elected politician for both the Members (USAReps) of the USA’s House of Representatives (https://www.house.gov/) and the Members of Parliament (MPs) of the UK’s House of Commons (https://www.parliament.uk/business/commons/).

For each constituency, we combine all the Nextdoor posts of the corresponding neighborhoods and compare them with all the posts of the elected politician for the constituency.

We label each constituency as left or right-wing based on the political leaning of its elected politician’s party. In our data, there are two USA parties: the right-wing Republican Party and the left-wing Democratic Party. We find 220 (50.8%) right-wing constituencies (covering 51.8% of the USA neighborhoods and 47.3% of its posts), and 213 (49.2%) left-wing constituencies (covering 48.2% of the USA neighborhoods and 52.7% of its posts).

In the UK we find 11 political parties with elected politicians. We classify them into right and left-wing as in (Jolly et al. 2022). Three small parties are not included in (Jolly et al. 2022) classification: Social Democratic and Labour Party (2 constituencies, 10,944 posts), Alba Party (2 constituencies, 1,645 posts), and Alliance Party of Northern Ireland (1 constituency, 4,224 posts). We manually classify these as left-wing by inspecting their corresponding official website (Jarrett 2016).

We find 315 (52.1%) UK constituencies, 11,224 (53.3%) neighborhoods and 2,011,624 (45.3%) posts within right-wing constituencies; and 280 (47.9%) constituencies, 9,822 (46.7%) neighborhoods and 2,427,182 (54.7%) posts within left-wing ones.

Twitter Data.  We also collect the Twitter timelines of the elected politicians of each constituency in the dataset using the Twitter Academic API (https://developer.twitter.com/en/use-cases/do-research/academic-research). We obtain 1,391,063 tweets from 433 USAReps’ account timelines out of 440 USAReps. We exclude seven USAReps who either represent non-voting territories or are deceased, and we therefore do not include their constituencies in our dataset. We also collect 4,203,521 tweets from the Twitter timelines of 595 UK MPs. We identify 55 MPs without Twitter accounts and, as mentioned before, we do not include these 55 constituencies (837 Nextdoor neighborhoods, 9098 Nextdoor posts) in our dataset.

Our USA data includes 63.32% of tweets from left-wing elected politicians and 36.68% from right-wing ones. For the UK, our data contains 66.37% of tweets by left-wing politicians and 33.63% from right-wing ones.

| Attributes | USA | UK | Total |
|---|---|---|---|
| Nextdoor posts | 17,394,356 | 4,438,806 | 21,833,162 |
| Tweets | 1,391,063 | 4,203,521 | 5,594,584 |
| Neighborhoods | 190,706 | 21,046 | 211,752 |
| Zip codes (USA)/LSOAs (UK) | 38,497 | 16,235 | 54,732 |
| Neighbors | 48,602,160 | 9,744,948 | 58,347,108 |
| Constituencies | 433 | 595 | 1028 |
| Twitter accounts (elected politicians) | 433 | 595 | 1028 |

Table 1: Nextdoor and Twitter Dataset (after data cleaning).

2.2 Data Augmentation

Income and population.  For each neighborhood, we collect socioeconomic data at the constituency level from the official statistics. For the USA constituencies, we obtain the population and median annual income from the latest census (US Census 2022). For the UK, we obtain the population and median annual income from the UK’s Office of National Statistics from the latest Census update (ONS UK 2022).

Political and polling data.  We collect the constituency name, Twitter username, party affiliation, and polling results from the House of Representatives (USA House of Representatives 2023) and the House of Commons (UK House of Commons 2023). Elected politicians and constituents might have a different online discourse depending on how disputed in the polls their constituency is. To assess this, we rank constituencies based on the size of the electoral victory of the winning candidate over its immediate competitor and calculate deciles of the size of the victory. The first decile corresponds to the least disputed constituencies (i.e. landslide victory) and the tenth one to the most disputed ones (i.e. narrow victory).
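The decile assignment described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name `victory_deciles`, the input format (a mapping from a constituency name to the winner's lead over the runner-up), and the handling of ties are all hypothetical choices.

```python
def victory_deciles(margins):
    """Assign each constituency a decile (1-10) of its victory margin.

    `margins` is a hypothetical dict mapping a constituency name to the
    winner's lead over the runner-up (e.g. in percentage points).
    Decile 1 holds the largest margins (landslides) and decile 10 the
    smallest ones (narrow victories), matching the convention in the text.
    """
    if not margins:
        return {}
    # Rank from the largest margin to the smallest; ties keep input order.
    ranked = sorted(margins, key=margins.get, reverse=True)
    total = len(ranked)
    # Split the rank positions 0..total-1 into ten equally sized bands.
    return {name: (position * 10) // total + 1
            for position, name in enumerate(ranked)}


# Toy example with made-up margins for 20 constituencies.
example = {f"constituency_{i}": float(i) for i in range(20)}
deciles = victory_deciles(example)
```

With fewer than ten constituencies some deciles stay empty, so a rank-based split of this kind is only meaningful for reasonably large samples such as the 433 and 595 constituencies considered here.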

Text embeddings.  To investigate whether the text posted differs across constituencies, we obtain semantic features of the posts via embedding. Prior to obtaining vector embeddings of our text, we pre-process our Nextdoor and Twitter datasets (e.g. removing mentions, URLs, etc.). For each constituency, we combine all the posts of all the respective neighborhoods. We then convert each post’s text into a single vector embedding using the pre-trained sentence transformer model all-mpnet-base-v2. This model is tuned to map every sentence or short paragraph (up to 384 tokens) to a 768-dimensional vector space while preserving relevant text features (Song et al. 2020).

The number of tokens across all posts in a constituency is too high to feed directly into all-mpnet-base-v2: it exceeds the input limit of the model by several orders of magnitude. The median number of tokens across constituencies in the Twitter data is 132,153 and 216,956 for the USA and UK, and in the Nextdoor data it is 1,640,203 and 191,990 for the USA and UK. These numbers exceed not only the maximum sequence length of our model (up to 384 tokens per input) but also that of the sentence embedding model with the longest maximum sequence length, which is 8192 tokens per input (Günther et al. 2023). Figure 8 shows the distribution of pre-processed tokens in each constituency, dataset, and country.

To deal with this challenge, we compute the text embedding of each post in a constituency and then aggregate the embeddings of all posts of the constituency with mean pooling. A mean-pooled embedding can represent the complete text corpus as a single embedding while maintaining the context of the text (Arora, Liang, and Ma 2017; Oh, Li, and Wang 2023; Singh 2022). A benefit of this approach is that content discussed more frequently carries more weight in the pooled embedding. We finally obtain a 768-dimensional (mean-pooled) embedding for the Nextdoor and Twitter data in each constituency.
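The aggregation step can be illustrated with a small sketch. It assumes that the per-post vectors have already been produced by an encoder (768-dimensional for all-mpnet-base-v2); the function name `mean_pool` and the three-dimensional toy vectors are hypothetical and only serve to show how frequently discussed content dominates the pooled vector.

```python
def mean_pool(embeddings):
    """Average a list of equal-length vectors into a single vector."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    # Component-wise arithmetic mean over all posts of a constituency.
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


# Toy example: two posts about one theme and one post about another.
post_vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
pooled = mean_pool(post_vectors)  # the first component ends up twice the second
```

Because every post contributes equally to the average, a theme that appears in many posts pulls the pooled vector towards it, which is the weighting effect mentioned above.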

LIWC Categories.  Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al. 2015) is a lexicon-based tool that measures psychologically relevant dimensions in text. These dimensions capture both linguistic (e.g. personal pronoun usage) and psychological aspects (e.g. affective and social processes) which have been externally validated (e.g. measurement of emotions (Kahn et al. 2007) and social hierarchies (Kacewicz et al. 2014)). LIWC consists of 117 categories, including 11 root categories, 8 summary variable categories, and 98 subcategories within the root ones (Boyd et al. 2022). We merge all posts from a constituency into a single corpus of text and obtain LIWC scores at the constituency level using the LIWC 2022 dictionary (academic license; https://www.liwc.app/). Note that, for ease of comparison, we re-scale the LIWC scores from 0–99 to 0–1 so they are comparable with the rest of our results.

Post sentiment.  We label each post’s sentiment with a pre-trained Valence Aware Dictionary and Sentiment Reasoner (VADER) model (Hutto and Gilbert 2014). VADER outperforms the typical human reader for social media data (VADER’s F1=0.96, Human F1=0.84) (Hutto and Gilbert 2014). This also allows for comparison with earlier results on Nextdoor data that used this same model (Iqbal et al. 2023).

Topic Modeling.  We identify topics in our data using BERTopic (Huggingface 2022). BERTopic relies on pre-trained transformer-based language models to build document embeddings. It then clusters these embeddings and generates topic representations using the class-based TF-IDF technique (Grootendorst 2022). We use HDBSCAN clustering (Campello, Moulavi, and Sander 2013) for our BERTopic model and combine the Twitter and Nextdoor data. We fit our BERTopic model with different numbers of topics (1–100), achieving the highest coherence score (0.48) with 50 topics (Figure 11).

Figure 1: Cumulative distribution of Nextdoor and Twitter data across constituencies in USA and UK. Panels: (a) Nextdoor, (b) Twitter.

2.3 Population and data coverage

Our data includes 98% of the constituencies in the USA and 92% of the constituencies in the UK (and all zipcodes/LSOAs for each constituency). This provides comprehensive geographical coverage and substantially extends the data in (Iqbal et al. 2023). Figure 1 shows the cumulative distribution of posts (Nextdoor) and tweets (Twitter) at the constituency level for the USA and UK. We observe similar distributions across both countries, with 40% of the constituencies contributing around 59% and 61% of the Nextdoor posts in the USA and UK, respectively.

We observe similar trends for tweets, with 40% of the elected politicians responsible for about 61% and 66% of the tweets in the USA and UK, respectively.

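| | Population (USA) | Population (UK) | Nextdoor Posts (USA) | Nextdoor Posts (UK) | Tweets (USA) | Tweets (UK) | Nextdoor Users (USA) | Nextdoor Users (UK) |
|---|---|---|---|---|---|---|---|---|
| Population | 1 | 1 | 0.87 | 0.92 | 0.92 | 0.88 | 0.83 | 0.81 |
| Nextdoor Posts | 0.87 | 0.92 | 1 | 1 | 0.90 | 0.89 | 0.93 | 0.88 |
| Tweets | 0.92 | 0.88 | 0.90 | 0.89 | 1 | 1 | N/A | N/A |
| Nextdoor Users | 0.83 | 0.81 | 0.93 | 0.88 | N/A | N/A | 1 | 1 |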

Table 2: Correlation between posts, population, neighborhoods, neighbors, and official population.

We further investigate how well our data covers the underlying population by calculating the Pearson correlation between the population of constituencies, tweets and the Nextdoor posts and users in Table 2. Similarly to (Iqbal et al. 2023), we observe high correlation coefficients (above 0.8), giving us confidence in the ability of our data to reflect the underlying population.
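For reference, the Pearson correlation used in this check can be computed as in the sketch below. The function name `pearson` and the toy numbers are hypothetical; in the analysis above the inputs would be per-constituency counts (e.g. population and number of Nextdoor posts).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-constituency values, for illustration only.
population = [70_000, 85_000, 90_000, 110_000]
posts = [3_000, 4_100, 4_000, 5_600]
r = pearson(population, posts)
```

A coefficient close to 1 means that constituencies with a larger population also tend to contribute proportionally more posts, which is the property that the coverage check relies on.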

3 Content Similarity

We first ask to what extent elected politicians and constituents discuss the same things. We compare the topics they both discuss and then compare how similar their discourse is by comparing the embeddings of their respective posts.

3.1 Topics similarity

We first examine which topics are discussed by elected politicians and constituents across different constituencies. Figure 2 presents the 20 most discussed topics for the USA and UK. These top 20 topics cover more than 85% of the content discussed on Nextdoor and Twitter (i.e. the remaining 30 topics cover just 15% of the content).

Elected politicians and their constituents discuss similar topics across the political spectrum.  The most discussed topics are similar regardless of the political color of a constituency. This is true for both elected politicians (on Twitter) and constituents (on Nextdoor), although in a slightly different order and size. “Education” and “Natural Disasters” are the most discussed topics in the online discourse of elected politicians (25.3% and 18.2% of total tweets) and constituents (9.8% and 4.3% of total Nextdoor posts) in the USA. We observe a similar case in the UK, where “Festivity” and “Cost of Living Crisis” are the most discussed topics by elected politicians (25.1% and 20.4% of total tweets) and constituents (10.6% and 4.9% of total Nextdoor posts). The topics differ across both countries, albeit with some overlaps (i.e. education, Covid-19, Energy-crisis, Veterans, Taxes, Elections).

To investigate this further, for each constituency we compare elected politicians and their constituents by calculating the cosine similarity of the topics discussed, weighted by their occurrence. Table 3 shows the mean cosine similarity between topic embeddings across constituencies in USA and UK. We find that elected politicians and constituents discuss similar topics in their constituency regardless of whether a constituency is right or left-wing. The close-to-zero variance indicates that this is the case for most constituencies.
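One plausible reading of this occurrence-weighted comparison is sketched below: each side is represented by the share of its posts assigned to each topic, and the two share vectors are compared with cosine similarity. This is an assumption about the procedure rather than a description of the exact implementation; the helper names `topic_shares` and `cosine_similarity` and the toy topic labels are hypothetical.

```python
import math
from collections import Counter


def topic_shares(topic_labels, topics):
    """Share of posts assigned to each topic, in the order given by `topics`."""
    counts = Counter(topic_labels)
    total = sum(counts[t] for t in topics)
    if total == 0:
        raise ValueError("no post is assigned to any of the listed topics")
    return [counts[t] / total for t in topics]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy example: one topic label per tweet or post in a single constituency.
topics = ["education", "taxes", "elections"]
politician = topic_shares(["education", "education", "taxes"], topics)
constituents = topic_shares(["education", "taxes", "taxes"], topics)
similarity = cosine_similarity(politician, constituents)
```

Since the share vectors are non-negative, the score lies between 0 (no topic in common) and 1 (identical topic mix).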

Figure 2: Distribution of posts/tweets by topics over Nextdoor and Twitter. Panels: (a) USA, (b) UK.

Elected politicians and their constituents discuss similar topics regardless of how contested a constituency is or the level of income.  Elected politicians and constituents discuss similar topics regardless of the voting majority and the income level of a constituency. Table 4 shows how the similarity of topics used by elected politicians and constituents varies depending on the income of a constituency and the winning margin of the elected politician in the last elections. We summarize this variation with the Inter-Decile Range (IDR), which compares the constituencies with the highest (9th decile) and lowest (1st decile) income or winning majority. A negative (positive) IDR indicates a decreasing (increasing) trend towards the 9th decile.

We observe that the cosine similarities of topics discussed in right and left-wing constituencies are very close. Table 4 shows that this trend is similar in both the USA (IDR by winning majority and by income of -0.0058 and 0.002 for right-wing constituencies, and of -0.0049 and 0.003 for left-wing ones) and the UK (0.011 and 0.004 for right-wing constituencies, and 0.016 and -0.007 for left-wing ones).
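The IDR values discussed above can be illustrated with the following sketch. It assumes that each decile is summarized by the median of its per-constituency scores before the 1st decile is subtracted from the 9th; the text does not state which summary statistic is used, so the median, the function name `inter_decile_range`, and the toy scores are all assumptions.

```python
from statistics import median


def inter_decile_range(scores_by_decile, low=1, high=9):
    """Typical score in the `high` decile minus typical score in the `low` one.

    `scores_by_decile` is a hypothetical dict mapping a decile number to the
    list of per-constituency similarity scores that fall in that decile.
    A negative result means the score decreases towards the `high` decile.
    """
    return median(scores_by_decile[high]) - median(scores_by_decile[low])


# Made-up cosine similarities for the two deciles being compared.
toy_scores = {
    1: [0.30, 0.32, 0.34],
    9: [0.20, 0.22, 0.24],
}
idr = inter_decile_range(toy_scores)  # negative: similarity drops towards decile 9
```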

| | USA, Left-wing | USA, Right-wing | UK, Left-wing | UK, Right-wing |
|---|---|---|---|---|
| Mean | 0.5848 | 0.5920 | 0.4963 | 0.4952 |
| Std. Deviation | 0.0148 | 0.0098 | 0.0043 | 0.0071 |
| Variance | 0.0002 | 0.0001 | 0.0001 | 0.0001 |

Table 3: Descriptive statistics for cosine similarity of topics discussed by constituents and elected politicians in different constituencies.

3.2 Semantic similarity

We were expecting to observe clear differences in the topics discussed by elected politicians and their constituents depending on ideology, income or the size of the winning majority. To our surprise, the data revealed the opposite.

We investigate the issue further by directly measuring the similarity of the content posted by elected politicians and constituents by calculating the cosine similarity between the embeddings of their respective posts on Twitter and Nextdoor. Figure 3 and Figure 4 show the median cosine similarity (line) with overall range of cosine similarity scores (shaded region around line) across USA and UK constituencies. We plot the cosine similarity for constituencies depending on the size of the electoral victory in Figure 3 and for different levels of income in Figure 4. The highest decile (10) corresponds to the most disputed constituencies in Figure 3 and to the poorest ones in Figure 4.
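A minimal sketch of this per-decile summary is given below. It assumes that one pooled embedding per constituency is already available for each platform; the names `cosine_similarity` and `summarize_by_decile`, the record layout, and the two-dimensional toy vectors are hypothetical.

```python
import math
from statistics import median


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def summarize_by_decile(records):
    """Median and range of politician-constituent similarity per decile.

    `records` is an iterable of (decile, politician_vec, constituents_vec),
    with one entry per constituency.
    """
    scores = {}
    for decile, politician_vec, constituents_vec in records:
        score = cosine_similarity(politician_vec, constituents_vec)
        scores.setdefault(decile, []).append(score)
    return {
        decile: {"median": median(vals), "min": min(vals), "max": max(vals)}
        for decile, vals in sorted(scores.items())
    }


# Toy example: three constituencies that all fall in the first decile.
toy_records = [
    (1, [1.0, 0.0], [1.0, 0.0]),
    (1, [1.0, 0.0], [0.0, 1.0]),
    (1, [1.0, 1.0], [1.0, 0.0]),
]
summary = summarize_by_decile(toy_records)
```

The median corresponds to the line and the minimum and maximum to the shaded region in plots of the kind described above.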

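| Similarity measure (IDR) | Winning Majority, USA, Left-wing | Winning Majority, USA, Right-wing | Winning Majority, UK, Left-wing | Winning Majority, UK, Right-wing | Income, USA, Left-wing | Income, USA, Right-wing | Income, UK, Left-wing | Income, UK, Right-wing |
|---|---|---|---|---|---|---|---|---|
| Topics | -0.0049 | -0.0058 | 0.016 | 0.011 | 0.003 | 0.002 | -0.007 | 0.004 |
| Posts | 0.27 | -0.23 | -0.26 | -0.34 | 0.42 | 0.58 | 0.67 | 0.41 |
| Sentiment | -0.042 | -0.057 | -0.041 | -0.039 | -0.022 | -0.016 | -0.061 | -0.067 |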

Table 4: Inter-decile range for cosine similarity for topics and posts, and absolute differences of compound sentiment scores between constituents and elected politicians across different deciles of winning majority and income.

Figure 3: Distribution of cosine similarity of mean-pooled textual embeddings between constituents and elected politicians over deciles of winning vote majority in different constituencies (from higher to lower).

Right and left-wing elected politicians discuss similar issues to their constituents.  The similarity of what elected politicians and constituents discuss changes little across the political spectrum. We observe that similarity is limited, with an overall average cosine similarity score of 0.33 in the USA and 0.27 in the UK. However, this similarity varies scarcely across the ideological spectrum: the average similarity for right and left-wing constituencies is 0.28 and 0.22 in the USA and 0.29 and 0.23 in the UK. The differences between the cosine similarity of the overall data and that of right-wing and left-wing constituencies are 0.05 and 0.11 in the USA and 0.02 and 0.04 in the UK.

Elected politicians in less disputed constituencies have a more similar discourse to their constituents.  Figure 3 shows the distribution of similarities of mean-pooled textual embeddings, i.e. computed for each constituency by comparing constituents (Nextdoor posts) and elected politicians (Twitter). We plot this from higher to lower deciles of the winning majority, i.e. constituencies in the 10th decile have the narrowest victory and those in the 1st decile the largest victory. We find that less disputed constituencies tend to have more similarity between constituents and elected politicians. This is particularly the case in the UK and in right-wing USA constituencies, with an average Interdecile Range (IDR) of -0.27, while we find exactly the opposite (0.27) for USA left-wing constituencies.

A possible reason for this trend is that politicians who succeed in being elected have a more similar style to their constituents. However, without data from the competing candidates, it is not possible to validate this hypothesis.

Figure 4: Distribution of cosine similarity of mean-pooled textual embeddings between Nextdoor and Twitter data over deciles by income levels in different constituencies (from richest to poorest).

Elected politicians of poorer constituencies have more similar discourse to their constituents.  We observe in Figure 4 that, in both the USA and UK and in both right and left-wing constituencies, there is a clear trend towards greater similarity as the level of income of a constituency decreases (average IDR=0.52). Interestingly, left-wing and right-wing constituencies have analogous similarities (average of 0.28 and 0.22 in the USA and 0.29 and 0.23 in the UK).

Elected politicians have higher similarity with constituents that they represent than with those that they do not.  So far we have compared elected politicians with their respective constituents. We now investigate whether politicians from right-wing constituencies are similar in discourse to constituents from left-wing constituencies and vice versa. We compute the cosine similarity of the online discourse between elected politicians and constituents from opposing political sides, based on winning majority and income. For this analysis, unlike the previous one, there is no pair-wise mapping between elected politicians and constituents. Instead, we aggregate our data based on different deciles of winning majority and income. We then compute the cosine similarity between the posts of the elected politicians and constituents from different constituencies in each decile.

Figures 9 and 10 display the distributions of cosine similarity scores between right-wing constituents and left-wing elected politicians, and between left-wing constituents and right-wing elected politicians, across deciles of winning majority and income. We observe that the similarity is lower (close to zero) than in our earlier analysis, when we conducted within-constituency comparisons. We argue that this is reasonable, as elected politicians are more likely to have common concerns with their constituents than with citizens whom they do not represent and who cannot vote for them (i.e. because they belong to a different constituency).

4 Style-related similarities

The previous section identified some trends in how elected politicians and their constituents differ in the content they post online depending on the income or the winning majority of the constituency. We now investigate whether there are also differences that pertain more to the tone or style than to the content. That is, while they might discuss the same topic (e.g. immigration), they might differ in the style, choice of words, and tone used. To better capture that nuance, we now use LIWC categories and sentiment analysis.

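| LIWC category | Winning Majority, USA, Left-wing | Winning Majority, USA, Right-wing | Winning Majority, UK, Left-wing | Winning Majority, UK, Right-wing | Income, USA, Left-wing | Income, USA, Right-wing | Income, UK, Left-wing | Income, UK, Right-wing |
|---|---|---|---|---|---|---|---|---|
| Tone | -0.0671 | -0.0236 | -0.0822 | 0.0941 | -0.0278 | -0.0674 | -0.011 | -0.1129 |
| Authentic | -0.02 | -0.0802 | 0.0213 | 0.0297 | -0.0299 | 0.0494 | -0.0116 | -0.0405 |
| Analytic | -0.0317 | -0.0051 | -0.0075 | 0.0214 | 0.0431 | 0.0804 | 0.0268 | -0.0131 |
| Clout | 0.0143 | -0.1057 | -0.0282 | -0.0272 | 0.0092 | 0.0364 | 0.0505 | -0.0045 |
| Linguistic | -0.0093 | -0.0009 | 0.0036 | 0.0209 | 0.0052 | 0.0279 | -0.0092 | -0.0099 |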

Table 5: Inter-decile range for difference of LIWC categories scores between elected politicians and constituents.

4.1 Psychological similarities

Sylwester et al. found that some LIWC categories (i.e. “1st person singular pronoun (i)”, “1st person plural pronoun (we)”, “swear words (swear)”, “positive sentiment (emo_pos)”, “negative sentiment (emo_neg)”, “anxiety (emo_anx)”, “feeling (feel)”, “tentative (tentat)”, “certainty (certitude)”, “achievement (achieve)”, “religion (relig)”, and “death (death)”) identify political orientation (left and right-wing) in Twitter posts (Sylwester and Purver 2015). We therefore expect that some LIWC categories might reflect the different ways in which constituents and elected politicians express themselves.

To observe the variations in the style of discourse between elected politicians and constituents, we compute the differences between LIWC category scores across constituencies. To see whether these differences are significant, we apply a two-sample t-test on the LIWC category scores from tweets and posts. We verify that the underlying distributions are independent, a requirement for the two-sample t-test. We calculate the mutual information score (Peng, Long, and Ding 2005) for every LIWC category in the Twitter and Nextdoor datasets, finding values close to 0 (varying between 0.08 and 0.13), where 0 indicates complete independence.
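The test statistic behind such a comparison can be sketched as follows. The text does not say which variant of the two-sample t-test is used, so this sketch assumes Welch's unequal-variance form; the function name `welch_t` and the toy scores are hypothetical. Turning the statistic into a p-value requires the Student t distribution, which is available in statistics libraries (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both samples are constant; t is undefined")
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up scores of one LIWC category, one value per constituency.
politician_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
constituent_scores = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(politician_scores, constituent_scores)
```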

We find that there are only five LIWC categories (i.e. Tone, Analytic, Clout, Authentic, and Linguistic) with statistically significant (i.e. p ≤ 0.05) differences between the discourse style of elected politicians and constituents. These categories cover a large number of subcategories of writing style: Tone covers the usage of emotion-related words, which are further divided into subcategories such as emo_pos (positive emotions) and emo_neg (negative emotions); Analytic covers the usage of words related to formal thinking and reasoning; Clout covers words related to leadership and status; Authentic covers words related to perceived honesty; and Linguistic refers to the usage of writing structure such as verbs, nouns, and pronouns. Figure 5 shows the differences between the five LIWC scores of the constituents and their respective elected politicians of each constituency.

Figure 5: Distribution of differences of top five LIWC categories score of online discourse between constituents and elected politicians (Scale: 0–1)

Elected politicians have a similar discourse style to their constituents regardless of whether the constituency is right or left-wing.  Constituents and their elected politicians generally have similar styles of online discourse. We find that the differences between right-wing and left-wing constituencies are small even when significant. We only find a relatively large difference between right and left-wing constituencies for the LIWC category of Tone (median difference of 0.57 and 0.48, and 0.42 and 0.58 in the USA and UK respectively). We find that elected politicians use a more negative tone in online discourse (median politicians’ Tone LIWC score of 0.22 and 0.29 in the USA and UK; Tone scores below 0.5 suggest a more negative emotional tone) than their constituents (median constituents’ Tone LIWC score of 0.73 and 0.8 in the USA and UK). We also analyze the tone of online discourse separately for left-wing and right-wing constituencies and find similar results for elected politicians (0.25 and 0.17 in the USA, 0.21 and 0.31 in the UK) and constituents (0.77 and 0.69 in the USA, 0.78 and 0.82 in the UK).

We also analyze the categories reported to be good identifiers of political preferences (Sylwester and Purver 2015). We find that, while they might help identify right and left-wing individuals, the scores for these categories are very similar for both constituents and elected politicians.

Elected politicians of more disputed constituencies have a more similar discourse style to their constituents.  Table 5 shows the inter-decile range (IDR) of the differences in LIWC scores (for the five significant categories) between elected politicians and their respective constituents. Depending on the size of the winning majority, we observe that the IDR values of most categories (Tone, Analytic, Authentic, and Clout) are negative in left-wing and right-wing constituencies in the USA and in left-wing constituencies in the UK, but positive in right-wing constituencies in the UK. This trend shows that elected politicians tend to use a more similar style of discourse to their constituents when the margin of victory is low.
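To make the sign convention concrete, the following is a minimal sketch of how a signed IDR over deciles could be computed. It assumes, since the exact definition is given elsewhere in the paper, that constituencies are sorted from the largest to the smallest winning majority, split into ten equal bins, and that the IDR is the median difference in the last decile minus the median difference in the first. The function names and the toy data are hypothetical.

```python
from statistics import median


def decile_medians(records, n_bins=10):
    """Sort (key, value) pairs by key, from highest to lowest, and
    return the median value within each of n_bins equal-sized bins."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    medians = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = [value for _, value in ordered[start:end]]
        if chunk:  # skip empty bins when there are fewer records than bins
            medians.append(median(chunk))
    return medians


def signed_idr(records):
    """Assumed definition: median of the last decile minus median of the first."""
    meds = decile_medians(records)
    return meds[-1] - meds[0]


# Hypothetical toy data: (winning majority in %, |politician Tone - constituent Tone|)
toy = [(50 - 2 * i, 0.60 - 0.02 * i) for i in range(20)]
idr = signed_idr(toy)
print(round(idr, 2))
```

Under this assumed convention, a negative value means that the style difference shrinks as the winning majority narrows, which is the pattern described above for most categories.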

A potential hypothesis is that politicians whose victory is not secure try to be more empathetic with potential voters in order to secure their votes. The findings here are, however, in opposition to those in Section 3.2, showing a nuanced interplay between constituents and politicians.

Elected politicians in richer constituencies tend to have a more similar style to their constituents.  We also observe that the similarity in style decreases in lower-income constituencies, except for right-wing constituencies in the UK. Table 5 shows that style similarity decreases in low-income constituencies except for the Tone and Authentic categories. We also observe that style similarity increases in low-income right-wing constituencies in the UK, which have negative IDR values.

4.2 Sentiment similarity

To further our analysis of style differences, we now analyze differences in sentiment.

Figure 6: Distribution of the difference of the median compound sentiment score of elected politicians’ tweets on Twitter and constituents’ posts on Nextdoor over deciles by the winning voting majority in different constituencies (from higher to lower)

The narrower the majority, the more similar the sentiment between elected politicians and their constituents is.  Elected politicians of constituencies with narrow majorities have more similar sentiments to their constituents. Figure 6 shows a decreasing (absolute) sentiment difference between elected politicians and their respective constituents, as the winning majority narrows. This is true for both countries and across the political spectrum (see Table 4 for IDR values). This is aligned with our LIWC categories analysis and again opposed to the findings in the previous section (Section 3.2).
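As an illustration of the quantity plotted in Figure 6, the sketch below computes the absolute difference between the median compound sentiment score of a politician's tweets and that of the constituents' posts, per constituency. It assumes the compound scores (values in [-1, 1], as produced by VADER-style tools) have already been computed. The function name, the constituency identifiers, and the scores are hypothetical.

```python
from statistics import median


def sentiment_gaps(tweet_scores, post_scores):
    """Return {constituency: |median tweet score - median post score|}
    for constituencies present in both inputs."""
    shared = tweet_scores.keys() & post_scores.keys()
    return {
        cid: abs(median(tweet_scores[cid]) - median(post_scores[cid]))
        for cid in shared
    }


# Hypothetical compound scores per constituency (not taken from the paper's data)
tweet_scores = {"A": [-0.4, 0.1, 0.3], "B": [0.0, 0.2]}
post_scores = {"A": [0.2, 0.5, 0.6, 0.7], "B": [0.1, 0.3], "C": [0.9]}

gaps = sentiment_gaps(tweet_scores, post_scores)
for cid in sorted(gaps):
    print(cid, round(gaps[cid], 2))
```

Constituency "C" is dropped because it has no matching politician data. The resulting per-constituency gaps are the values one would then group into deciles by winning majority or income.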

Elected politicians from poorer constituencies have more similar sentiment to their constituents.

Figure 7: Distribution of difference of median compound sentiment score of elected politicians’ tweets on Twitter and constituents’ posts on Nextdoor over deciles by income levels in different constituencies (from richest to poorest).

Figure 7 shows how the (absolute) sentiment difference between elected politicians and their respective constituents decreases with the level of income of the constituency (see Table 4 for IDR values). This is in accordance with our earlier findings in Section 3.2: constituents and their elected politicians are more similar in both content and style in poorer neighborhoods.

5 Limitations

Comparing Twitter with Nextdoor.  There is a particularly important difference between Twitter and Nextdoor: Twitter welcomes general content, whereas Nextdoor focuses on local issues. Politicians can run political campaigns on Twitter, but political campaigns are not allowed on Nextdoor (Nextdoor 2019). This might result in relatively low similarities between constituents and elected representatives. This difference is constant, though: it affects absolute rather than relative values, allowing us to compare how similar constituents and their respective elected politicians are across constituencies. We also compute similarities between politicians and constituencies that did not elect them, and find that these similarities are consistently and substantially lower.
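The matched versus mismatched comparison can be sketched as follows. This is a minimal illustration, assuming each politician and each constituency is represented by the mean of its text embeddings and compared with cosine similarity, as in the appendix figures. The function names and the two-dimensional toy vectors are hypothetical and stand in for real sentence embeddings.

```python
import math


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matched_vs_mismatched(pol, con):
    """pol, con: {constituency: [embedding, ...]}.
    Returns (mean similarity of matched pairs, mean similarity of mismatched pairs)."""
    pol_mean = {c: mean_pool(v) for c, v in pol.items()}
    con_mean = {c: mean_pool(v) for c, v in con.items()}
    matched = [cosine(pol_mean[c], con_mean[c]) for c in pol_mean if c in con_mean]
    mismatched = [
        cosine(pol_mean[p], con_mean[c])
        for p in pol_mean for c in con_mean if p != c
    ]
    return sum(matched) / len(matched), sum(mismatched) / len(mismatched)


# Hypothetical toy embeddings for two constituencies
pol = {"A": [[1.0, 0.0], [0.8, 0.2]], "B": [[0.0, 1.0], [0.2, 0.8]]}
con = {"A": [[0.9, 0.1]], "B": [[0.1, 0.9]]}

matched_avg, mismatched_avg = matched_vs_mismatched(pol, con)
print(round(matched_avg, 3), round(mismatched_avg, 3))
```

If the platform difference only shifted all similarities by a roughly constant amount, matched pairs would still be expected to score higher than mismatched ones, which is the check described above.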

Data representativity and bias.  Our data has good geographic coverage and a high correlation with the underlying population. While we miss some constituencies (2 in the USA, 55 in the UK), this is unlikely to have a large effect on the results. We do not know, however, how representative of each neighborhood the Nextdoor data is, i.e. we have good coverage of regions with different levels of income, but we have no intra-neighborhood visibility. Our Nextdoor data might suffer from bias towards individuals with a certain socioeconomic status (e.g. richer individuals might be more likely to use social media (Pew Research Center 2018)). This is a common challenge in quantitative research with social media data, and studies frequently rely on the level of usage as a proxy for representativity (Chetty et al. 2022; Bailey et al. 2018, 2020; Jones et al. 2013).

6 Related Work

Both voters and politicians have used social media to discuss politics. The literature has extensively analyzed the political debate on social media, especially on Twitter, finding a trend of growing political polarization (Garimella and Weber 2017; Esteve Del Valle, Broersma, and Ponsioen 2022). A number of works have also studied how politicians use social media for political purposes (Parmelee, Perkins, and Beasley 2023; Agarwal, Sastry, and Wood 2019).

Our work differs from these in that we focus on the relationship between politicians and voters. As such, closer to our work is the research on how politicians engage with voters on social media (mediabaxter2016members; Hofmann et al. 2013; Tromble 2018). Graham et al. (Graham et al. 2013) analyzed tweets from 416 UK candidates and Tromble (Tromble 2018) analyzed tweets from 992 elected politicians (418 American, 434 British, and 140 Dutch).

Our work again differs in that, instead of looking at the engagement of politicians with a subset of individuals who might belong to a different constituency, we explicitly pair politicians and constituents across the USA and UK. To do this, we collect a comprehensive dataset from Twitter, for elected politicians, and from Nextdoor, for their constituents.

To the best of our knowledge, there has been only one prior quantitative study of Nextdoor. Iqbal et al. (Iqbal et al. 2023) show that Nextdoor can be used to predict socioeconomic parameters of its neighborhoods. Nextdoor maps users to geographical neighborhoods and, similarly to that work, we leverage the geographical tagging of Nextdoor users and group them into political constituencies. This allows us to ensure that we compare elected politicians with residents of their constituency.

7 Conclusion

This paper examined how similar elected politicians and their constituents are. We examined this in terms of the similarities in content and style of their online discourse. To do this, we conducted a large-scale analysis where we collected 21.4 Million posts from 433 USA and 595 UK constituencies from Nextdoor, and 5.6 Million tweets from the accounts of the elected politicians of the same constituencies. We found that elected politicians tend to be equally similar to their constituents in terms of discourse content and style regardless of whether a constituency elects a right or left-wing politician. The size of the electoral victory and the level of income of a constituency showed a nuanced picture. We found that narrower electoral victories were associated with a more similar discourse style (both in terms of LIWC categories and sentiment). We found the opposite for the discourse’s content: the larger the victory the more similar the content tended to be. Poorer constituencies showed more similarity in terms of content. In terms of style, poorer constituencies showed a more similar sentiment and the opposite was true for the psychological text traits (i.e. measured with LIWC categories).

8 Ethical Statement

This research study has been approved by the Institutional Review Board (IRB) at the researchers’ institution. The authors have no competing interests or funding that could undermine this research. We employ users’ public post records from Nextdoor to study their conversations. Nextdoor data is public, as there is the expectation that strangers can view the posts (Townsend and Wallace 2016). Upon collection, we anonymize the data before use and store it in a secure silo. We aggregate our data and analyze it at a constituency level to prevent user identification. We discard any user-level information. Our work does not share or redistribute Nextdoor content, as per Nextdoor’s Terms of Service. Importantly, web crawling is legal for non-commercial research in the UK (IPO, UK 2021) and the USA (TechCrunch 2022), where the data collection is performed.

We also gather tweets from elected politicians in the USA and UK using Twitter Academic API access. This data is considered public because it is publicly accessible and anyone can interact with it, even without the permission of the original author of the tweet. Townsend and Wallace discuss this situation in great detail in case 5 (Townsend and Wallace 2016).

References

  • Agarwal, Sastry, and Wood (2019) Agarwal, P.; Sastry, N.; and Wood, E. 2019. Tweeting mps: Digital engagement between citizens and members of parliament in the uk. In Proceedings of the International AAAI Conference on Web and Social Media, volume 13, 26–37.
  • Arora, Liang, and Ma (2017) Arora, S.; Liang, Y.; and Ma, T. 2017. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In ICLR.
  • Bailey et al. (2018) Bailey, M.; Cao, R.; Kuchler, T.; and Stroebel, J. 2018. The economic effects of social networks: Evidence from the housing market. Journal of Political Economy, 126(6): 2224–2276.
  • Bailey et al. (2020) Bailey, M.; Johnston, D.; Koenen, M.; Kuchler, T.; Russel, D.; and Stroebel, J. 2020. Social Networks Shape Beliefs and Behaviors: Evidence from Social Distancing during the COVID-19 Pandemic.
  • Boyd et al. (2022) Boyd, R. L.; Ashokkumar, A.; Seraj, S.; and Pennebaker, J. W. 2022. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, 1–47.
  • Campello, Moulavi, and Sander (2013) Campello, R. J.; Moulavi, D.; and Sander, J. 2013. Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining, 160–172. Springer.
  • Chetty et al. (2022) Chetty, R.; Jackson, M. O.; Kuchler, T.; Stroebel, J.; Hendren, N.; Fluegge, R. B.; Gong, S.; Gonzalez, F.; Grondin, A.; Jacob, M.; et al. 2022. Social capital I: measurement and associations with economic mobility. Nature, 608(7921): 108–121.
  • Esteve Del Valle, Broersma, and Ponsioen (2022) Esteve Del Valle, M.; Broersma, M.; and Ponsioen, A. 2022. Political interaction beyond party lines: Communication ties and party polarization in parliamentary twitter networks. Social science computer review, 40(3): 736–755.
  • Foa et al. (2020) Foa, R. S.; Klassen, A.; Slade, M.; Rand, A.; and Collins, R. 2020. The global satisfaction with democracy report 2020. Bennett Institute for Public Policy, University of Cambridge.
  • Garimella and Weber (2017) Garimella, V. R. K.; and Weber, I. 2017. A long-term analysis of polarization on Twitter. In Proceedings of the International AAAI Conference on Web and social media, volume 11, 528–531.
  • Graham et al. (2013) Graham, T.; Broersma, M.; Hazelhoff, K.; and Van’t Haar, G. 2013. Between broadcasting political messages and interacting with voters: The use of Twitter during the 2010 UK general election campaign. Information, communication & society, 16(5): 692–716.
  • Grootendorst (2022) Grootendorst, M. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
  • Günther et al. (2023) Günther, M.; Milliken, L.; Geuter, J.; Mastrapas, G.; Wang, B.; and Xiao, H. 2023. Jina embeddings: A novel set of high-performance sentence embedding models. arXiv preprint arXiv:2307.11224.
  • Hofmann et al. (2013) Hofmann, S.; Beverungen, D.; Räckers, M.; and Becker, J. 2013. What makes local governments’ online communications successful? Insights from a multi-method analysis of Facebook. Government information quarterly, 30(4): 387–396.
  • Huggingface (2022) Huggingface. 2022. all-mpnet-base-v2. https://huggingface.co/sentence-transformers/all-mpnet-base-v2. Accessed: 2022-07-10.
  • Hutto and Gilbert (2014) Hutto, C.; and Gilbert, E. 2014. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the international AAAI conference on web and social media, volume 8, 216–225.
  • IPO, UK (2021) IPO, UK. 2021. Exceptions to copyright. https://www.gov.uk/guidance/exceptions-to-copyright#text-and-data-mining-for-non-commercial-research. Accessed: 2022-07-10.
  • Iqbal et al. (2023) Iqbal, W.; Ghafouri, V.; Tyson, G.; Suarez-Tangil, G.; and Castro, I. 2023. Lady and the Tramp Nextdoor: Online Manifestations of Real-World Inequalities in the Nextdoor Social Network. In Proceedings of the International AAAI Conference on Web and Social Media, volume 17, 399–410.
  • Jarrett (2016) Jarrett, H. 2016. The single transferable vote and the Alliance Party of Northern Ireland. Representation, 52(4): 311–323.
  • Jolly et al. (2022) Jolly, S.; Bakker, R.; Hooghe, L.; Marks, G.; Polk, J.; Rovny, J.; Steenbergen, M.; and Vachudova, M. A. 2022. Chapel Hill expert survey trend file, 1999–2019. Electoral studies, 75: 102420.
  • Jones et al. (2013) Jones, J. J.; Settle, J. E.; Bond, R. M.; Fariss, C. J.; Marlow, C.; and Fowler, J. H. 2013. Inferring tie strength from online directed behavior. PloS one, 8(1): e52168.
  • Jones (2020) Jones, O. 2020. Chavs: The demonization of the working class. Verso books.
  • Kacewicz et al. (2014) Kacewicz, E.; Pennebaker, J. W.; Davis, M.; Jeon, M.; and Graesser, A. C. 2014. Pronoun use reflects standings in social hierarchies. Journal of Language and Social Psychology, 33(2): 125–143.
  • Kahn et al. (2007) Kahn, J. H.; Tobin, R. M.; Massey, A. E.; and Anderson, J. A. 2007. Measuring emotional expression with the Linguistic Inquiry and Word Count. The American journal of psychology, 120(2): 263–286.
  • Martelli and Jaffrelot (2023) Martelli, J.-t.; and Jaffrelot, C. 2023. Do populist leaders mimic the language of ordinary citizens? Evidence from India. Political Psychology, 44(5): 1141–1160.
  • Nextdoor (2019) Nextdoor. 2019. Nextdoor Political Advertising. https://blog.nextdoor.co.uk/2019/11/13/why-we-dont-allow-political-advertising-on-nextdoor/. Accessed: 2024-04-10.
  • Nextdoor (2023) Nextdoor. 2023. Nextdoor Users: How Many People Use Nextdoor in 2023? https://earthweb.com/how-many-people-use-nextdoor/. Accessed: 2023-08-10.
  • Oh, Li, and Wang (2023) Oh, M.; Li, J.; and Wang, G. 2023. Tadse: Template-aware dialogue sentence embeddings. arXiv preprint arXiv:2305.14299.
  • ONS UK (2022) ONS UK. 2022. Estimates of the population for the UK, England, Wales, Scotland and Northern Ireland. https://www.ons.gov.uk/peoplepopulationandcommunit/populationandmigration/populationestimates/dataset/populationestimatesforukenglandandwalesscotlandand//northernireland. Accessed: 2022-07-10.
  • Parmelee, Perkins, and Beasley (2023) Parmelee, J. H.; Perkins, S. C.; and Beasley, B. 2023. Personalization of politicians on Instagram: what Generation Z wants to see in political posts. Information, Communication & Society, 26(9): 1773–1788.
  • Peng, Long, and Ding (2005) Peng, H.; Long, F.; and Ding, C. 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8): 1226–1238.
  • Pennebaker et al. (2015) Pennebaker, J. W.; Boyd, R. L.; Jordan, K.; and Blackburn, K. 2015. The development and psychometric properties of LIWC2015. Technical report.
  • Pew Research Center (2018) Pew Research Center. 2018. Publics Globally Want Unbiased News Coverage, but Are Divided on Whether Their News Media Deliver.
  • Pew Research Center (2022) Pew Research Center. 2022. Populists in Europe – especially those on the right – have increased their vote shares in recent elections.
  • Pew Research Center (2023) Pew Research Center. 2023. Representative Democracy Remains a Popular Ideal, but People Around the World Are Critical of How It’s Working.
  • Pew Research Center (2024) Pew Research Center. 2024. Views about political representation.
  • Piketty (2018) Piketty, T. 2018. Brahmin left vs merchant right: Rising inequality & the changing structure of political conflict.
  • Singh (2022) Singh, N. 2022. niksss at HinglishEval: Language-agnostic BERT-based Contextual Embeddings with Catboost for Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text. arXiv preprint arXiv:2206.08910.
  • Song et al. (2020) Song, K.; Tan, X.; Qin, T.; Lu, J.; and Liu, T.-Y. 2020. Mpnet: Masked and permuted pre-training for language understanding. Advances in neural information processing systems, 33: 16857–16867.
  • Sylwester and Purver (2015) Sylwester, K.; and Purver, M. 2015. Twitter language use reflects psychological differences between democrats and republicans. PloS one, 10(9): e0137422.
  • TechCrunch (2022) TechCrunch. 2022. Web scraping is legal, US appeals court reaffirms. https://techcrunch.com/2022/04/18/web-scraping-legal-court/. Accessed: 2022-07-10.
  • Townsend and Wallace (2016) Townsend, L.; and Wallace, C. 2016. Social media research: A guide to ethics. University of Aberdeen, 1: 16.
  • Tromble (2018) Tromble, R. 2018. The great leveler? Comparing citizen–politician Twitter engagement across three Western democracies. European political science, 17(2): 223–239.
  • UK House of Commons (2023) UK House of Commons. 2023. Find Your MP. https://members.parliament.uk//members/commons. Accessed: 2023-08-10.
  • US Census (2022) US Census. 2022. US Census Bureau Release. https://www.census.gov/data.html. Accessed: 2022-07-10.
  • USA House of Representatives (2023) USA House of Representatives. 2023. Find Your Representative. https://www.house.gov/representatives/find-your-representative. Accessed: 2023-08-10.
  • Valgarðsson et al. (2021) Valgarðsson, V. O.; Clarke, N.; Jennings, W.; and Stoker, G. 2021. The good politician and political trust: An authenticity gap in British politics? Political Studies, 69(4): 858–880.

9 Appendix

Figure 8: Distribution of tokens from social media text in USA and UK constituencies in Twitter and Nextdoor after pre-processing.
Figure 9: Distribution of cosine similarity of mean-pooled textual embeddings between left-wing constituents and right-wing elected politicians and vice versa over deciles of winning vote majority in different constituencies (from higher to lower).
Figure 10: Distribution of cosine similarity of mean-pooled textual embeddings between left-wing constituents and right-wing elected politicians and vice versa over deciles by income levels in different constituencies (from richest to poorest).
Figure 11: Coherence Score for BERTopic on Nextdoor and Twitter Datasets.