-
Modeling the amplification of epidemic spread by misinformed populations
Authors:
Matthew R. DeVerna,
Francesco Pierri,
Yong-Yeol Ahn,
Santo Fortunato,
Alessandro Flammini,
Filippo Menczer
Abstract:
Understanding how misinformation affects the spread of disease is crucial for public health, especially given recent research indicating that misinformation can increase vaccine hesitancy and discourage vaccine uptake. However, it is difficult to investigate the interaction between misinformation and epidemic outcomes due to the dearth of data-informed holistic epidemic models. Here, we employ an epidemic model that incorporates a large, mobility-informed physical contact network as well as the distribution of misinformed individuals across counties derived from social media data. The model allows us to simulate and estimate various scenarios to understand the impact of misinformation on epidemic spreading. Using this model, we present a worst-case scenario in which a heavily misinformed population would result in an additional 14% of the U.S. population becoming infected over the course of the COVID-19 epidemic, compared to a best-case scenario.
Submitted 30 July, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Social Bots: Detection and Challenges
Authors:
Kai-Cheng Yang,
Onur Varol,
Alexander C. Nwala,
Mohsen Sayyadiharikandeh,
Emilio Ferrara,
Alessandro Flammini,
Filippo Menczer
Abstract:
While social media are a key source of data for computational social science, their ease of manipulation by malicious actors threatens the integrity of online information exchanges and their analysis. In this Chapter, we focus on malicious social bots, a prominent vehicle for such manipulation. We start by discussing recent studies about the presence and actions of social bots in various online discussions to show their real-world implications and the need for detection methods. Then we discuss the challenges of bot detection methods and use Botometer, a publicly available bot detection tool, as a case study to describe recent developments in this area. We close with a practical guide on how to handle social bots in social media research.
Submitted 28 December, 2023;
originally announced December 2023.
-
Friction Interventions to Curb the Spread of Misinformation on Social Media
Authors:
Laura Jahn,
Rasmus K. Rendsvig,
Alessandro Flammini,
Filippo Menczer,
Vincent F. Hendricks
Abstract:
Social media has enabled the spread of information at unprecedented speeds and scales, and with it the proliferation of high-engagement, low-quality content. *Friction* -- behavioral design measures that make the sharing of content more cumbersome -- might be a way to raise the quality of what is spread online. Here, we study the effects of friction with and without quality-recognition learning. Experiments from an agent-based model suggest that friction alone decreases the number of posts without improving their quality. A small amount of friction combined with learning, however, increases the average quality of posts significantly. Based on this preliminary evidence, we propose a friction intervention with a learning component about the platform's community standards, to be tested via a field experiment. The proposed intervention would have minimal effects on engagement and may easily be deployed at scale.
Submitted 21 July, 2023;
originally announced July 2023.
-
Emergence of simple and complex contagion dynamics from weighted belief networks
Authors:
Rachith Aiyappa,
Alessandro Flammini,
Yong-Yeol Ahn
Abstract:
Social contagion is a ubiquitous and fundamental process that drives individual and social changes. Although social contagion arises as a result of cognitive processes and biases, the integration of cognitive mechanisms with the theory of social contagion remains an open challenge. In particular, studies on social phenomena usually assume contagion dynamics to be either simple or complex, rather than allowing it to emerge from cognitive mechanisms, despite empirical evidence indicating that a social system can exhibit a spectrum of contagion dynamics -- from simple to complex -- simultaneously. Here, we propose a model of interacting beliefs, from which both simple and complex contagion dynamics can organically arise. Our model also elucidates how a fundamental mechanism of complex contagion -- resistance -- can come about from cognitive mechanisms.
Submitted 29 April, 2024; v1 submitted 5 January, 2023;
originally announced January 2023.
-
A General Language for Modeling Social Media Account Behavior
Authors:
Alexander C. Nwala,
Alessandro Flammini,
Filippo Menczer
Abstract:
Malicious actors exploit social media to inflate stock prices, sway elections, spread misinformation, and sow discord. To these ends, they employ tactics that include the use of inauthentic accounts and campaigns. Methods to detect these abuses currently rely on features specifically designed to target suspicious behaviors. However, the effectiveness of these methods decays as malicious behaviors evolve. To address this challenge, we propose a general language for modeling social media account behavior. Words in this language, called BLOC, consist of symbols drawn from distinct alphabets representing user actions and content. The language is highly flexible and can be applied to model a broad spectrum of legitimate and suspicious online behaviors without extensive fine-tuning. Using BLOC to represent the behaviors of Twitter accounts, we achieve performance comparable to or better than state-of-the-art methods in the detection of social bots and coordinated inauthentic behavior.
Submitted 1 November, 2022;
originally announced November 2022.
-
Reconciling the Quality vs Popularity Dichotomy in Online Cultural Markets
Authors:
Rossano Gaeta,
Michele Garetto,
Giancarlo Ruffo,
Alessandro Flammini
Abstract:
We propose a simple model of an idealized online cultural market in which $N$ items, endowed with a hidden quality metric, are recommended to users by a ranking algorithm possibly biased by the current items' popularity. Our goal is to better understand the underlying mechanisms of the well-known fact that popularity bias can prevent higher-quality items from becoming more popular than lower-quality items, producing an undesirable misalignment between quality and popularity rankings. We do so under the assumption that users, having limited time/attention, are able to discriminate the best-quality only within a random subset of the items. We discover the existence of a harmful regime in which improper use of popularity can seriously compromise the emergence of quality, and a benign regime in which wise use of popularity, coupled with a small discrimination effort on behalf of users, guarantees the perfect alignment of quality and popularity ranking. Our findings clarify the effects of algorithmic popularity bias on quality outcomes, and may inform the design of more principled mechanisms for techno-social cultural markets.
Submitted 28 April, 2022;
originally announced April 2022.
-
Universality, criticality and complexity of information propagation in social media
Authors:
Daniele Notarmuzi,
Claudio Castellano,
Alessandro Flammini,
Dario Mazzilli,
Filippo Radicchi
Abstract:
Information avalanches in social media are typically studied in a similar fashion as avalanches of neuronal activity in the brain. Whereas a large body of literature reveals substantial agreement about the existence of a unique process characterizing neuronal activity across organisms, the dynamics of information in online social media is far less understood. Statistical laws of information avalanches are found in previous studies to be not robust across systems, and radically different processes are used to represent plausible driving mechanisms for information propagation. Here, we analyze almost 1 billion time-stamped events collected from a multitude of online platforms -- including Telegram, Twitter and Weibo -- over observation windows longer than 10 years to show that the propagation of information in social media is a universal and critical process. Universality arises from the observation of identical macroscopic patterns across platforms, irrespective of the details of the specific system at hand. Critical behavior is deduced from the power-law distributions, and corresponding hyperscaling relations, characterizing size and duration of avalanches of information. Neuronal activity may be modeled as a simple contagion process, where only a single exposure to activity may be sufficient for its diffusion. On the contrary, statistical testing on our data indicates that a mixture of simple and complex contagion, where involvement of an individual requires exposure from multiple acquaintances, characterizes the propagation of information in social media. We show that the complexity of the process is correlated with the semantic content of the information that is propagated. Conversational topics about music, movies and TV shows tend to propagate as simple contagion processes, whereas controversial discussions on political/societal themes obey the rules of complex contagion.
Submitted 6 October, 2021; v1 submitted 31 August, 2021;
originally announced September 2021.
-
Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal
Authors:
Francesco Pierri,
Brea Perry,
Matthew R. DeVerna,
Kai-Cheng Yang,
Alessandro Flammini,
Filippo Menczer,
John Bryden
Abstract:
Widespread uptake of vaccines is necessary to achieve herd immunity. However, uptake rates have varied across U.S. states during the first six months of the COVID-19 vaccination program. Misbeliefs may play an important role in vaccine hesitancy, and there is a need to understand relationships between misinformation, beliefs, behaviors, and health outcomes. Here we investigate the extent to which COVID-19 vaccination rates and vaccine hesitancy are associated with levels of online misinformation about vaccines. We also look for evidence of directionality from online misinformation to vaccine hesitancy. We find a negative relationship between misinformation and vaccination uptake rates. Online misinformation is also correlated with vaccine hesitancy rates taken from survey data. Associations between vaccine outcomes and misinformation remain significant when accounting for political as well as demographic and socioeconomic factors. While vaccine hesitancy is strongly associated with Republican vote share, we observe that the effect of online misinformation on hesitancy is strongest across Democratic rather than Republican counties. Granger causality analysis shows evidence for a directional relationship from online misinformation to vaccine hesitancy. Our results support a need for interventions that address misbeliefs, allowing individuals to make better-informed health decisions.
Submitted 12 July, 2022; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Right and left, partisanship predicts (asymmetric) vulnerability to misinformation
Authors:
Dimitar Nikolov,
Alessandro Flammini,
Filippo Menczer
Abstract:
We analyze the relationship between partisanship, echo chambers, and vulnerability to online misinformation by studying news sharing behavior on Twitter. While our results confirm prior findings that online misinformation sharing is strongly correlated with right-leaning partisanship, we also uncover a similar, though weaker trend among left-leaning users. Because of the correlation between a user's partisanship and their position within a partisan echo chamber, these types of influence are confounded. To disentangle their effects, we perform a regression analysis and find that vulnerability to misinformation is most strongly influenced by partisanship for both left- and right-leaning users.
Submitted 21 January, 2021; v1 submitted 3 October, 2020;
originally announced October 2020.
-
Political audience diversity and news reliability in algorithmic ranking
Authors:
Saumya Bhadani,
Shun Yamaya,
Alessandro Flammini,
Filippo Menczer,
Giovanni Luca Ciampaglia,
Brendan Nyhan
Abstract:
Newsfeed algorithms frequently amplify misinformation and other low-quality content. How can social media platforms more effectively promote reliable information? Existing approaches are difficult to scale and vulnerable to manipulation. In this paper, we propose using the political diversity of a website's audience as a quality signal. Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards. We then incorporate audience diversity into a standard collaborative filtering framework and show that our improved algorithm increases the trustworthiness of websites suggested to users -- especially those who most frequently consume misinformation -- while keeping recommendations relevant. These findings suggest that partisan audience diversity is a valuable signal of higher journalistic standards that should be incorporated into algorithmic ranking decisions.
Submitted 6 March, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Detection of Novel Social Bots by Ensembles of Specialized Classifiers
Authors:
Mohsen Sayyadiharikandeh,
Onur Varol,
Kai-Cheng Yang,
Alessandro Flammini,
Filippo Menczer
Abstract:
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. While researchers have developed sophisticated methods to detect abuse, novel bots with diverse behaviors evade detection. We show that different types of bots are characterized by different behavioral features. As a result, supervised learning techniques suffer severe performance deterioration when attempting to detect behaviors not observed in the training data. Moreover, tuning these models to recognize novel bots requires retraining with a significant amount of new annotations, which are expensive to obtain. To address these issues, we propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule. The ensemble of specialized classifiers (ESC) can better generalize, leading to an average improvement of 56% in F1 score for unseen accounts across datasets. Furthermore, novel bot behaviors are learned with fewer labeled examples during retraining. We deployed ESC in the newest version of Botometer, a popular tool to detect social bots in the wild, with a cross-validation AUC of 0.99.
Submitted 14 August, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Unveiling Coordinated Groups Behind White Helmets Disinformation
Authors:
Diogo Pacheco,
Alessandro Flammini,
Filippo Menczer
Abstract:
Propaganda, disinformation, manipulation, and polarization are the modern illnesses of a society increasingly dependent on social media as a source of news. In this paper, we explore the disinformation campaign, sponsored by Russia and allies, against the Syria Civil Defense (a.k.a. the White Helmets). We unveil coordinated groups using automatic retweets and content duplication to promote narratives and/or accounts. The results also reveal distinct promoting strategies, ranging from the small groups sharing the exact same text repeatedly, to complex "news website factories" where dozens of accounts synchronously spread the same news from multiple sites.
Submitted 2 March, 2020;
originally announced March 2020.
-
Uncovering Coordinated Networks on Social Media: Methods and Case Studies
Authors:
Diogo Pacheco,
Pik-Mai Hui,
Christopher Torres-Lugo,
Bao Tran Truong,
Alessandro Flammini,
Filippo Menczer
Abstract:
Coordinated campaigns are used to influence and manipulate social media platforms and their users, a critical challenge to the free exchange of information online. Here we introduce a general, unsupervised network-based methodology to uncover groups of accounts that are likely coordinated. The proposed method constructs coordination networks based on arbitrary behavioral traces shared among accounts. We present five case studies of influence campaigns, four of which are in the diverse contexts of U.S. elections, Hong Kong protests, the Syrian civil war, and cryptocurrency manipulation. In each of these cases, we detect networks of coordinated Twitter accounts by examining their identities, images, hashtag sequences, retweets, or temporal patterns. The proposed approach proves to be broadly applicable to uncover different kinds of coordination across information warfare scenarios.
Submitted 7 April, 2021; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Recency predicts bursts in the evolution of author citations
Authors:
Filipi Nascimento Silva,
Aditya Tandon,
Diego Raphael Amancio,
Alessandro Flammini,
Filippo Menczer,
Staša Milojević,
Santo Fortunato
Abstract:
The citations process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model where the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.
Submitted 26 November, 2019;
originally announced November 2019.
-
Massive Multi-Agent Data-Driven Simulations of the GitHub Ecosystem
Authors:
Jim Blythe,
John Bollenbacher,
Di Huang,
Pik-Mai Hui,
Rachel Krohn,
Diogo Pacheco,
Goran Muric,
Anna Sapienza,
Alexey Tregubov,
Yong-Yeol Ahn,
Alessandro Flammini,
Kristina Lerman,
Filippo Menczer,
Tim Weninger,
Emilio Ferrara
Abstract:
Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based simulation framework, which we are currently extending to allow simulating other planetary-scale techno-social systems. The challenge problem measured participants' ability, given 30 months of meta-data on user activity on GitHub, to predict the next months' activity as measured by a broad range of metrics applied to ground truth, using agent-based simulation. The challenge required scaling to a simulation of roughly 3 million agents producing a combined 30 million actions, acting on 6 million repositories with commodity hardware. It was also important to use the data optimally to predict the agents' next moves. We describe the agent framework and the data analysis employed by one of the winning teams in the challenge. Six different agent models were tested based on a variety of machine learning and statistical methods. While no single method proved the most accurate on every metric, the broadly most successful sampled from a stationary probability distribution of actions and repositories for each agent. Two reasons for the success of these agents were their use of a distinct characterization of each agent, and that GitHub users change their behavior relatively slowly.
Submitted 15 August, 2019;
originally announced August 2019.
-
Quantifying the Vulnerabilities of the Online Public Square to Adversarial Manipulation Tactics
Authors:
Bao Tran Truong,
Xiaodan Lou,
Alessandro Flammini,
Filippo Menczer
Abstract:
Social media, seen by some as the modern public square, is vulnerable to manipulation. By controlling inauthentic accounts impersonating humans, malicious actors can amplify disinformation within target communities. The consequences of such operations are difficult to evaluate due to the challenges posed by collecting data and carrying out ethical experiments that would influence online communities. Here we use a social media model that simulates information diffusion in an empirical network to quantify the impacts of several adversarial manipulation tactics on the quality of content. We find that the presence of influential accounts, a hallmark of social media, exacerbates the vulnerabilities of online communities to manipulation. Among the explored tactics that bad actors can employ, infiltrating a community is the most likely to make low-quality content go viral. Such harm can be further compounded by inauthentic agents flooding the network with low-quality, yet appealing content, but is mitigated when bad actors focus on specific targets, such as influential or vulnerable individuals. These insights suggest countermeasures that platforms could employ to increase the resilience of social media users to manipulation.
Submitted 11 June, 2024; v1 submitted 13 July, 2019;
originally announced July 2019.
-
Social Influence and Unfollowing Accelerate the Emergence of Echo Chambers
Authors:
Kazutoshi Sasahara,
Wen Chen,
Hao Peng,
Giovanni Luca Ciampaglia,
Alessandro Flammini,
Filippo Menczer
Abstract:
While social media make it easy to connect with and access information from anyone, they also facilitate basic influence and unfriending mechanisms that may lead to segregated and polarized clusters known as "echo chambers." Here we study the conditions in which such echo chambers emerge by introducing a simple model of information sharing in online social networks with the two ingredients of influence and unfriending. Users can change both their opinions and social connections based on the information to which they are exposed through sharing. The model dynamics show that even with minimal amounts of influence and unfriending, the social network rapidly devolves into segregated, homogeneous communities. These predictions are consistent with empirical data from Twitter. Although our findings suggest that echo chambers are somewhat inevitable given the mechanisms at play in online social media, they also provide insights into possible mitigation strategies.
Submitted 24 August, 2020; v1 submitted 9 May, 2019;
originally announced May 2019.
-
Arming the public with artificial intelligence to counter social bots
Authors:
Kai-Cheng Yang,
Onur Varol,
Clayton A. Davis,
Emilio Ferrara,
Alessandro Flammini,
Filippo Menczer
Abstract:
The increased relevance of social media in our daily life has been accompanied by efforts to manipulate online conversations and opinions. Deceptive social bots -- automated or semi-automated accounts designed to impersonate humans -- have been successfully exploited for these kinds of abuse. Researchers have responded by developing AI tools to arm the public in the fight against social bots. Here we review the literature on different types of bots, their impact, and detection methods. We use the case study of Botometer, a popular bot detection tool developed at Indiana University, to illustrate how people interact with AI countermeasures. A user experience survey suggests that bot detection has become an integral part of the social media experience for many users. However, barriers in interpreting the output of AI tools can lead to fundamental misunderstandings. The arms race between machine learning methods to develop sophisticated bots and effective countermeasures makes it necessary to update the training data and features of detection tools. We again use the Botometer case to illustrate both algorithmic and interpretability improvements of bot scores, designed to meet user expectations. We conclude by discussing how future AI developments may affect the fight between malicious bots and the public.
Submitted 6 February, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Quantifying Biases in Online Information Exposure
Authors:
Dimitar Nikolov,
Mounia Lalmas,
Alessandro Flammini,
Filippo Menczer
Abstract:
Our consumption of online information is mediated by filtering, ranking, and recommendation algorithms that introduce unintentional biases as they attempt to deliver relevant and engaging content. It has been suggested that our reliance on online technologies such as search engines and social media may limit exposure to diverse points of view and make us vulnerable to manipulation by disinformation. In this paper, we mine a massive dataset of Web traffic to quantify two kinds of bias: (i) homogeneity bias, which is the tendency to consume content from a narrow set of information sources, and (ii) popularity bias, which is the selective exposure to content from top sites. Our analysis reveals different bias levels across several widely used Web platforms. Search exposes users to a diverse set of sources, while social media traffic tends to exhibit high popularity and homogeneity bias. When we focus our analysis on traffic to news sites, we find higher levels of popularity bias, with smaller differences across applications. Overall, our results quantify the extent to which our choices of online systems confine us inside "social bubbles."
Submitted 18 July, 2018;
originally announced July 2018.
-
Weight Thresholding on Complex Networks
Authors:
Xiaoran Yan,
Lucas G. S. Jeub,
Alessandro Flammini,
Filippo Radicchi,
Santo Fortunato
Abstract:
Weight thresholding is a simple technique that aims at reducing the number of edges in weighted networks that are otherwise too dense for the application of standard graph theoretical methods. We show that the group structure of real weighted networks is very robust under weight thresholding, as it is maintained even when most of the edges are removed. This appears to be related to the correlation between topology and weight that characterizes real networks. On the other hand, the behavior of other properties is generally system dependent.
Submitted 5 October, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Optimal modularity in complex contagion
Authors:
Azadeh Nematzadeh,
Nathaniel Rodriguez,
Alessandro Flammini,
Yong-Yeol Ahn
Abstract:
In this chapter, we apply the theoretical framework introduced in the previous chapter to study how the modular structure of the social network affects the spreading of complex contagion. In particular, we focus on the notion of optimal modularity, which predicts the occurrence of global cascades when the network exhibits just the right amount of modularity. Here we generalize the findings by assuming the presence of multiple communities and a uniform distribution of seeds across the network. Finally, we offer some insights into the temporal evolution of cascades in the regime of optimal modularity.
Submitted 31 May, 2018;
originally announced June 2018.
-
Anatomy of an online misinformation network
Authors:
Chengcheng Shao,
Pik-Mai Hui,
Lei Wang,
Xinwen Jiang,
Alessandro Flammini,
Filippo Menczer,
Giovanni Luca Ciampaglia
Abstract:
Massive amounts of fake news and conspiratorial content have spread over social media before and after the 2016 US Presidential Elections despite intense fact-checking efforts. How do the spread of misinformation and fact-checking compete? What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors? How to reduce the overall amount of misinformation? To explore these questions we built Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. Hoaxy filters public tweets that include links to unverified claims or fact-checking articles. We perform k-core decomposition on a diffusion network obtained from two million retweets produced by several hundred thousand accounts over the six months before the election. As we move from the periphery to the core of the network, fact-checking nearly disappears, while social bots proliferate. The number of users in the main core reaches equilibrium around the time of the election, with limited churn and increasingly dense connections. We conclude by quantifying how effectively the network can be disrupted by penalizing the most central nodes. These findings provide a first look at the anatomy of a massive online misinformation diffusion network.
Submitted 18 January, 2018;
originally announced January 2018.
-
RelSifter: Scoring Triples from Type-like Relations - The Samphire Triple Scorer at WSDM Cup 2017
Authors:
Prashant Shiralkar,
Mihai Avram,
Giovanni Luca Ciampaglia,
Filippo Menczer,
Alessandro Flammini
Abstract:
We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as 'profession' and 'nationality.' To provide additional contextual information about individuals and relations we supplement the data provided as part of the WSDM 2017 Triple Score contest with Wikidata and DBpedia, two large-scale knowledge graphs (KG). Our hypothesis is that any type relation, i.e., a specific profession like 'actor' or 'scientist,' can be described by the set of typical "activities" of people known to have that type relation. For example, actors are known to star in movies, and scientists are known for their academic affiliations. In a KG, this information is to be found on a properly defined subset of the second-degree neighbors of the type relation. This form of local information can be used as part of a learning algorithm to predict relevance scores for new, unseen triples. When scoring 'profession' and 'nationality' triples our experiments based on this approach result in an accuracy equal to 73% and 78%, respectively. These performance metrics are roughly equivalent or only slightly below the state of the art prior to the present contest. This suggests that our approach can be effective for evaluating facts, despite the skewness in the number of facts per individual mined from KGs.
Submitted 22 December, 2017;
originally announced December 2017.
-
Finding Streams in Knowledge Graphs to Support Fact Checking
Authors:
Prashant Shiralkar,
Alessandro Flammini,
Filippo Menczer,
Giovanni Luca Ciampaglia
Abstract:
The volume and velocity of information that gets generated online limits current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a "knowledge stream" that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim.
Submitted 23 August, 2017;
originally announced August 2017.
-
The spread of low-credibility content by social bots
Authors:
Chengcheng Shao,
Giovanni Luca Ciampaglia,
Onur Varol,
Kaicheng Yang,
Alessandro Flammini,
Filippo Menczer
Abstract:
The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.
Submitted 24 May, 2018; v1 submitted 24 July, 2017;
originally announced July 2017.
-
How algorithmic popularity bias hinders or promotes quality
Authors:
Azadeh Nematzadeh,
Giovanni Luca Ciampaglia,
Filippo Menczer,
Alessandro Flammini
Abstract:
Algorithms that favor popular items are used to help us select among many choices, from engaging articles on a social media news feed to songs and books that others have purchased, and from top-ranked search engine results to highly-cited scientific papers. The goal of these algorithms is to identify high-quality items such as reliable news, beautiful movies, prestigious information sources, and important discoveries --- in short, high-quality content should rank at the top. Prior work has shown that choosing what is popular may amplify random fluctuations and ultimately lead to sub-optimal rankings. Nonetheless, it is often assumed that recommending what is popular will help high-quality content "bubble up" in practice. Here we identify the conditions in which popularity may be a viable proxy for quality content by studying a simple model of cultural market endowed with an intrinsic notion of quality. A parameter representing the cognitive cost of exploration controls the critical trade-off between quality and popularity. We find a regime of intermediate exploration cost where an optimal balance exists, such that choosing what is popular actually promotes high-quality items to the top. Outside of these limits, however, popularity bias is more likely to hinder quality. These findings clarify the effects of algorithmic popularity bias on quality outcomes, and may inform the design of more principled mechanisms for techno-social cultural markets.
Submitted 14 July, 2017; v1 submitted 3 July, 2017;
originally announced July 2017.
-
Early Detection of Promoted Campaigns on Social Media
Authors:
Onur Varol,
Emilio Ferrara,
Filippo Menczer,
Alessandro Flammini
Abstract:
Social media expose millions of users every day to information campaigns --- some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes. It is therefore important to be able to detect whether a meme is being artificially promoted at the very moment it becomes wildly popular. This problem has important social implications and poses numerous technical challenges. As a first step, here we focus on discriminating between trending memes that are either organic or promoted by means of advertisement. The classification is not trivial: ads cause bursts of attention that can be easily mistaken for those of organic trends. We designed a machine learning framework to classify memes that have been labeled as trending on Twitter. After trending, we can rely on a large volume of activity data. Early detection, occurring immediately at trending time, is a more challenging problem due to the minimal volume of activity data that is available prior to trending. Our supervised learning framework exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user meta-data. We explore different methods for encoding feature time series. Using millions of tweets containing trending hashtags, we achieve 75% AUC score for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithms by introducing random temporal shifts on the trend time series. Feature selection analysis reveals that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.
Submitted 22 March, 2017;
originally announced March 2017.
-
Online Human-Bot Interactions: Detection, Estimation, and Characterization
Authors:
Onur Varol,
Emilio Ferrara,
Clayton A. Davis,
Filippo Menczer,
Alessandro Flammini
Abstract:
Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.
Submitted 27 March, 2017; v1 submitted 8 March, 2017;
originally announced March 2017.
-
Limited individual attention and online virality of low-quality information
Authors:
Xiaoyan Qiu,
Diego F. M. Oliveira,
Alireza Sahami Shirazi,
Alessandro Flammini,
Filippo Menczer
Abstract:
Social media are massive marketplaces where ideas and news compete for our attention. Previous studies have shown that quality is not a necessary condition for online virality and that knowledge about peer choices can distort the relationship between quality and popularity. However, these results do not explain the viral spread of low-quality information, such as the digital misinformation that threatens our democracy. We investigate quality discrimination in a stylized model of online social network, where individual agents prefer quality information, but have behavioral limitations in managing a heavy flow of information. We measure the relationship between the quality of an idea and its likelihood to become prevalent at the system level. We find that both information overload and limited attention contribute to a degradation in the market's discriminative power. A good tradeoff between discriminative power and diversity of information is possible according to the model. However, calibration with empirical data characterizing information load and finite attention in real social media reveals a weak correlation between quality and popularity of information. In these realistic conditions, the model predicts that high-quality information has little advantage over low-quality information.
Submitted 10 January, 2019; v1 submitted 10 January, 2017;
originally announced January 2017.
-
Information Overload in Group Communication: From Conversation to Cacophony in the Twitch Chat
Authors:
Azadeh Nematzadeh,
Giovanni Luca Ciampaglia,
Yong-Yeol Ahn,
Alessandro Flammini
Abstract:
Online communication channels, especially social web platforms, are rapidly replacing traditional ones. Online platforms allow users to overcome physical barriers, enabling worldwide participation. However, the power of online communication bears an important negative consequence --- we are exposed to too much information to process. Too many participants, for example, can turn online public spaces into noisy, overcrowded fora where no meaningful conversation can be held. Here we analyze a large dataset of public chat logs from Twitch, a popular video streaming platform, in order to examine how information overload affects online group communication. We measure structural and textual features of conversations such as user output, interaction, and information content per message across a wide range of information loads. Our analysis reveals the existence of a transition from a conversational state to a cacophony --- a state of overload with lower user participation, more copy-pasted messages, and less information per message. These results hold both on average and at the individual level for the majority of users. This study provides a quantitative basis for further studies of the social effects of information overload, and may guide the design of more resilient online communication systems.
Submitted 20 October, 2016;
originally announced October 2016.
-
Predicting online extremism, content adopters, and interaction reciprocity
Authors:
Emilio Ferrara,
Wen-Qiang Wang,
Onur Varol,
Alessandro Flammini,
Aram Galstyan
Abstract:
We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks, (i) to detect extremist users, (ii) to estimate whether regular users will adopt extremist content, and finally (iii) to predict whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding in the different forecasting scenarios up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and finally up to 72% AUC for interaction reciprocity forecasting. We conclude by providing a thorough feature analysis that helps determine which are the emerging signals that provide predictive power in different scenarios.
Submitted 2 May, 2016;
originally announced May 2016.
-
Hoaxy: A Platform for Tracking Online Misinformation
Authors:
Chengcheng Shao,
Giovanni Luca Ciampaglia,
Alessandro Flammini,
Filippo Menczer
Abstract:
Massive amounts of misinformation have been observed to spread in uncontrolled fashion across social media. Examples include rumors, hoaxes, fake news, and conspiracy theories. At the same time, several journalistic organizations devote significant efforts to high-quality fact checking of online claims. The resulting information cascades contain instances of both accurate and inaccurate information, unfold over multiple time scales, and often reach audiences of considerable size. All these factors pose challenges for the study of the social dynamics of online news sharing. Here we introduce Hoaxy, a platform for the collection, detection, and analysis of online misinformation and its related fact-checking efforts. We discuss the design of the platform and present a preliminary analysis of a sample of public tweets containing both fake news and fact checking. We find that, in the aggregate, the sharing of fact-checking content typically lags that of misinformation by 10--20 hours. Moreover, fake news are dominated by very active users, while fact checking is a more grass-roots activity. With the increasing risks connected to massive online misinformation, social news observatories have the potential to help researchers, journalists, and the general public understand the dynamics of real and fake news sharing.
Submitted 4 March, 2016;
originally announced March 2016.
-
BotOrNot: A System to Evaluate Social Bots
Authors:
Clayton A. Davis,
Onur Varol,
Emilio Ferrara,
Alessandro Flammini,
Filippo Menczer
Abstract:
While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature reported on cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content and spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly-available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.
Submitted 2 February, 2016;
originally announced February 2016.
-
The DARPA Twitter Bot Challenge
Authors:
V. S. Subrahmanian,
Amos Azaria,
Skylar Durst,
Vadim Kagan,
Aram Galstyan,
Kristina Lerman,
Linhong Zhu,
Emilio Ferrara,
Alessandro Flammini,
Filippo Menczer,
Andrew Stevens,
Alexander Dekhtyar,
Shuyang Gao,
Tad Hogg,
Farshad Kooti,
Yan Liu,
Onur Varol,
Prashant Shiralkar,
Vinod Vydiswaran,
Qiaozhu Mei,
Tim Hwang
Abstract:
A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.
Submitted 21 April, 2016; v1 submitted 19 January, 2016;
originally announced January 2016.
-
Defining and identifying Sleeping Beauties in science
Authors:
Qing Ke,
Emilio Ferrara,
Filippo Radicchi,
Alessandro Flammini
Abstract:
A Sleeping Beauty (SB) in science refers to a paper whose importance is not recognized for several years after publication. Its citation history exhibits a long hibernation period followed by a sudden spike of popularity. Previous studies suggest a relative scarcity of SBs. The reliability of this conclusion is, however, heavily dependent on identification methods based on arbitrary threshold parameters for sleeping time and number of citations, applied to small or monodisciplinary bibliographic datasets. Here we present a systematic, large-scale, and multidisciplinary analysis of the SB phenomenon in science. We introduce a parameter-free measure that quantifies the extent to which a specific paper can be considered an SB. We apply our method to 22 million scientific papers published in all disciplines of natural and social sciences over a time span longer than a century. Our results reveal that the SB phenomenon is not exceptional. There is a continuous spectrum of delayed recognition where both the hibernation period and the awakening intensity are taken into account. Although many cases of SBs can be identified by looking at monodisciplinary bibliographic data, the SB phenomenon becomes much more apparent with the analysis of multidisciplinary datasets, where we can observe many examples of papers achieving delayed yet exceptional importance in disciplines different from those where they were originally published. Our analysis emphasizes a complex feature of citation dynamics that so far has received little attention, and also provides empirical evidence against the use of short-term citation metrics in the quantification of scientific impact.
Submitted 24 May, 2015;
originally announced May 2015.
-
Attention on Weak Ties in Social and Communication Networks
Authors:
Lilian Weng,
Márton Karsai,
Nicola Perra,
Filippo Menczer,
Alessandro Flammini
Abstract:
Granovetter's weak tie theory of social networks is built around two central hypotheses. The first states that strong social ties carry the large majority of interaction events; the second maintains that weak social ties, although less active, are often relevant for the exchange of especially important information (e.g., about potential new jobs in Granovetter's work). While several empirical studies have provided support for the first hypothesis, the second has been the object of far less scrutiny. A possible reason is that it involves notions relative to the nature and importance of the information that are hard to quantify and measure, especially in large-scale studies. Here, we search for empirical validation of both of Granovetter's hypotheses. We find clear empirical support for the first. We also provide empirical evidence and a quantitative interpretation for the second. We show that attention, measured as the fraction of interactions devoted to a particular social connection, is high on weak ties (possibly reflecting the postulated informational purposes of such ties) but also on very strong ties. Data from online social media and mobile communication reveal network-dependent mixtures of these two effects on the basis of a platform's typical usage. Our results establish a clear relationship between attention, importance, and strength of social links, and could lead to improved algorithms to prioritize social media content.
Submitted 31 August, 2017; v1 submitted 10 May, 2015;
originally announced May 2015.
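The abstract above defines attention as the fraction of interactions devoted to a particular social connection. A minimal sketch of that quantity follows, assuming interactions arrive as (source, target) pairs; the function name `attention_per_tie` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def attention_per_tie(interactions):
    """Map each user to {neighbor: fraction of that user's interactions}.

    `interactions` is an iterable of (source, target) pairs; this input
    format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for source, target in interactions:
        counts[source][target] += 1

    attention = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        # Normalize so that each user's attention values sum to 1.
        attention[source] = {t: n / total for t, n in targets.items()}
    return attention


# Toy example: ann contacts bob three times and cat once.
events = [("ann", "bob"), ("ann", "bob"), ("ann", "bob"), ("ann", "cat")]
result = attention_per_tie(events)
```

Here `result["ann"]` assigns three quarters of ann's attention to bob and one quarter to cat. Relating these fractions to tie strength would require a separate strength measure, which this sketch does not attempt.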
-
Measuring Online Social Bubbles
Authors:
Dimitar Nikolov,
Diego F. M. Oliveira,
Alessandro Flammini,
Filippo Menczer
Abstract:
Social media have quickly become a prevalent channel to access information, spread ideas, and influence opinions. However, it has been suggested that social and algorithmic filtering may cause exposure to less diverse points of view, and even foster polarization and misinformation. Here we explore and validate this hypothesis quantitatively for the first time, at the collective and individual levels, by mining three massive datasets of web traffic, search logs, and Twitter posts. Our analysis shows that collectively, people access information from a significantly narrower spectrum of sources through social media and email, compared to search. The significance of this finding for individual exposure is revealed by investigating the relationship between the diversity of information sources experienced by users at the collective and individual level. There is a strong correlation between collective and individual diversity, supporting the notion that when we use social media we find ourselves inside "social bubbles". Our results could lead to a deeper understanding of how technology biases our exposure to new information.
Submitted 28 October, 2015; v1 submitted 25 February, 2015;
originally announced February 2015.
-
On predictability of rare events leveraging social media: a machine learning perspective
Authors:
Lei Le,
Emilio Ferrara,
Alessandro Flammini
Abstract:
Information extracted from social media streams has been leveraged to forecast the outcome of a large number of real-world events, from political elections to stock market fluctuations. An increasing number of studies demonstrate how the analysis of social media conversations provides cheap access to the wisdom of the crowd. However, the extent and contexts in which such forecasting power can be effectively leveraged have not yet been verified in a systematic way. It is also unclear how social-media-based predictions compare to those based on alternative information sources. To address these issues, here we develop a machine learning framework that leverages social media streams to automatically identify and predict the outcomes of soccer matches. We focus in particular on matches in which at least one of the possible outcomes is deemed highly unlikely by professional bookmakers. We argue that sport events offer a systematic approach for testing the predictive power of social media, and allow such power to be compared against the rigorous baselines set by external sources. Despite such strict baselines, our framework yields above 8% marginal profit when used to inform simple betting strategies. The system is based on real-time sentiment analysis and exploits data collected immediately before the games, allowing for informed bets. We discuss the rationale behind our approach, describe the learning framework, its prediction performance and the return it provides as compared to a set of betting strategies. To test our framework we use both historical Twitter data from the 2014 FIFA World Cup games, and real-time Twitter data collected by monitoring the conversations about all soccer matches of four major European tournaments (FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.
Submitted 20 February, 2015;
originally announced February 2015.
-
Computational fact checking from knowledge networks
Authors:
Giovanni Luca Ciampaglia,
Prashant Shiralkar,
Luis M. Rocha,
Johan Bollen,
Filippo Menczer,
Alessandro Flammini
Abstract:
Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
Submitted 14 January, 2015;
originally announced January 2015.
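The abstract above frames fact checking as a shortest-path search between concept nodes under a semantic proximity metric. The sketch below illustrates the general idea on a toy graph. The metric is an assumption chosen for illustration, not a definition taken from the abstract: a path costs the sum of log(degree) over its intermediate nodes, so paths through very generic, high-degree concepts count for less, and proximity is 1 / (1 + cost). The function name `path_support` and the graph are hypothetical.

```python
import heapq
import math


def path_support(graph, source, target):
    """Best proximity in (0, 1] between two concepts, or 0.0 if unreachable.

    Assumed metric (illustrative only): a path costs the sum of log(degree)
    over its intermediate nodes, and proximity = 1 / (1 + cost).
    `graph` maps each node to the set of its neighbors (undirected).
    """
    if source == target:
        return 1.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        if node == target:
            return 1.0 / (1.0 + cost)
        # Leaving `node` makes it an intermediate node, unless it is the source.
        step = 0.0 if node == source else math.log(len(graph[node]))
        for neighbor in graph[node]:
            new_cost = cost + step
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0


# Toy knowledge graph (hypothetical).
graph = {
    "Rome": {"Italy"},
    "Italy": {"Rome", "Europe", "Milan"},
    "Milan": {"Italy"},
    "Europe": {"Italy", "Paris"},
    "Paris": {"Europe"},
}
direct = path_support(graph, "Rome", "Italy")
via_italy = path_support(graph, "Rome", "Milan")
via_europe = path_support(graph, "Rome", "Paris")
```

Directly linked concepts get the maximum score, while claims that can only be connected through longer or more generic paths get lower scores. Because every step cost is non-negative, Dijkstra's algorithm applies.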
-
Quality versus quantity in scientific impact
Authors:
Jasleen Kaur,
Emilio Ferrara,
Filippo Menczer,
Alessandro Flammini,
Filippo Radicchi
Abstract:
Citation metrics are becoming pervasive in the quantitative evaluation of scholars, journals and institutions. More than ever before, hiring, promotion, and funding decisions rely on a variety of impact metrics that cannot disentangle quality from quantity of scientific output, and are biased by factors such as discipline and academic age. Biases affecting the evaluation of single papers are compounded when one aggregates citation-based metrics across an entire publication record. It is not trivial to compare the quality of two scholars who during their careers have published at different rates in different disciplines in different periods of time. We propose a novel solution based on the generation of a statistical baseline specifically tailored to the academic profile of each researcher. Our method can decouple the roles of quantity and quality of publications to explain how a certain level of impact is achieved. The method is flexible enough to allow for the evaluation of, and fair comparison among, arbitrary collections of papers, such as scholar publication records, journals, and entire institutions, and can be extended to simultaneously suppress any source of bias. We show that our method can capture the quality of the work of Nobel laureates irrespective of number of publications, academic age, and discipline, even when traditional metrics indicate low impact in absolute terms. We further apply our methodology to almost a million scholars and over six thousand journals to measure the impact that cannot be explained by the volume of publications alone.
Submitted 15 December, 2014; v1 submitted 26 November, 2014;
originally announced November 2014.
-
Clustering memes in social media streams
Authors:
Mohsen JafariAsbagh,
Emilio Ferrara,
Onur Varol,
Filippo Menczer,
Alessandro Flammini
Abstract:
The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.
Submitted 3 November, 2014;
originally announced November 2014.
-
The production of information in the attention economy
Authors:
Giovanni Luca Ciampaglia,
Alessandro Flammini,
Filippo Menczer
Abstract:
Online traces of human activity offer novel opportunities to study the dynamics of complex knowledge exchange networks, and in particular how the relationship between demand and supply of information is mediated by competition for our limited individual attention. The emergent patterns of collective attention determine what new information is generated and consumed. Can we measure the relationship between demand and supply for new information about a topic? Here we propose a normalization method to compare attention burst statistics across topics that have a heterogeneous distribution of attention. Through analysis of a massive dataset on traffic to Wikipedia, we find that the production of new knowledge is associated with significant shifts of collective attention, which we take as a proxy for its demand. What we observe is consistent with a scenario in which the allocation of attention toward a topic stimulates the demand for information about it, and in turn the supply of further novel information. Our attempt to quantify demand and supply of information, and our finding about their temporal ordering, may lead to the development of the fundamental laws of the attention economy, and a better understanding of the social exchange of knowledge in online and offline information networks.
Submitted 15 September, 2014;
originally announced September 2014.
-
The Rise of Social Bots
Authors:
Emilio Ferrara,
Onur Varol,
Clayton Davis,
Filippo Menczer,
Alessandro Flammini
Abstract:
The Turing test aimed to distinguish the behavior of a human from that of a computer algorithm. Such a challenge is more relevant than ever in today's social media context, where limited attention and technology constrain the expressive power of humans, while incentives abound to develop software agents mimicking humans. These social bots interact, often unnoticed, with real people in social media ecosystems, but their abundance is uncertain. While many bots are benign, one can design harmful bots with the goals of persuading, smearing, or deceiving. Here we discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. We then review current efforts to detect social bots on Twitter. Features related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering.
Submitted 6 March, 2017; v1 submitted 19 July, 2014;
originally announced July 2014.
-
Evolution of Online User Behavior During a Social Upheaval
Authors:
Onur Varol,
Emilio Ferrara,
Christine L. Ogan,
Filippo Menczer,
Alessandro Flammini
Abstract:
Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protest that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and what roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior.
Submitted 27 June, 2014;
originally announced June 2014.
-
Optimal network modularity for information diffusion
Authors:
Azadeh Nematzadeh,
Emilio Ferrara,
Alessandro Flammini,
Yong-Yeol Ahn
Abstract:
We investigate the impact of community structure on information diffusion with the linear threshold model. Our results demonstrate that modular structure may have counter-intuitive effects on information diffusion when social reinforcement is present. We show that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Using both analytic approaches and numerical simulations, we demonstrate the existence of an optimal network modularity, where global diffusion requires the minimal number of early adopters.
Submitted 18 September, 2014; v1 submitted 6 January, 2014;
originally announced January 2014.
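The abstract above relies on the linear threshold model, in which a node adopts once a sufficient fraction of its neighbors have adopted. A minimal sketch of that dynamic follows, assuming a single uniform threshold, synchronous updates, and an undirected graph stored as a dictionary of neighbor sets; the function name `linear_threshold` and the six-node toy network are hypothetical and do not reproduce the paper's setup.

```python
def linear_threshold(graph, seeds, threshold):
    """Return the set of active nodes once the cascade stops.

    A node activates when the fraction of its neighbors that are active
    is at least `threshold`. Active nodes never deactivate.
    """
    active = set(seeds)
    while True:
        newly_active = set()
        for node, neighbors in graph.items():
            if node in active or not neighbors:
                continue
            fraction = sum(n in active for n in neighbors) / len(neighbors)
            if fraction >= threshold:
                newly_active.add(node)
        if not newly_active:
            return active
        active |= newly_active


# Two triangles (0-1-2 and 3-4-5) joined by the single bridge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
contained = linear_threshold(graph, {0, 1}, threshold=0.5)
global_cascade = linear_threshold(graph, {0, 1}, threshold=0.3)
```

With a threshold of 0.5, the cascade fills the seeded triangle but cannot cross the bridge, because node 3 sees only one active neighbor out of three. Lowering the threshold to 0.3 lets the same two seeds reach the whole network.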
-
Traveling Trends: Social Butterflies or Frequent Fliers?
Authors:
Emilio Ferrara,
Onur Varol,
Filippo Menczer,
Alessandro Flammini
Abstract:
Trending topics are the online conversations that grab collective attention on social media. They are continually changing and often reflect exogenous events that happen in the real world. Trends are localized in space and time as they are driven by activity in specific geographic areas that act as sources of traffic and information flow. Taken independently, trends and geography have been discussed in recent literature on online social media; so far, however, little has been done to characterize the relation between trends and geography. Here we investigate more than eleven thousand topics that trended on Twitter in 63 main US locations during a period of 50 days in 2013. This data allows us to study the origins and pathways of trends, how they compete for popularity at the local level to emerge as winners at the country level, and what dynamics underlie their production and consumption in different geographic areas. We identify two main classes of trending topics: those that surface locally, coinciding with three different geographic clusters (East coast, Midwest and Southwest); and those that emerge globally from several metropolitan areas, coinciding with the major air traffic hubs of the country. These hubs act as trendsetters, generating topics that eventually trend at the country level, and driving the conversation across the country. This poses an intriguing conjecture, drawing a parallel between the spread of information and diseases: Do trends travel faster by airplane than over the Internet?
Submitted 9 October, 2013;
originally announced October 2013.
-
Clustering Memes in Social Media
Authors:
Emilio Ferrara,
Mohsen JafariAsbagh,
Onur Varol,
Vahed Qazvinian,
Filippo Menczer,
Alessandro Flammini
Abstract:
The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
Submitted 9 October, 2013;
originally announced October 2013.
-
The Digital Evolution of Occupy Wall Street
Authors:
Michael D. Conover,
Emilio Ferrara,
Filippo Menczer,
Alessandro Flammini
Abstract:
We examine the temporal evolution of digital communication activity relating to the American anti-capitalist movement Occupy Wall Street. Using a high-volume sample from the microblogging site Twitter, we investigate changes in Occupy participant engagement, interests, and social connectivity over a fifteen-month period starting three months prior to the movement's first protest action. The results of this analysis indicate that, on Twitter, the Occupy movement tended to elicit participation from a set of highly interconnected users with pre-existing interests in domestic politics and foreign social movements. These users, while highly vocal in the months immediately following the birth of the movement, appear to have lost interest in Occupy-related communication over the remainder of the study period.
Submitted 23 June, 2013;
originally announced June 2013.
-
The Geospatial Characteristics of a Social Movement Communication Network
Authors:
Michael D. Conover,
Clayton Davis,
Emilio Ferrara,
Karissa McKelvey,
Filippo Menczer,
Alessandro Flammini
Abstract:
Social movements rely in large measure on networked communication technologies to organize and disseminate information relating to the movements' objectives. In this work we seek to understand how the goals and needs of a protest movement are reflected in the geographic patterns of its communication network, and how these patterns differ from those of stable political communication. To this end, we examine an online communication network reconstructed from over 600,000 tweets from a thirty-six-week period covering the birth and maturation of the American anticapitalist movement, Occupy Wall Street. We find that, compared to a network of stable domestic political communication, the Occupy Wall Street network exhibits higher levels of locality and a hub-and-spoke structure, in which the majority of non-local attention is allocated to high-profile locations such as New York, California, and Washington D.C. Moreover, we observe that information flows across state boundaries are more likely to contain framing language and references to the media, while communication among individuals in the same state is more likely to reference protest action and specific places and times. Tying these results to social movement theory, we propose that these features reflect the movement's efforts to mobilize resources at the local level and to develop narrative frames that reinforce collective purpose at the national level.
Submitted 23 June, 2013;
originally announced June 2013.
-
Stochastic fluctuations and the detectability limit of network communities
Authors:
Lucio Floretta,
Jonas Liechti,
Alessandro Flammini,
Paolo De Los Rios
Abstract:
We have analyzed the detectability limits of network communities in the framework of the popular Girvan and Newman benchmark. By carefully taking into account the inevitable stochastic fluctuations that affect the construction of each and every instance of the benchmark, we come to the conclusion that the native, putative partition of the network is completely lost even before the in-degree/out-degree ratio becomes equal to that of a structure-less Erdős-Rényi network. We develop a simple iterative scheme, analytically well described by an infinite branching process, to provide an estimate of the true detectability limit. Using various algorithms based on modularity optimization, we show that all of them behave (semi-quantitatively) in the same way, with the same functional form of the detectability threshold as a function of the network parameters. Because the same behavior has also been found by further modularity-optimization methods and for methods based on different heuristic implementations, we conclude that indeed a correct definition of the detectability limit must take into account the stochastic fluctuations of the network construction.
Submitted 18 June, 2013; v1 submitted 10 June, 2013;
originally announced June 2013.