-
A Public Dataset For the ZKsync Rollup
Authors:
Maria Inês Silva,
Johnnatan Messias,
Benjamin Livshits
Abstract:
Despite blockchain data being publicly available, practical challenges and high costs often hinder its effective use by researchers, thus limiting data-driven research and exploration in the blockchain space. This is especially true when it comes to Layer 2 (L2) ecosystems, and ZKsync, in particular. To address these issues, we have curated a dataset from 1 year of activity extracted from a ZKsync Era archive node and made it freely available to external parties. In this paper, we provide details on this dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions. We also publish and share the code used in our analysis on GitHub to promote reproducibility and to support further research.
Submitted 26 July, 2024;
originally announced July 2024.
-
Layer-2 Arbitrage: An Empirical Analysis of Swap Dynamics and Price Disparities on Rollups
Authors:
Krzysztof Gogol,
Johnnatan Messias,
Deborah Miori,
Claudio Tessone,
Benjamin Livshits
Abstract:
This paper explores the dynamics of Decentralized Finance (DeFi) within the Layer-2 ecosystem, focusing on Automated Market Makers (AMM) and arbitrage on Ethereum rollups. We observe significant shifts in trading activity from Ethereum to rollups, with swaps on rollups happening 2-3 times more often, though, with lower trade volume. By examining the price differences between AMMs and centralized exchanges, we discover over 0.5 million unexploited arbitrage opportunities on rollups. Remarkably, we observe that these opportunities last, on average, 10 to 20 blocks, requiring adjustments to the LVR metrics to avoid double-counting arbitrage. Our results show that arbitrage in Arbitrum, Base, and Optimism pools ranges from 0.03% to 0.05% of trading volume, while in zkSync Era it oscillates around 0.25%, with the LVR metric overestimating arbitrage by a factor of five. Rollups offer not only lower gas fees, but also provide faster block production, leading to significant differences compared to the trading and arbitrage dynamics of Ethereum.
Submitted 4 June, 2024;
originally announced June 2024.
-
The Writing is on the Wall: Analyzing the Boom of Inscriptions and its Impact on EVM-compatible Blockchains
Authors:
Johnnatan Messias,
Krzysztof Gogol,
Maria Inês Silva,
Benjamin Livshits
Abstract:
Despite the level of attention given to rollups, there is limited empirical research about their performance. To address this gap, we conduct a comprehensive data-driven analysis of the late 2023 transaction boom that is attributed to inscriptions: a novel approach to record data onto a blockchain with no outside server needed. Inscriptions were first introduced on the Bitcoin blockchain to allow for the representation of NFTs or ERC-20-like tokens without smart contracts, but later spread to other blockchains.
This work examines the applications of inscription transactions in Ethereum and its major EVM-compatible rollups and their impact on blockchain scalability during periods of sudden transaction surges. We found that on certain days, inscription-related transactions comprised over 89% on Arbitrum, over 88% on zkSync Era, and over 53% on Ethereum. Furthermore, 99% of these transactions were related to the minting of meme coins, followed by limited trading activity. Unlike L1 blockchains, during periods of transaction surges, zkSync and Arbitrum experienced lower median gas fees, attributable to the compression of L2 transactions for a single L1 batch. Additionally, zkSync Era, a ZK rollup, demonstrated a stronger reduction in fees than optimistic rollups considered in our study: Arbitrum, Base, and Optimism.
Submitted 24 May, 2024;
originally announced May 2024.
-
UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios
Authors:
Reza Mahjourian,
Rongbing Mu,
Valerii Likhosherstov,
Paul Mougin,
Xiukun Huang,
Joao Messias,
Shimon Whiteson
Abstract:
This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.
Submitted 6 May, 2024;
originally announced May 2024.
-
The Writing is on the Wall: Analyzing the Boom of Inscriptions and its Impact on Rollup Performance and Cost Efficiency
Authors:
Krzysztof Gogol,
Johnnatan Messias,
Maria Ines Silva,
Benjamin Livshits
Abstract:
Late 2023 witnessed significant user activity on EVM chains, resulting in a surge in transaction activity and putting many rollups to their first live test. While some rollups performed well, others experienced downtime during this period, affecting transaction finality time and gas fees. To address the lack of empirical research on rollups, we perform the first study of rollups under the heightened activity of the late 2023 transaction boom, which is attributed to inscriptions - a novel technique that enables NFT and ERC-20 token creation on Bitcoin and other blockchains. We observe that minting inscription-based meme tokens on zkSync Era allows for trading at a fraction of the costs, compared to the Bitcoin or Ethereum networks. We also found that the increased transaction activity, over 99% attributed to the minting of new inscription tokens, positively affected other users of zkSync Era, resulting in lowered gas fees. Unlike L1 blockchains, ZK rollups may experience lower gas fees with increased transaction volume. Lastly, the introduction of blobs - a form of temporary data storage - decreased the gas costs of Ethereum rollups, but also raised a number of questions about the security of inscription-based tokens.
Submitted 17 April, 2024;
originally announced April 2024.
-
Quantifying Arbitrage in Automated Market Makers: An Empirical Study of Ethereum ZK Rollups
Authors:
Krzysztof Gogol,
Johnnatan Messias,
Deborah Miori,
Claudio Tessone,
Benjamin Livshits
Abstract:
Arbitrage can arise from the simultaneous purchase and sale of the same asset in different markets in order to profit from a difference in its price. This work systematically reviews arbitrage opportunities between Automated Market Makers (AMMs) on Ethereum ZK rollups, and Centralised Exchanges (CEXs). First, we propose a theoretical framework to measure such arbitrage opportunities and derive a formula for the related Maximal Arbitrage Value (MAV) that accounts for both price divergences and liquidity available in the trading venues. Then, we empirically measure the historical MAV available between SyncSwap, an AMM on zkSync Era, and Binance, and investigate how quickly misalignments in price are corrected against explicit and implicit market costs. Overall, the cumulative MAV from July to September 2023 on the USDC-ETH SyncSwap pool amounts to $104.96k (0.24% of trading volume).
Submitted 26 June, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Liquid Staking Tokens in Automated Market Makers
Authors:
Krzysztof Gogol,
Robin Fritsch,
Malte Schlosser,
Johnnatan Messias,
Benjamin Kraner,
Claudio Tessone
Abstract:
This paper studies liquid staking tokens (LSTs) on automated market makers (AMMs), both theoretically and empirically. LSTs are tokenized representations of staked assets on proof-of-stake blockchains. First, we model LST-liquidity on AMMs theoretically, categorizing suitable AMM types for LST liquidity and deriving formulas for the necessary returns from trading fees to adequately compensate liquidity providers under the particular price trajectories of LSTs. For the latter, two relevant metrics are considered: (1) losses compared to holding the liquidity outside the AMM (loss-versus-holding, or "impermanent loss"), and (2) the relative profitability compared to fully staking the capital (loss-versus-staking) which is specifically tailored to the case of LST-liquidity. Next, we empirically measure these metrics for Ethereum LSTs across the most relevant AMM pools. We find that, while trading fees often compensate for impermanent loss, fully staking is more profitable for many pools, raising questions about the sustainability of the current LST liquidity allocation to AMMs.
Submitted 19 July, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Cross-border Exchange of CBDCs using Layer-2 Blockchain
Authors:
Krzysztof Gogol,
Johnnatan Messias,
Malte Schlosser,
Benjamin Kraner,
Claudio Tessone
Abstract:
This paper proposes a novel multi-layer blockchain architecture for the cross-border trading of CBDCs. The permissioned layer-2, by relying on the public consensus of the underlying network, assures the security and integrity of the transactions and ensures interoperability with domestic CBDCs implementations. Multiple Layer-3s operate various Automated Market Makers (AMMs) and compete with each other for the lowest costs. To provide insights into the practical implications of the system, simulations of trading costs are conducted based on historical FX rates, with Project Mariana as a benchmark. The study shows that, even with liquidity fragmentation, a multi-layer and multi-AMM setup is more cost-efficient than a single AMM.
Submitted 30 January, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Airdrops: Giving Money Away Is Harder Than It Seems
Authors:
Johnnatan Messias,
Aviv Yaish,
Benjamin Livshits
Abstract:
Airdrops are used by blockchain applications and protocols to attract an initial user base, and to grow the user base over time. In the case of many airdrops, tokens are distributed to select users as a "reward" for interacting with the underlying protocol, with a long-term goal of creating a loyal community that will generate genuine economic activity well after the airdrop. Although airdrops are widely used by the blockchain industry, a proper understanding of the factors contributing to an airdrop's success is generally lacking. In this work, we outline the design space for airdrops, and specify a reasonable list of outcomes that an airdrop should ideally result in. We then analyze on-chain data from several larger-scale airdrops to empirically evaluate the success of previous airdrops, with respect to our desiderata. In our analysis, we demonstrate that airdrop farmers frequently dispose of the lion's share of airdrops proceeds via exchanges. Our analysis is followed by an overview of common pitfalls that common airdrop designs lend themselves to, which are then used to suggest concrete guidelines for better airdrops.
Submitted 24 May, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Understanding Blockchain Governance: Analyzing Decentralized Voting to Amend DeFi Smart Contracts
Authors:
Johnnatan Messias,
Vabuk Pahari,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
Smart contracts are contractual agreements between participants of a blockchain, who cannot implicitly trust one another. They are software programs that run on top of a blockchain, and we may need to change them from time to time (e.g., to fix bugs or address new use cases). Governance protocols define the means for amending or changing these smart contracts without any centralized authority. They distribute the decision-making power to every user of the smart contract: Users vote on accepting or rejecting every change.
In this work, we review and characterize decentralized governance in practice, using Compound and Uniswap -- two widely used governance protocols -- as a case study. We reveal a high concentration of voting power in both Compound and Uniswap: 10 voters hold together 57.86% and 44.72% of the voting power, respectively. Although proposals to change or amend the protocol receive, on average, a substantial number of votes (i.e., 89.39%) in favor within the Compound protocol, they require fewer than three voters to obtain 50% or more votes. We show that voting on Compound proposals can be unfairly expensive for small token holders, and we discover voting coalitions that can further marginalize these users.
Submitted 21 April, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Dissecting Bitcoin and Ethereum Transactions: On the Lack of Transaction Contention and Prioritization Transparency in Blockchains
Authors:
Johnnatan Messias,
Vabuk Pahari,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
In permissionless blockchains, transaction issuers include a fee to incentivize miners to include their transactions. To accurately estimate this prioritization fee for a transaction, transaction issuers (or blockchain participants, more generally) rely on two fundamental notions of transparency, namely contention and prioritization transparency. Contention transparency implies that participants are aware of every pending transaction that will contend with a given transaction for inclusion. Prioritization transparency states that the participants are aware of the transaction or prioritization fees paid by every such contending transaction. Neither of these notions of transparency holds well today. Private relay networks, for instance, allow users to send transactions privately to miners. Besides, users can offer fees to miners via either direct transfers to miners' wallets or off-chain payments -- neither of which are public. In this work, we characterize the lack of contention and prioritization transparency in Bitcoin and Ethereum resulting from such practices. We show that private relay networks are widely used and private transactions are quite prevalent. We show that the lack of transparency facilitates miners to collude and overcharge users who may use these private relay networks despite them offering little to no guarantees on transaction prioritization. The lack of these transparencies in blockchains has crucial implications for transaction issuers as well as the stability of blockchains. Finally, we make our data sets and scripts publicly available.
Submitted 24 May, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving
Authors:
Angad Singh,
Omar Makhlouf,
Maximilian Igl,
Joao Messias,
Arnaud Doucet,
Shimon Whiteson
Abstract:
Multi-object state estimation is a fundamental problem for robotic applications where a robot must interact with other moving objects. Typically, other objects' relevant state features are not directly observable, and must instead be inferred from observations. Particle filtering can perform such inference given approximate transition and observation models. However, these models are often unknown a priori, yielding a difficult parameter estimation problem since observations jointly carry transition and observation noise. In this work, we consider learning maximum-likelihood parameters using particle methods. Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds to the non-differentiable resampling step, that yield biased or high variance gradient estimates. By contrast, we exploit Fisher's identity to obtain a particle-based approximation of the score function (the gradient of the log likelihood) that yields a low variance estimate while only requiring stepwise differentiation through the transition and observation models. We apply our method to real data collected from autonomous vehicles (AVs) and show that it learns better models than existing techniques and is more stable in training, yielding an effective smoother for tracking the trajectories of vehicles around an AV.
Submitted 13 December, 2022;
originally announced December 2022.
-
Selfish & Opaque Transaction Ordering in the Bitcoin Blockchain: The Case for Chain Neutrality
Authors:
Johnnatan Messias,
Mohamed Alzayat,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau,
Alan Mislove
Abstract:
Most public blockchain protocols, including the popular Bitcoin and Ethereum blockchains, do not formally specify the order in which miners should select transactions from the pool of pending (or uncommitted) transactions for inclusion in the blockchain. Over the years, informal conventions or "norms" for transaction ordering have, however, emerged via the use of shared software by miners, e.g., the GetBlockTemplate (GBT) mining protocol in Bitcoin Core. Today, a widely held view is that Bitcoin miners prioritize transactions based on their offered "transaction fee-per-byte." Bitcoin users are, consequently, encouraged to increase the fees to accelerate the commitment of their transactions, particularly during periods of congestion. In this paper, we audit the Bitcoin blockchain and present statistically significant evidence of mining pools deviating from the norms to accelerate the commitment of transactions for which they have (i) a selfish or vested interest, or (ii) received dark-fee payments via opaque (non-public) side-channels. As blockchains are increasingly being used as a record-keeping substrate for a variety of decentralized (financial technology) systems, our findings call for an urgent discussion on defining neutrality norms that miners must adhere to when ordering transactions in the chains. Finally, we make our data sets and scripts publicly available.
Submitted 22 October, 2021;
originally announced October 2021.
-
Modeling Coordinated vs. P2P Mining: An Analysis of Inefficiency and Inequality in Proof-of-Work Blockchains
Authors:
Mohamed Alzayat,
Johnnatan Messias,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
We study efficiency in a proof-of-work blockchain with non-zero latencies, focusing in particular on the (inequality in) individual miners' efficiencies. Prior work attributed differences in miners' efficiencies mostly to attacks, but we pursue a different question: Can inequality in miners' efficiencies be explained by delays, even when all miners are honest? Traditionally, such efficiency-related questions were tackled only at the level of the overall system, and in a peer-to-peer (P2P) setting where miners directly connect to one another. Despite it being common today for miners to pool compute capacities in a mining pool managed by a centralized coordinator, efficiency in such a coordinated setting has barely been studied.
In this paper, we propose a simple model of a proof-of-work blockchain with latencies for both the P2P and the coordinated settings. We derive a closed-form expression for the efficiency in the coordinated setting with an arbitrary number of miners and arbitrary latencies, both for the overall system and for each individual miner. We leverage this result to show that inequalities arise from variability in the delays, but that if all miners are equidistant from the coordinator, they have equal efficiency irrespective of their compute capacities. We then prove that, under a natural consistency condition, the overall system efficiency in the P2P setting is higher than that in the coordinated setting. Finally, we perform a simulation-based study to demonstrate that even in the P2P setting delays between miners introduce inequalities, and that there is a more complex interplay between delays and compute capacities.
Submitted 5 June, 2021;
originally announced June 2021.
-
Learning from Demonstration in the Wild
Authors:
Feryal Behbahani,
Kyriacos Shiarlis,
Xi Chen,
Vitaly Kurin,
Sudhanshu Kasewa,
Ciprian Stirbu,
João Gomes,
Supratik Paul,
Frans A. Oliehoek,
João Messias,
Shimon Whiteson
Abstract:
Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on manually generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviours that were occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose Video to Behaviour (ViBe), a new approach to learn models of behaviour from unlabelled raw video data of a traffic scene collected from a single, monocular, initially uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.
Submitted 25 March, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
-
On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
Authors:
Filipe N. Ribeiro,
Koustuv Saha,
Mahmoudreza Babaei,
Lucas Henrique,
Johnnatan Messias,
Fabricio Benevenuto,
Oana Goga,
Krishna P. Gummadi,
Elissa M. Redmiles
Abstract:
Targeted advertising is meant to improve the efficiency of matching advertisers to their customers. However, targeted advertising can also be abused by malicious advertisers to efficiently reach people susceptible to false stories, stoke grievances, and incite social conflict. Since targeted ads are not seen by non-targeted and non-vulnerable people, malicious ads are likely to go unreported and their effects undetected. This work examines a specific case of malicious advertising, exploring the extent to which political ads from the Russian Intelligence Research Agency (IRA) run prior to 2016 U.S. elections exploited Facebook's targeted advertising infrastructure to efficiently target ads on divisive or polarizing topics (e.g., immigration, race-based policing) at vulnerable sub-populations. In particular, we do the following: (a) We conduct U.S. census-representative surveys to characterize how users with different political ideologies report, approve, and perceive truth in the content of the IRA ads. Our surveys show that many ads are "divisive": they elicit very different reactions from people belonging to different socially salient groups. (b) We characterize how these divisive ads are targeted to sub-populations that feel particularly aggrieved by the status quo. Our findings support existing calls for greater transparency of content and targeting of political ads. (c) We particularly focus on how the Facebook ad API facilitates such targeting. We show how the enormous amount of personal data Facebook aggregates about users and makes available to advertisers enables such malicious targeting.
Submitted 21 November, 2018; v1 submitted 28 August, 2018;
originally announced August 2018.
-
Characterizing Interconnections and Linguistic Patterns in Twitter
Authors:
Johnnatan Messias
Abstract:
Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic aspect. Despite numerous efforts that explore demographic aspects in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this dissertation, we attempt to identify gender and race of Twitter users located in the United States using advanced image processing algorithms from Face++. We investigate how different demographic groups connect with each other and differentiate them regarding linguistic styles and also their interests. We quantify to what extent one group follows and interacts with each other and the extent to which these connections and interactions reflect in inequalities in Twitter. We also extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify the similarities and the differences in the messages they share in Twitter. Furthermore, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we use the topics of interest that we retrieve from each user. Our analysis shows that users identified as white and male tend to attain higher positions, in terms of the number of followers and number of times in another user's lists, in Twitter. There are clear differences in the way of writing across different demographic groups in both gender and race domains as well as in the topic of interest. We hope our effort can stimulate the development of new theories of demographic information in the online space. Finally, we developed a Web-based system that leverages the demographic aspects of users to provide transparency to the Twitter trending topics system.
Submitted 30 March, 2018;
originally announced April 2018.
-
White, Man, and Highly Followed: Gender and Race Inequalities in Twitter
Authors:
Johnnatan Messias,
Pantelis Vikatos,
Fabricio Benevenuto
Abstract:
Social media is considered a democratic space in which people connect and interact with each other regardless of their gender, race, or any other demographic factor. Despite numerous efforts that explore demographic factors in social media, it is still unclear whether social media perpetuates old inequalities from the offline world. In this paper, we attempt to identify the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how different demographic groups (i.e., male/female, Asian/Black/White) connect with each other. We quantify the extent to which groups follow and interact with each other and the extent to which these connections and interactions are reflected in inequalities in Twitter. Our analysis shows that users identified as White and male tend to attain higher positions in Twitter, in terms of the number of followers and the number of times they appear in other users' lists. We hope our effort can stimulate the development of new theories of demographic information in the online space.
Submitted 26 June, 2017;
originally announced June 2017.
-
Demographics of News Sharing in the U.S. Twittersphere
Authors:
Julio C. S. Reis,
Haewoon Kwak,
Jisun An,
Johnnatan Messias,
Fabricio Benevenuto
Abstract:
The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide the first in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach. Among our main findings, we show that male and white users tend to be more active in terms of sharing news, biasing the news audience toward the interests of these demographic groups. Our results also quantify differences in news-sharing interests across demographics, which has implications for personalized news digests.
Submitted 10 May, 2017;
originally announced May 2017.
-
Linguistic Diversities of Demographic Groups in Twitter
Authors:
Pantelis Vikatos,
Johnnatan Messias,
Manoel Miranda,
Fabricio Benevenuto
Abstract:
The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups in Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e., male/female, Asian/Black/White) differ in terms of linguistic styles and also their interests. We extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify the similarities and differences in particular sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen across both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.
Submitted 10 May, 2017;
originally announced May 2017.
-
Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media
Authors:
Juhi Kulshrestha,
Motahhare Eslami,
Johnnatan Messias,
Muhammad Bilal Zafar,
Saptarshi Ghosh,
Krishna P. Gummadi,
Karrie Karahalios
Abstract:
Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking system and the bias that arises from the ranking system itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results and in different ways. We discuss the consequences of these biases and possible mechanisms to signal this bias in social media search systems' interfaces.
Submitted 5 April, 2017;
originally announced April 2017.
-
Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations
Authors:
Abhijnan Chakraborty,
Johnnatan Messias,
Fabricio Benevenuto,
Saptarshi Ghosh,
Niloy Ganguly,
Krishna P. Gummadi
Abstract:
Users of social media sites like Facebook and Twitter rely on crowdsourced content recommendation systems (e.g., Trending Topics) to retrieve important and useful information. Content selected for recommendation indirectly gives the initial users who promoted it (by liking or posting) an opportunity to propagate their messages to a wider audience. Hence, it is important to understand the demographics of the people who make content worthy of recommendation, and to explore whether they are representative of the media site's overall population. In this work, using extensive data collected from Twitter, we make the first attempt to quantify and explore the demographic biases in crowdsourced recommendations. Our analysis, focusing on the selection of trending topics, finds that a large fraction of trends are promoted by crowds whose demographics are significantly different from the overall Twitter population. More worryingly, we find that certain demographic groups are systematically under-represented among the promoters of the trending topics. To make the demographic biases in Twitter trends more transparent, we developed and deployed a Web-based service 'Who-Makes-Trends' at twitter-app.mpi-sws.org/who-makes-trends.
Submitted 1 April, 2017;
originally announced April 2017.
-
From Migration Corridors to Clusters: The Value of Google+ Data for Migration Studies
Authors:
Johnnatan Messias,
Fabricio Benevenuto,
Ingmar Weber,
Emilio Zagheni
Abstract:
Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e., between pairs of countries. In our work, we use data about "places lived" from millions of Google+ users in order to study migration "clusters", i.e., groups of countries in which individuals have lived. For the first time, we consider information about more than two countries people have lived in. We argue that these data are very valuable because this type of information is not available in traditional demographic sources, which record country-to-country migration flows independently of each other. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained by using data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries that are more or less prominent than we would expect based on bilateral flows alone. Next, we identify a set of features, such as a shared language or colonial ties, that explain which country triples are more or less likely to be clustered. Then we select and contrast a few cases of clusters that provide some qualitative information about what our data set shows. The type of data that we use is potentially available for a number of social media services. We hope that this first study about migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.
Submitted 1 July, 2016;
originally announced July 2016.