Search | arXiv e-print repository

Exploring the Use of Abusive Generative AI Models on Civitai

Authors: Yiluo Wei, Yiming Zhu, Pan Hui, Gareth Tyson

Abstract: The rise of generative AI is transforming the landscape of digital imagery, and exerting a significant influence on online creative communities. This has led to the emergence of AI-Generated Content (AIGC) social platforms, such as Civitai. These distinctive social platforms allow users to build and share their own generative AI models, thereby enhancing the potential for more diverse artistic exp… ▽ More The rise of generative AI is transforming the landscape of digital imagery, and exerting a significant influence on online creative communities. This has led to the emergence of AI-Generated Content (AIGC) social platforms, such as Civitai. These distinctive social platforms allow users to build and share their own generative AI models, thereby enhancing the potential for more diverse artistic expression. Designed in the vein of social networks, they also provide artists with the means to showcase their creations (generated from the models), engage in discussions, and obtain feedback, thus nurturing a sense of community. Yet, this openness also raises concerns about the abuse of such platforms, e.g., using models to disseminate deceptive deepfakes or infringe upon copyrights. To explore this, we conduct the first comprehensive empirical study of an AIGC social platform, focusing on its use for generating abusive content. As an exemplar, we construct a comprehensive dataset covering Civitai, the largest available AIGC social platform. Based on this dataset of 87K models and 2M images, we explore the characteristics of content and discuss strategies for moderation to better govern these platforms. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted to ACM Multimedia 2024

arXiv:2407.06422 [pdf, other]

Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version)

Authors: Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson

Abstract: Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for social computing tasks, aiming to reduce the complexity and cost of undertaking web research. To evaluate ChatGPT's potential, we re-annotate seven datasets using C… ▽ More Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for social computing tasks, aiming to reduce the complexity and cost of undertaking web research. To evaluate ChatGPT's potential, we re-annotate seven datasets using ChatGPT, covering topics related to pressing social issues like COVID-19 misinformation, social bot deception, cyberbully, clickbait news, and the Russo-Ukrainian War. Our findings demonstrate that ChatGPT exhibits promise in handling these data annotation tasks, albeit with some challenges. Across the seven datasets, ChatGPT achieves an average annotation F1-score of 72.00%. Its performance excels in clickbait news annotation, correctly labeling 89.66% of the data. However, we also observe significant variations in performance across individual labels. Our study reveals predictable patterns in ChatGPT's annotation performance. Thus, we propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task. Researchers can use this to identify where ChatGPT might be suitable for their annotation requirements. We show that GPT-Rater effectively predicts ChatGPT's performance. It performs best on a clickbait headlines dataset by achieving an average F1-score of 95.00%. We believe that this research opens new avenues for analysis and can reduce barriers to engaging in social computing research. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: Extended version of accepted short paper to ASONAM 2024. arXiv admin note: text overlap with arXiv:2304.10145

arXiv:2407.03255 [pdf, other]

How Similar Are Elected Politicians and Their Constituents? Quantitative Evidence From Online Social Networks

Authors: Waleed Iqbal, Gareth Tyson, Ignacio Castro

Abstract: How similar are politicians to those who vote for them? This is a critical question at the heart of democratic representation and particularly relevant at times when political dissatisfaction and populism are on the rise. To answer this question we compare the online discourse of elected politicians and their constituents. We collect a two and a half years (September 2020 - February 2023) constitu… ▽ More How similar are politicians to those who vote for them? This is a critical question at the heart of democratic representation and particularly relevant at times when political dissatisfaction and populism are on the rise. To answer this question we compare the online discourse of elected politicians and their constituents. We collect a two and a half years (September 2020 - February 2023) constituency-level dataset for USA and UK that includes: (i) the Twitter timelines (5.6 Million tweets) of elected political representatives (595 UK Members of Parliament and 433 USA Representatives), (ii) the Nextdoor posts (21.8 Million posts) of the constituency (98.4% USA and 91.5% UK constituencies). We find that elected politicians tend to be equally similar to their constituents in terms of content and style regardless of whether a constituency elects a right or left-wing politician. The size of the electoral victory and the level of income of a constituency shows a nuanced picture. The narrower the electoral victory, the more similar the style and the more dissimilar the content is. The lower the income of a constituency, the more similar the content is. In terms of style, poorer constituencies tend to have a more similar sentiment and more dissimilar psychological text traits (i.e. measured with LIWC categories). △ Less

Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.19277 [pdf, other]

The Emergence of Threads: The Birth of a New Social Network

Authors: Peixian Zhang, Yupeng He, Ehsan-Ul Haq, Jiahui He, Gareth Tyson

Abstract: Threads, a new microblogging platform from Meta, was launched in July 2023. In contrast to prior new platforms, Threads was borne out of an existing parent platform, Instagram, for which all users must already possess an account. This offers a unique opportunity to study platform evolution, to understand how one existing platform can support the "birth" of another. With this in mind, this paper pr… ▽ More Threads, a new microblogging platform from Meta, was launched in July 2023. In contrast to prior new platforms, Threads was borne out of an existing parent platform, Instagram, for which all users must already possess an account. This offers a unique opportunity to study platform evolution, to understand how one existing platform can support the "birth" of another. With this in mind, this paper provides an initial exploration of Threads, contrasting it with its parent, Instagram. We compare user behaviour within and across the two social media platforms, focusing on posting frequency, content preferences, and engagement patterns. Utilising a temporal analysis framework, we identify consistent daily posting trends on the parent platform and uncover contrasting behaviours when comparing intra-platform and cross-platform activities. Our findings reveal that Threads engages more with political and AI-related topics, compared to Instagram which focuses more on lifestyle and fashion topics. Our analysis also shows that user activities align more closely on weekends across both platforms. Engagement analysis suggests that users prefer to post about topics that garner more likes and that topic consistency is maintained when users transition from Instagram to Threads. Our research provides insights into user behaviour and offers a basis for future studies on Threads. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.07430 [pdf, other]

Learning Domain-Invariant Features for Out-of-Context News Detection

Authors: Yimeng Gu, Mengqi Zhang, Ignacio Castro, Shu Wu, Gareth Tyson

Abstract: Multimodal out-of-context news is a common type of misinformation on online media platforms. This involves posting a caption, alongside an invalid out-of-context news image. Reflecting its importance, researchers have developed models to detect such misinformation. However, a common limitation of these models is that they only consider the scenario where pre-labeled data is available for each doma… ▽ More Multimodal out-of-context news is a common type of misinformation on online media platforms. This involves posting a caption, alongside an invalid out-of-context news image. Reflecting its importance, researchers have developed models to detect such misinformation. However, a common limitation of these models is that they only consider the scenario where pre-labeled data is available for each domain, failing to address the out-of-context news detection on unlabeled domains (e.g., unverified news on new topics or agencies). In this work, we therefore focus on domain adaptive out-of-context news detection. In order to effectively adapt the detection model to unlabeled news topics or agencies, we propose ConDA-TTA (Contrastive Domain Adaptation with Test-Time Adaptation) which applies contrastive learning and maximum mean discrepancy (MMD) to learn the domain-invariant feature. In addition, it leverages target domain statistics during test-time to further assist domain adaptation. Experimental results show that our approach outperforms baselines in 5 out of 7 domain adaptation settings on two public datasets, by as much as 2.93% in F1 and 2.08% in accuracy. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2404.12738 [pdf, other]

doi 10.1109/TNET.2024.3398778

DeviceRadar: Online IoT Device Fingerprinting in ISPs using Programmable Switches

Authors: Ruoyu Li, Qing Li, Tao Lin, Qingsong Zou, Dan Zhao, Yucheng Huang, Gareth Tyson, Guorui Xie, Yong Jiang

Abstract: Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of teraby… ▽ More Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of terabytes per day in ISP networks. This paper proposes DeviceRadar, an online IoT device fingerprinting framework that achieves accurate, real-time processing in ISPs using programmable switches. We innovatively exploit "key packets" as a basis of fingerprints only using packet sizes and directions, which appear periodically while exhibiting differences across different IoT devices. To utilize them, we propose a packet size embedding model to discover the spatial relationships between packets. Meanwhile, we design an algorithm to extract the "key packets" of each device, and propose an approach that jointly considers the spatial relationships and the key packets to produce a neighboring key packet distribution, which can serve as a feature vector for machine learning models for inference. Last, we design a model transformation method and a feature extraction process to deploy the model on a programmable data plane within its constrained arithmetic operations and memory to achieve line-speed processing. Our experiments show that DeviceRadar can achieve state-of-the-art accuracy across 77 IoT devices with 40 Gbps throughput, and requires only 1.3% of the processing time compared to GPU-accelerated approaches. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE/ACM Transactions on Networking (ToN)

arXiv:2404.03048 [pdf, other]

Decentralised Moderation for Interoperable Social Networks: A Conversation-based Approach for Pleroma and the Fediverse

Authors: Vibhor Agarwal, Aravindh Raman, Nishanth Sastry, Ahmed M. Abdelmoniem, Gareth Tyson, Ignacio Castro

Abstract: The recent development of decentralised and interoperable social networks (such as the "fediverse") creates new challenges for content moderators. This is because millions of posts generated on one server can easily "spread" to another, even if the recipient server has very different moderation policies. An obvious solution would be to leverage moderation tools to automatically tag (and filter) po… ▽ More The recent development of decentralised and interoperable social networks (such as the "fediverse") creates new challenges for content moderators. This is because millions of posts generated on one server can easily "spread" to another, even if the recipient server has very different moderation policies. An obvious solution would be to leverage moderation tools to automatically tag (and filter) posts that contravene moderation policies, e.g. related to toxic speech. Recent work has exploited the conversational context of a post to improve this automatic tagging, e.g. using the replies to a post to help classify if it contains toxic speech. This has shown particular potential in environments with large training sets that contain complete conversations. This, however, creates challenges in a decentralised context, as a single conversation may be fragmented across multiple servers. Thus, each server only has a partial view of an entire conversation because conversations are often federated across servers in a non-synchronized fashion. To address this, we propose a decentralised conversation-aware content moderation approach suitable for the fediverse. Our approach employs a graph deep learning model (GraphNLI) trained locally on each server. The model exploits local data to train a model that combines post and conversational information captured through random walks to detect toxicity. We evaluate our approach with data from Pleroma, a major decentralised and interoperable micro-blogging network containing 2 million conversations. Our model effectively detects toxicity on larger instances, exclusively trained using their local post information (0.8837 macro-F1). Our approach has considerable scope to improve moderation in decentralised and interoperable social networks such as Pleroma or Mastodon. △ Less

Submitted 16 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted at International AAAI Conference on Web and Social Media (ICWSM) 2024. Please cite accordingly!

arXiv:2402.18463 [pdf, other]

Understanding the Impact of AI Generated Content on Social Media: The Pixiv Case

Authors: Yiluo Wei, Gareth Tyson

Abstract: In the last two years, Artificial Intelligence Generated Content (AIGC) has received significant attention, leading to an anecdotal rise in the amount of AIGC being shared via social media platforms. The impact of AIGC and its implications are of key importance to social platforms, e.g., regarding the implementation of policies, community formation, and algorithmic design. Yet, to date, we know li… ▽ More In the last two years, Artificial Intelligence Generated Content (AIGC) has received significant attention, leading to an anecdotal rise in the amount of AIGC being shared via social media platforms. The impact of AIGC and its implications are of key importance to social platforms, e.g., regarding the implementation of policies, community formation, and algorithmic design. Yet, to date, we know little about how the arrival of AIGC has impacted the social media ecosystem. To fill this gap, we present a comprehensive study of Pixiv, an online community for artists who wish to share and receive feedback on their illustrations. Pixiv hosts over 100 million artistic submissions and receives more than 1 billion page views per month (as of 2023). Importantly, it allows both human and AI generated content to be uploaded. Exploiting this, we perform the first analysis of the impact that AIGC has had on the social media ecosystem, through the lens of Pixiv. Based on a dataset of 15.2 million posts (including 2.4 million AI-generated images), we measure the impact of AIGC on the Pixiv community, as well as the differences between AIGC and human-generated content in terms of content creation and consumption patterns. Our results offer key insight to how AIGC is changing the dynamics of social media platforms like Pixiv. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.05709 [pdf, other]

Exploring the Nostr Ecosystem: A Study of Decentralization and Resilience

Authors: Yiluo Wei, Gareth Tyson

Abstract: Nostr is an open decentralized social network launched in 2022. From a user's perspective, it is similar to a micro-blogging service like Twitter. However, the underlying infrastructure is very different, and Nostr boasts a range of unique features that set it apart. Nostr introduces the concept of relays, which act as open storage servers that receive, store, and distribute user posts. Each user… ▽ More Nostr is an open decentralized social network launched in 2022. From a user's perspective, it is similar to a micro-blogging service like Twitter. However, the underlying infrastructure is very different, and Nostr boasts a range of unique features that set it apart. Nostr introduces the concept of relays, which act as open storage servers that receive, store, and distribute user posts. Each user is uniquely identified by a public key, ensuring authenticity of posts through digital signatures. Consequently, users are able to securely send and receive posts through various relays, which frees them from single-server reliance and enhances post availability (e.g., making it more censorship resistant). The Nostr ecosystem has garnered significant attention, boasting 4 million users and 60 million posts in just 2 years. To understand its characteristics and challenges, we conduct the first large-scale measurement of the Nostr ecosystem, spanning from July 1, 2023, to December 31, 2023. Our study focuses on two key aspects: Nostr relays and post replication strategies. We find that Nostr achieves superior decentralization compared to traditional Fediverse applications. However, relay availability remains a challenge, where financial sustainability (particularly for free-to-use relays) emerges as a contributing factor. We also find that the replication of posts across relays enhances post availability but introduces significant overhead. To address this, we propose two design innovations. One to control the number of post replications, and another to reduce the overhead during post retrieval. Via data-driven evaluations, we demonstrate their effectiveness without negatively impacting the system. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: Under Review

arXiv:2402.01697 [pdf, other]

doi 10.1145/3589334.3645642

APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT

Authors: Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

Abstract: Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning -- techniques and guidelines that attempt to improve the quality of prompts. Yet these largely… ▽ More Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning -- techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve higher weighted F1-score on nine out of twelve experimented datasets, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms. △ Less

Submitted 20 February, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

Comments: Accepted by WWW 2024; Camera-ready version

arXiv:2311.15294 [pdf, other]

A Study of Partisan News Sharing in the Russian invasion of Ukraine

Authors: Yiming Zhu, Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee, Yuyang Wang, Pan Hui

Abstract: Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characteri… ▽ More Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characterize the role of such sharing in influencing users' communications. Our analysis covers an eight-month dataset across six Reddit communities related to the Russian invasion. We first perform an analysis of the temporal evolution of partisan news sharing. We confirm that the invasion stimulates discussion in the observed communities, accompanied by an increased volume of partisan news sharing. Next, we characterize users' response to such sharing. We observe that partisan bias plays a role in narrowing its propagation. More biased media is less likely to be spread across multiple subreddits. However, we find that partisan news sharing attracts more users to engage in the discussion, by generating more comments. We then built a predictive model to identify users likely to spread partisan news. The prediction is challenging though, with 61.57% accuracy on average. Our centrality analysis on the commenting network further indicates that the users who disseminate partisan news possess lower network influence in comparison to those who propagate neutral news. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.13442 [pdf, other]

Temporal Network Analysis of Email Communication Patterns in a Long Standing Hierarchy

Authors: Matthew Russell Barnes, Mladen Karan, Stephen McQuistin, Colin Perkins, Gareth Tyson, Matthew Purver, Ignacio Castro, Richard G. Clegg

Abstract: An important concept in organisational behaviour is how hierarchy affects the voice of individuals, whereby members of a given organisation exhibit differing power relations based on their hierarchical position. Although there have been prior studies of the relationship between hierarchy and voice, they tend to focus on more qualitative small-scale methods and do not account for structural aspects… ▽ More An important concept in organisational behaviour is how hierarchy affects the voice of individuals, whereby members of a given organisation exhibit differing power relations based on their hierarchical position. Although there have been prior studies of the relationship between hierarchy and voice, they tend to focus on more qualitative small-scale methods and do not account for structural aspects of the organisation. This paper develops large-scale computational techniques utilising temporal network analysis to measure the effect that organisational hierarchy has on communication patterns within an organisation, focusing on the structure of pairwise interactions between individuals. We focus on one major organisation as a case study - the Internet Engineering Task Force (IETF) - a major technical standards development organisation for the Internet. A particularly useful feature of the IETF is a transparent hierarchy, where participants take on explicit roles (e.g. Area Directors, Working Group Chairs). Its processes are also open, so we have visibility into the communication of people at different hierarchy levels over a long time period. We utilise a temporal network dataset of 989,911 email interactions among 23,741 participants to study how hierarchy impacts communication patterns. We show that the middle levels of the IETF are growing in terms of their dominance in communications. Higher levels consistently experience a higher proportion of incoming communication than lower levels, with higher levels initiating more communications too. We find that communication tends to flow "up" the hierarchy more than "down". Finally, we find that communication with higher-levels is associated with future communication more than for lower-levels, which we interpret as "facilitation". We conclude by discussing the implications this has on patterns within the wider IETF and for other organisations. △ Less

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.09934 [pdf, other]

Echo Chambers within the Russo-Ukrainian War: The Role of Bipartisan Users

Authors: Peixian Zhang, Ehsan-Ul Haq, Yiming Zhu, Pan Hui, Gareth Tyson

Abstract: The ongoing Russia-Ukraine war has been extensively discussed on social media. One commonly observed problem in such discussions is the emergence of echo chambers, where users are rarely exposed to opinions outside their worldview. Prior literature on this topic has assumed that such users hold a single consistent view. However, recent work has revealed that complex topics (such as the war) often… ▽ More The ongoing Russia-Ukraine war has been extensively discussed on social media. One commonly observed problem in such discussions is the emergence of echo chambers, where users are rarely exposed to opinions outside their worldview. Prior literature on this topic has assumed that such users hold a single consistent view. However, recent work has revealed that complex topics (such as the war) often trigger bipartisanship among certain people. With this in mind, we study the presence of echo chambers on Twitter related to the Russo-Ukrainian war. We measure their presence and identify an important subset of bipartisan users who vary their opinions during the invasion. We explore the role they play in the communications graph and identify features that distinguish them from remaining users. We conclude by discussing their importance and how they can improve the quality of discourse surrounding the war. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2306.13667 [pdf, other]

Ghost Booking as a New Philanthropy Channel: A Case Study on Ukraine-Russia Conflict

Authors: Fachrina Dewi Puspitasari, Gareth Tyson, Ehsan-Ul Haq, Pan Hui, Lik-Hang Lee

Abstract: The term ghost booking has recently emerged as a new way to conduct humanitarian acts during the conflict between Russia and Ukraine in 2022. The phenomenon describes the events where netizens donate to Ukrainian citizens through no-show bookings on the Airbnb platform. Impressively, the social fundraising act that used to be organized on donation-based crowdfunding platforms is shifted into a sha… ▽ More The term ghost booking has recently emerged as a new way to conduct humanitarian acts during the conflict between Russia and Ukraine in 2022. The phenomenon describes the events where netizens donate to Ukrainian citizens through no-show bookings on the Airbnb platform. Impressively, the social fundraising act that used to be organized on donation-based crowdfunding platforms is shifted into a sharing economy platform market and thus gained more visibility. Although the donation purpose is clear, the motivation of donors in selecting a property to book remains concealed. Thus, our study aims to explore peer-to-peer donation behavior on a platform that was originally intended for economic exchanges, and further identifies which platform attributes effectively drive donation behaviors. We collect over 200K guest reviews from 16K Airbnb property listings in Ukraine by employing two collection methods (screen scraping and HTML parsing). Then, we distinguish ghost bookings among guest reviews. Our analysis uncovers the relationship between ghost booking behavior and the platform attributes, and pinpoints several attributes that influence ghost booking. Our findings highlight that donors incline to credible properties explicitly featured with humanitarian needs, i.e., the hosts in penury. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at ACM Hypertext 2023

ACM Class: J.4

arXiv:2306.11390 [pdf, other]

An Analysis of Twitter Discourse on the War Between Russia and Ukraine

Authors: Haris Bin Zia, Ehsan Ul Haq, Ignacio Castro, Pan Hui, Gareth Tyson

Abstract: On the 21st of February 2022, Russia recognised the Donetsk People's Republic and the Luhansk People's Republic, three days before launching an invasion of Ukraine. Since then, an active debate has taken place on social media, mixing organic discussions with coordinated information campaigns. The scale of this discourse, alongside the role that information warfare has played in the invasion, make… ▽ More On the 21st of February 2022, Russia recognised the Donetsk People's Republic and the Luhansk People's Republic, three days before launching an invasion of Ukraine. Since then, an active debate has taken place on social media, mixing organic discussions with coordinated information campaigns. The scale of this discourse, alongside the role that information warfare has played in the invasion, make it vital to better understand this ecosystem. We therefore present a study of pro-Ukrainian vs. pro-Russian discourse through the lens of Twitter. We do so from two perspectives: (i) the content that is shared; and (ii) the users who participate in the sharing. We first explore the scale and nature of conversations, including analysis of hashtags, toxicity and media sharing. We then study the users who drive this, highlighting a significant presence of new users and bots. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2304.10145 [pdf, other]

Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks

Authors: Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, Gareth Tyson

Abstract: The release of ChatGPT has uncovered a range of possibilities whereby large language models (LLMs) can substitute human intelligence. In this paper, we seek to understand whether ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks. Such an achievement could significantly reduce the cost and complexity of social computing research. As such, we use Chat… ▽ More The release of ChatGPT has uncovered a range of possibilities whereby large language models (LLMs) can substitute human intelligence. In this paper, we seek to understand whether ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks. Such an achievement could significantly reduce the cost and complexity of social computing research. As such, we use ChatGPT to relabel five seminal datasets covering stance detection (2x), sentiment analysis, hate speech, and bot detection. Our results highlight that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain. ChatGPT obtains an average accuracy 0.609. Performance is highest for the sentiment analysis dataset, with ChatGPT correctly annotating 64.9% of tweets. Yet, we show that performance varies substantially across individual labels. We believe this work can open up new lines of analysis and act as a basis for future research into the exploitation of ChatGPT for human annotation tasks. △ Less

Submitted 22 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.05232 [pdf, other]

Lady and the Tramp Nextdoor: Online Manifestations of Economic Inequalities in the Nextdoor Social Network

Authors: Waleed Iqbal, Vahid Ghafouri, Gareth Tyson, Guillermo Suarez-Tangil, Ignacio Castro

Abstract: From health to education, income impacts a huge range of life choices. Earlier research has leveraged data from online social networks to study precisely this impact. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate it does. We present the first large-scale study of Nextdoor, a popular location-based social network. We… ▽ More From health to education, income impacts a huge range of life choices. Earlier research has leveraged data from online social networks to study precisely this impact. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate it does. We present the first large-scale study of Nextdoor, a popular location-based social network. We collect 2.6 Million posts from 64,283 neighborhoods in the United States and 3,325 neighborhoods in the United Kingdom, to examine whether online discourse reflects the income and income inequality of a neighborhood. We show that posts from neighborhoods with different incomes indeed differ, e.g. richer neighborhoods have a more positive sentiment and discuss crimes more, even though their actual crime rates are much lower. We then show that user-generated content can predict both income and inequality. We train multiple machine learning models and predict both income (R-squared=0.841) and inequality (R-squared=0.77). △ Less

Submitted 26 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: The Paper is accepted for publication in ICWSM 2023

arXiv:2303.09008 [pdf, other]

doi 10.1145/3543507.3583327

Not Seen, Not Heard in the Digital World! Measuring Privacy Practices in Children's Apps

Authors: Ruoxi Sun, Minhui Xue, Gareth Tyson, Shuo Wang, Seyit Camtepe, Surya Nepal

Abstract: The digital age has brought a world of opportunity to children. Connectivity can be a game-changer for some of the world's most marginalized children. However, while legislatures around the world have enacted regulations to protect children's online privacy, and app stores have instituted various protections, privacy in mobile apps remains a growing concern for parents and wider society. In this p… ▽ More The digital age has brought a world of opportunity to children. Connectivity can be a game-changer for some of the world's most marginalized children. However, while legislatures around the world have enacted regulations to protect children's online privacy, and app stores have instituted various protections, privacy in mobile apps remains a growing concern for parents and wider society. In this paper, we explore the potential privacy issues and threats that exist in these apps. We investigate 20,195 mobile apps from the Google Play store that are designed particularly for children (Family apps) or include children in their target user groups (Normal apps). Using both static and dynamic analysis, we find that 4.47% of Family apps request location permissions, even though collecting location information from children is forbidden by the Play store, and 81.25% of Family apps use trackers (which are not allowed in children's apps). Even major developers with 40+ kids apps on the Play store use ad trackers. Furthermore, we find that most permission request notifications are not well designed for children, and 19.25% apps have inconsistent content age ratings across the different protection authorities. Our findings suggest that, despite significant attention to children's privacy, a large gap between regulatory provisions, app store policies, and actual development practices exist. Our research sheds light for government policymakers, app stores, and developers. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted at the Web Conference 2023

arXiv:2302.14294 [pdf, other]

Flocking to Mastodon: Tracking the Great Twitter Migration

Authors: Haris Bin Zia, Jiahui He, Aravindh Raman, Ignacio Castro, Nishanth Sastry, Gareth Tyson

Abstract: The acquisition of Twitter by Elon Musk has spurred controversy and uncertainty among Twitter users. The move raised as many praises as concerns, particularly regarding Musk's views on free speech. As a result, a large number of Twitter users have looked for alternatives to Twitter. Mastodon, a decentralized micro-blogging social network, has attracted the attention of many users and the general m… ▽ More The acquisition of Twitter by Elon Musk has spurred controversy and uncertainty among Twitter users. The move raised as many praises as concerns, particularly regarding Musk's views on free speech. As a result, a large number of Twitter users have looked for alternatives to Twitter. Mastodon, a decentralized micro-blogging social network, has attracted the attention of many users and the general media. In this paper, we track and analyze the migration of 136,009 users from Twitter to Mastodon. Our analysis sheds light on the user-driven pressure towards centralization in a decentralized ecosystem and identifies the strong influence of the social network in platform migration. We also characterize the activity of migrated users on both Twitter and Mastodon. △ Less

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.05915 [pdf, other]

doi 10.1145/3543507.3583487

Will Admins Cope? Decentralized Moderation in the Fediverse

Authors: Ishaku Hassan Anaobi, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Dami Ibosiola, Gareth Tyson

Abstract: As an alternative to Twitter and other centralized social networks, the Fediverse is growing in popularity. The recent, and polemical, takeover of Twitter by Elon Musk has exacerbated this trend. The Fediverse includes a growing number of decentralized social networks, such as Pleroma or Mastodon, that share the same subscription protocol (ActivityPub). Each of these decentralized social networks… ▽ More As an alternative to Twitter and other centralized social networks, the Fediverse is growing in popularity. The recent, and polemical, takeover of Twitter by Elon Musk has exacerbated this trend. The Fediverse includes a growing number of decentralized social networks, such as Pleroma or Mastodon, that share the same subscription protocol (ActivityPub). Each of these decentralized social networks is composed of independent instances that are run by different administrators. Users, however, can interact with other users across the Fediverse regardless of the instance they are signed up to. The growing user base of the Fediverse creates key challenges for the administrators, who may experience a growing burden. In this paper, we explore how large that overhead is, and whether there are solutions to alleviate the burden. We study the overhead of moderation on the administrators. We observe a diversity of administrator strategies, with evidence that administrators on larger instances struggle to find sufficient resources. We then propose a tool, WatchGen, to semi-automate the process. △ Less

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2301.06316 [pdf, other]

A Twitter Dataset for Pakistani Political Discourse

Authors: Ehsan-Ul Haq, Haris Bin Zia, Reza Hadi Mogavi, Gareth Tyson, Yang K. Lu, Tristan Braud, Pan Hui

Abstract: We share the largest dataset for the Pakistani Twittersphere consisting of over 49 million tweets, collected during one of the most politically active periods in the country. We collect the data after the deposition of the government by a No Confidence Vote in April 2022. This large-scale dataset can be used for several downstream tasks such as political bias, bots detection, trolling behavior, (d… ▽ More We share the largest dataset for the Pakistani Twittersphere consisting of over 49 million tweets, collected during one of the most politically active periods in the country. We collect the data after the deposition of the government by a No Confidence Vote in April 2022. This large-scale dataset can be used for several downstream tasks such as political bias, bots detection, trolling behavior, (dis)misinformation, and censorship related to Pakistani Twitter users. In addition, this dataset provides a large collection of tweets in Urdu and Roman Urdu that can be used for optimizing language processing tasks. △ Less

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2211.06013 [pdf, other]

Exploring Mental Health Communications among Instagram Coaches

Authors: Ehsan-Ul Haq, Lik-Hang Lee, Gareth Tyson, Reza Hadi Mogavi, Tristan Braud, Pan Hui

Abstract: There has been a significant expansion in the use of online social networks (OSNs) to support people experiencing mental health issues. This paper studies the role of Instagram influencers who specialize in coaching people with mental health issues. Using a dataset of 97k posts, we characterize such users' linguistic and behavioural features. We explore how these observations impact audience engag… ▽ More There has been a significant expansion in the use of online social networks (OSNs) to support people experiencing mental health issues. This paper studies the role of Instagram influencers who specialize in coaching people with mental health issues. Using a dataset of 97k posts, we characterize such users' linguistic and behavioural features. We explore how these observations impact audience engagement (as measured by likes). We show that the support provided by these accounts varies based on their self-declared professional identities. For instance, Instagram accounts that declare themselves as Authors offer less support than accounts that label themselves as Coach. We show that increasing information support in general communication positively affects user engagement. However, the effect of vocabulary on engagement is not consistent across the Instagram account types. Our findings shed light on this understudied topic and guide how mental health practitioners can improve outreach. △ Less

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2208.05877 [pdf, other]

doi 10.1145/3544216.3544232

Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web

Authors: Dennis Trautwein, Aravindh Raman, Gareth Tyson, Ignacio Castro, Will Scott, Moritz Schubotz, Bela Gipp, Yiannis Psaras

Abstract: Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and im… ▽ More Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and implementation of the largest and most widely used Decentralized Web platform - the InterPlanetary File System (IPFS) - an open-source, content-addressable peer-to-peer network that provides distributed data storage and delivery. IPFS has millions of daily content retrievals and already underpins dozens of third-party applications. This paper evaluates the performance of IPFS by introducing a set of measurement methodologies that allow us to uncover the characteristics of peers in the IPFS network. We reveal presence in more than 2700 Autonomous Systems and 152 countries, the majority of which operate outside large central cloud providers like Amazon or Azure. We further evaluate IPFS performance, showing that both publication and retrieval delays are acceptable for a wide range of use cases. Finally, we share our datasets, experiences and lessons learned. △ Less

Submitted 11 August, 2022; originally announced August 2022.

Comments: 14 pages, 11 figures

ACM Class: C.2.2; C.2.1

Journal ref: SIGCOMM '22, August 22-26, 2022, Amsterdam, Netherlands

arXiv:2206.05107 [pdf, other]

A Reddit Dataset for the Russo-Ukrainian Conflict in 2022

Authors: Yiming Zhu, Ehsan-ul Haq, Lik-Hang Lee, Gareth Tyson, Pan Hui

Abstract: Reddit consists of sub-communities that cover a focused topic. This paper provides a list of relevant subreddits for the ongoing Russo-Ukrainian crisis. We perform an exhaustive subreddit exploration using keyword search and shortlist 12 subreddits as potential candidates that contain nominal discourse related to the crisis. These subreddits contain over 300,000 posts and 8 million comments collec… ▽ More Reddit consists of sub-communities that cover a focused topic. This paper provides a list of relevant subreddits for the ongoing Russo-Ukrainian crisis. We perform an exhaustive subreddit exploration using keyword search and shortlist 12 subreddits as potential candidates that contain nominal discourse related to the crisis. These subreddits contain over 300,000 posts and 8 million comments collectively. We provide an additional categorization of content into two categories, "R-U Conflict", and "Military Related", based on their primary focus. We further perform content characterization of those subreddits. The results show a surge of posts and comments soon after Russia launched the invasion. "Military Related" posts are more likely to receive more replies than "R-U Conflict" posts. Our textual analysis shows an apparent preference for the Pro-Ukraine stance in "R-U Conflict", while "Military Related" retain a neutral stance. △ Less

Submitted 20 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2204.12709 [pdf, other]

Toxicity in the Decentralized Web and the Potential for Model Sharing

Authors: Haris Bin Zia, Aravindh. Raman, Ignacio Castro, Ishaku Hassan Anaobi, Emiliano De Cristofaro, Nishanth Sastry, Gareth Tyson

Abstract: The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is cha… ▽ More The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data available and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, gaining an average per-instance macro-F1 score 0.89. △ Less

Submitted 27 April, 2022; originally announced April 2022.

Journal ref: Published in the Proceedings of the 2022 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'22). Please cite accordingly

arXiv:2203.03077 [pdf, other]

A Study of Third-party Resources Loading on Web

Authors: Muhammad Ikram, Rahat Masood, Gareth Tyson, Mohamed Ali Kaafar, Roya Ensafi

Abstract: This paper performs a large-scale study of dependency chains in the web, to find that around 50% of first-party websites render content that they did not directly load. Although the majority (84.91%) of websites have short dependency chains (below 3 levels), we find websites with dependency chains exceeding 30. Using VirusTotal, we show that 1.2% of these third-parties are classified as suspicious… ▽ More This paper performs a large-scale study of dependency chains in the web, to find that around 50% of first-party websites render content that they did not directly load. Although the majority (84.91%) of websites have short dependency chains (below 3 levels), we find websites with dependency chains exceeding 30. Using VirusTotal, we show that 1.2% of these third-parties are classified as suspicious -- although seemingly small, this limited set of suspicious third-parties have remarkable reach into the wider ecosystem. We find that 73% of websites under-study load resources from suspicious third-parties, and 24.8% of first-party webpages contain at least three third-parties classified as suspicious in their dependency chain. By running sandboxed experiments, we observe a range of activities with the majority of suspicious JavaScript codes downloading malware. △ Less

Submitted 6 March, 2022; originally announced March 2022.

Comments: 3 pages. arXiv admin note: substantial text overlap with arXiv:1901.07699

arXiv:2203.02955 [pdf, other]

Twitter Dataset for 2022 Russo-Ukrainian Crisis

Authors: Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee, Tristan Braud, Pan Hui

Abstract: Online Social Networks (OSNs) play a significant role in information sharing during a crisis. The data collected during such a crisis can reflect the large scale public opinions and sentiment. In addition, OSN data can also be used to study different campaigns that are employed by various entities to engineer public opinions. Such information sharing campaigns can range from spreading factual info… ▽ More Online Social Networks (OSNs) play a significant role in information sharing during a crisis. The data collected during such a crisis can reflect the large scale public opinions and sentiment. In addition, OSN data can also be used to study different campaigns that are employed by various entities to engineer public opinions. Such information sharing campaigns can range from spreading factual information to propaganda and misinformation. We provide a Twitter dataset of the 2022 Russo-Ukrainian conflict. In the first release, we share over 1.6 million tweets shared during the 1st week of the crisis. △ Less

Submitted 6 March, 2022; originally announced March 2022.

arXiv:2111.10085 [pdf, other]

Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors

Authors: Ruoxi Sun, Minhui Xue, Gareth Tyson, Tian Dong, Shaofeng Li, Shuo Wang, Haojin Zhu, Seyit Camtepe, Surya Nepal

Abstract: Numerous open-source and commercial malware detectors are available. However, their efficacy is threatened by new adversarial attacks, whereby malware attempts to evade detection, e.g., by performing feature-space manipulation. In this work, we propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors when confronted with adversarial attacks. The fra… ▽ More Numerous open-source and commercial malware detectors are available. However, their efficacy is threatened by new adversarial attacks, whereby malware attempts to evade detection, e.g., by performing feature-space manipulation. In this work, we propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors when confronted with adversarial attacks. The framework introduces the concept of Accrued Malicious Magnitude (AMM) to identify which malware features could be manipulated to maximize the likelihood of evading detection. We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware. We find that (i) commercial antivirus engines are vulnerable to AMM-guided test cases; (ii) the ability of a manipulated malware generated using one detector to evade detection by another detector (i.e., transferability) depends on the overlap of features with large AMM values between the different detectors; and (iii) AMM values effectively measure the fragility of features (i.e., capability of feature-space manipulation to flip the prediction results) and explain the robustness of malware detectors facing evasion attacks. Our findings shed light on the limitations of current malware detectors, as well as how they can be improved. △ Less

Submitted 27 November, 2023; v1 submitted 19 November, 2021; originally announced November 2021.

Comments: Accepted at ESEC/FSE 2023. https://doi.org/10.1145/3611643.3616309

arXiv:2110.13500 [pdf, other]

Exploring Content Moderation in the Decentralised Web: The Pleroma Case

Authors: Anaobi Ishaku Hassan, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Emiliano De Cristofaro, Nishanth Sastry, Gareth Tyson

Abstract: Decentralising the Web is a desirable but challenging goal. One particular challenge is achieving decentralised content moderation in the face of various adversaries (e.g. trolls). To overcome this challenge, many Decentralised Web (DW) implementations rely on federation policies. Administrators use these policies to create rules that ban or modify content that matches specific rules. This, howeve… ▽ More Decentralising the Web is a desirable but challenging goal. One particular challenge is achieving decentralised content moderation in the face of various adversaries (e.g. trolls). To overcome this challenge, many Decentralised Web (DW) implementations rely on federation policies. Administrators use these policies to create rules that ban or modify content that matches specific rules. This, however, can have unintended consequences for many users. In this paper, we present the first study of federation policies on the DW, their in-the-wild usage, and their impact on users. We identify how these policies may negatively impact "innocent" users and outline possible solutions to avoid this problem in the future. △ Less

Submitted 30 October, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies (ACM CoNext 2021)

arXiv:2110.07534 [pdf, other]

Understanding the Evolution of Blockchain Ecosystems: A Longitudinal Measurement Study of Bitcoin, Ethereum, and EOSIO

Authors: Ningyu He, Weihang Su, Zhou Yu, Xinyu Liu, Fengyi Zhao, Haoyu Wang, Xiapu Luo, Gareth Tyson, Lei Wu, Yao Guo

Abstract: The continuing expansion of the blockchain ecosystems has attracted much attention from the research community. However, although a large number of research studies have been proposed to understand the diverse characteristics of individual blockchain systems (e.g., Bitcoin or Ethereum), little is known at a comprehensive level on the evolution of blockchain ecosystems at scale, longitudinally, and… ▽ More The continuing expansion of the blockchain ecosystems has attracted much attention from the research community. However, although a large number of research studies have been proposed to understand the diverse characteristics of individual blockchain systems (e.g., Bitcoin or Ethereum), little is known at a comprehensive level on the evolution of blockchain ecosystems at scale, longitudinally, and across multiple blockchains. We argue that understanding the dynamics of blockchain ecosystems could provide unique insights that cannot be achieved through studying a single static snapshot or a single blockchain network alone. Based on billions of transaction records collected from three representative and popular blockchain systems (Bitcoin, Ethereum and EOSIO) over 10 years, we conduct the first study on the evolution of multiple blockchain ecosystems from different perspectives. Our exploration suggests that, although the overall blockchain ecosystem shows promising growth over the last decade, a number of worrying outliers exist that have disrupted its evolution. △ Less

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2106.05184 [pdf, other]

Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp

Authors: Pushkal Agarwal, Aravindh Raman, Damilola Ibosiola, Gareth Tyson, Nishanth Sastry, Kiran Garimella

Abstract: WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, understanding misbehavior on WhatsApp is an important issue. The sending of unwanted junk messages by unknown contacts via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. We address this gap by studying junk messaging on a mu… ▽ More WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, understanding misbehavior on WhatsApp is an important issue. The sending of unwanted junk messages by unknown contacts via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. We address this gap by studying junk messaging on a multilingual dataset of 2.6M messages sent to 5K public WhatsApp groups in India. We characterise both junk content and senders. We find that nearly 1 in 10 messages is unwanted content sent by junk senders, and a number of unique strategies are employed to reflect challenges faced on WhatsApp, e.g., the need to change phone numbers regularly. We finally experiment with on-device classification to automate the detection of junk, whilst respecting end-to-end encryption. △ Less

Submitted 12 February, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: A PREPRINT OF ACCEPTED PUBLICATION AT The Web Conference (WWW) 2022

arXiv:2102.06528 [pdf, other]

A Tale of Two Countries: A Longitudinal Cross-Country Study of Mobile Users' Reactions to the COVID-19 Pandemic Through the Lens of App Popularity

Authors: Liu Wang, Haoyu Wang, Yi Wang, Gareth Tyson, Fei Lyu

Abstract: The ongoing COVID-19 pandemic has profoundly impacted people's life around the world, including how they interact with mobile technologies. In this paper, we seek to develop an understanding of how the dynamic trajectory of a pandemic shapes mobile phone users' experiences. Through the lens of app popularity, we approach this goal from a cross-country perspective. We compile a dataset consisting o… ▽ More The ongoing COVID-19 pandemic has profoundly impacted people's life around the world, including how they interact with mobile technologies. In this paper, we seek to develop an understanding of how the dynamic trajectory of a pandemic shapes mobile phone users' experiences. Through the lens of app popularity, we approach this goal from a cross-country perspective. We compile a dataset consisting of six-month daily snapshots of the most popular apps in the iOS App Store in China and the US, where the pandemic has exhibited distinct trajectories. Using this longitudinal dataset, our analysis provides detailed patterns of app ranking during the pandemic at both category and individual app levels. We reveal that app categories' rankings are correlated with the pandemic, contingent upon country-specific development trajectories. Our work offers rich insights into how the COVID-19, a typical global public health crisis, has influence people's day-to-day interaction with the Internet and mobile technologies. △ Less

Submitted 30 March, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

arXiv:2011.09145 [pdf, other]

A First Look at COVID-19 Messages on WhatsApp in Pakistan

Authors: R. Tallal Javed, Mirza Elaaf Shuja, Muhammad Usama, Junaid Qadir, Waleed Iqbal, Gareth Tyson, Ignacio Castro, Kiran Garimella

Abstract: The worldwide spread of COVID-19 has prompted extensive online discussions, creating an `infodemic' on social media platforms such as WhatsApp and Twitter. However, the information shared on these platforms is prone to be unreliable and/or misleading. In this paper, we present the first analysis of COVID-19 discourse on public WhatsApp groups from Pakistan. Building on a large scale annotation of… ▽ More The worldwide spread of COVID-19 has prompted extensive online discussions, creating an `infodemic' on social media platforms such as WhatsApp and Twitter. However, the information shared on these platforms is prone to be unreliable and/or misleading. In this paper, we present the first analysis of COVID-19 discourse on public WhatsApp groups from Pakistan. Building on a large scale annotation of thousands of messages containing text and images, we identify the main categories of discussion. We focus on COVID-19 messages and understand the different types of images/text messages being propagated. By exploring user behavior related to COVID messages, we inspect how misinformation is spread. Finally, by quantifying the flow of information across WhatsApp and Twitter, we show how information spreads across platforms and how WhatsApp acts as a source for much of the information shared on Twitter. △ Less

Submitted 19 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

arXiv:2011.05757 [pdf, other]

Characterising and Detecting Sponsored Influencer Posts on Instagram

Authors: Koosha Zarei, Damilola Ibosiola, Reza Farahbakhsh, Zafar Gilani, Kiran Garimella, Noel Crespi, Gareth Tyson

Abstract: Recent years have seen a new form of advertisement campaigns emerge: those involving so-called social media influencers. These influencers accept money in return for promoting products via their social media feeds. Although this constitutes a new and interesting form of marketing, it also raises many questions, particularly related to transparency and regulation. For example, it can sometimes be u… ▽ More Recent years have seen a new form of advertisement campaigns emerge: those involving so-called social media influencers. These influencers accept money in return for promoting products via their social media feeds. Although this constitutes a new and interesting form of marketing, it also raises many questions, particularly related to transparency and regulation. For example, it can sometimes be unclear which accounts are officially influencers, or what even constitutes an influencer/advert. This is important in order to establish the integrity of influencers and to ensure compliance with advertisement regulation. We gather a large-scale Instagram dataset covering thousands of accounts advertising products, and create a categorisation based on the number of users they reach. We then provide a detailed analysis of the types of products being advertised by these accounts, their potential reach, and the engagement they receive from their followers. Based on our findings, we train machine learning models to distinguish sponsored content from non-sponsored, and identify cases where people are generating sponsored posts without officially labelling them. Our findings provide a first step towards understanding the under-studied space of online influencers that could be useful for researchers, marketers and policymakers. △ Less

Submitted 11 November, 2020; originally announced November 2020.

arXiv:2011.02673 [pdf, other]

Tracking Counterfeit Cryptocurrency End-to-end

Authors: Bingyu Gao, Haoyu Wang, Pengcheng Xia, Siwei Wu, Yajin Zhou, Xiapu Luo, Graeth Tyson

Abstract: The production of counterfeit money has a long history. It refers to the creation of imitation currency that is produced without the legal sanction of government. With the growth of the cryptocurrency ecosystem, there is expanding evidence that counterfeit cryptocurrency has also appeared. In this paper, we empirically explore the presence of counterfeit cryptocurrencies on Ethereum and measure th… ▽ More The production of counterfeit money has a long history. It refers to the creation of imitation currency that is produced without the legal sanction of government. With the growth of the cryptocurrency ecosystem, there is expanding evidence that counterfeit cryptocurrency has also appeared. In this paper, we empirically explore the presence of counterfeit cryptocurrencies on Ethereum and measure their impact. By analyzing over 190K ERC-20 tokens (or cryptocurrencies) on Ethereum, we have identified 2, 117 counterfeit tokens that target 94 of the 100 most popular cryptocurrencies. We perform an end-to-end characterization of the counterfeit token ecosystem, including their popularity, creators and holders, fraudulent behaviors and advertising channels. Through this, we have identified two types of scams related to counterfeit tokens and devised techniques to identify such scams. We observe that over 7,104 victims were deceived in these scams, and the overall financial loss sums to a minimum of $ 17 million (74,271.7 ETH). Our findings demonstrate the urgency to identify counterfeit cryptocurrencies and mitigate this threat. △ Less

Submitted 5 November, 2020; originally announced November 2020.

Comments: accepted to SIGMETRICS 2021

arXiv:2010.08438 [pdf, other]

Impersonation on Social Media: A Deep Neural Approach to Identify Ingenuine Content

Authors: Koosha Zarei, Reza Farahbakhsh, Noel Crespi, Gareth Tyson

Abstract: Impersonators are playing an important role in the production and propagation of the content on Online Social Networks, notably on Instagram. These entities are nefarious fake accounts that intend to disguise a legitimate account by making similar profiles and then striking social media by fake content, which makes it considerably harder to understand which posts are genuinely produced. In this st… ▽ More Impersonators are playing an important role in the production and propagation of the content on Online Social Networks, notably on Instagram. These entities are nefarious fake accounts that intend to disguise a legitimate account by making similar profiles and then striking social media by fake content, which makes it considerably harder to understand which posts are genuinely produced. In this study, we focus on three important communities with legitimate verified accounts. Among them, we identify a collection of 2.2K impersonator profiles with nearly 10k generated posts, 68K comments, and 90K likes. Then, based on profile characteristics and user behaviours, we cluster them into two collections of `bot' and `fan'. In order to separate the impersonator-generated post from genuine content, we propose a Deep Neural Network architecture that measures `profiles' and `posts' features to predict the content type: `bot-generated', 'fan-generated', or `genuine' content. Our study shed light into this interesting phenomena and provides interesting observation on bot-generated content that can help us to understand the role of impersonators in the production of fake content on Instagram. △ Less

Submitted 16 October, 2020; originally announced October 2020.

arXiv:2006.10933 [pdf, other]

An Empirical Assessment of Global COVID-19 Contact Tracing Applications

Authors: Ruoxi Sun, Wei Wang, Minhui Xue, Gareth Tyson, Seyit Camtepe, Damith C. Ranasinghe

Abstract: The rapid spread of COVID-19 has made manual contact tracing difficult. Thus, various public health authorities have experimented with automatic contact tracing using mobile applications (or "apps"). These apps, however, have raised security and privacy concerns. In this paper, we propose an automated security and privacy assessment tool, COVIDGUARDIAN, which combines identification and analysis o… ▽ More The rapid spread of COVID-19 has made manual contact tracing difficult. Thus, various public health authorities have experimented with automatic contact tracing using mobile applications (or "apps"). These apps, however, have raised security and privacy concerns. In this paper, we propose an automated security and privacy assessment tool, COVIDGUARDIAN, which combines identification and analysis of Personal Identification Information (PII), static program analysis and data flow analysis, to determine security and privacy weaknesses. Furthermore, in light of our findings, we undertake a user study to investigate concerns regarding contact tracing apps. We hope that COVIDGUARDIAN, and the issues raised through responsible disclosure to vendors, can contribute to the safe deployment of mobile contact tracing. As part of this, we offer concrete guidelines, and highlight gaps between user requirements and app performance. △ Less

Submitted 22 January, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Journal ref: In proceedings of the 43rd International Conference on Software Engineering (ICSE 2021)

arXiv:2005.07655 [pdf, other]

Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity

Authors: Steven R. Wilson, Walid Magdy, Barbara McGillivray, Gareth Tyson

Abstract: As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this researc… ▽ More As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this research, we study the temporal activity trends on Urban Dictionary and provide the first analysis of how this activity relates to content being discussed on a major social network: Twitter. By collecting the whole of Urban Dictionary, as well as a large sample of tweets over seven years, we explore the connections between the words and phrases that are defined and searched for on Urban Dictionary and the content that is talked about on Twitter. Through a series of cross-correlation calculations, we identify cases in which Urban Dictionary activity closely reflects the larger conversation happening on Twitter. Then, we analyze the types of terms that have a stronger connection to discussions on Twitter, finding that Urban Dictionary activity that is positively correlated with Twitter is centered around terms related to memes, popular public figures, and offline events. Finally, We explore the relationship between periods of time when terms are trending on Twitter and the corresponding activity on Urban Dictionary, revealing that new definitions are more likely to be added to Urban Dictionary for terms that are currently trending on Twitter. △ Less

Submitted 18 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: Accepted at The Web Science Conference 2020

arXiv:2004.12226 [pdf, ps, other]

A First Instagram Dataset on COVID-19

Authors: Koosha Zarei, Reza Farahbakhsh, Noel Crespi, Gareth Tyson

Abstract: The novel coronavirus (COVID-19) pandemic outbreak is drastically shaping and reshaping many aspects of our life, with a huge impact on our social life. In this era of lockdown policies in most of the major cities around the world, we see a huge increase in people and professional engagement in social media. Social media is playing an important role in news propagation as well as keeping people in… ▽ More The novel coronavirus (COVID-19) pandemic outbreak is drastically shaping and reshaping many aspects of our life, with a huge impact on our social life. In this era of lockdown policies in most of the major cities around the world, we see a huge increase in people and professional engagement in social media. Social media is playing an important role in news propagation as well as keeping people in contact. At the same time, this source is both a blessing and a curse as the coronavirus infodemic has become a major concern, and is already a topic that needs special attention and further research. In this paper, we provide a multilingual coronavirus (COVID-19) Instagram dataset that we have been continuously collected since March 30, 2020. We are making our dataset available to the research community at Github. We believe that this contribution will help the community to better understand the dynamics behind this phenomenon in Instagram, as one of the major social media. This dataset could also help study the propagation of misinformation related to this outbreak. △ Less

Submitted 25 April, 2020; originally announced April 2020.

arXiv:2004.11480 [pdf, other]

Characterising User Content on a Multi-lingual Social Network

Authors: Pushkal Agarwal, Kiran Garimella, Sagar Joglekar, Nishanth Sastry, Gareth Tyson

Abstract: Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there has only been a limited number of studies in… ▽ More Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there has only been a limited number of studies into a large portion of the world, including the largest, multilingual and multi-cultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross lingual dynamics by clustering visually similar images together, and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and posts from Hindi have the largest cross-lingual diffusion across ShareChat (as well as images containing text in English). In the case of images containing text that cross language barriers, we see that language translation is used to widen the accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multi-lingual and non-textual setting. △ Less

Submitted 23 April, 2020; originally announced April 2020.

Comments: Accepted at ICWSM 2020, please cite the ICWSM version

arXiv:2002.05369 [pdf, other]

Characterizing EOSIO Blockchain

Authors: Yuheng Huang, Haoyu Wang, Lei Wu, Gareth Tyson, Xiapu Luo, Run Zhang, Xuanzhe Liu, Gang Huang, Xuxian Jiang

Abstract: EOSIO has become one of the most popular blockchain platforms since its mainnet launch in June 2018. In contrast to the traditional PoW-based systems (e.g., Bitcoin and Ethereum), which are limited by low throughput, EOSIO is the first high throughput Delegated Proof of Stake system that has been widely adopted by many applications. Although EOSIO has millions of accounts and billions of transacti… ▽ More EOSIO has become one of the most popular blockchain platforms since its mainnet launch in June 2018. In contrast to the traditional PoW-based systems (e.g., Bitcoin and Ethereum), which are limited by low throughput, EOSIO is the first high throughput Delegated Proof of Stake system that has been widely adopted by many applications. Although EOSIO has millions of accounts and billions of transactions, little is known about its ecosystem, especially related to security and fraud. In this paper, we perform a large-scale measurement study of the EOSIO blockchain and its associated DApps. We gather a large-scale dataset of EOSIO and characterize activities including money transfers, account creation and contract invocation. Using our insights, we then develop techniques to automatically detect bots and fraudulent activity. We discover thousands of bot accounts (over 30\% of the accounts in the platform) and a number of real-world attacks (301 attack accounts). By the time of our study, 80 attack accounts we identified have been confirmed by DApp teams, causing 828,824 EOS tokens losses (roughly 2.6 million US\$) in total. △ Less

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.00115 [pdf, other]

doi 10.1007/978-3-030-44081-7_16

Dissecting the Workload of a Major Adult Video Portal

Authors: Andreas Grammenos, Aravindh Raman, Timm Böttger, Zafar Gilani, Gareth Tyson

Abstract: Adult content constitutes a major source of Internet traffic. As with many other platforms, these sites are incentivized to engage users and maintain them on the site. This engagement (e.g., through recommendations) shapes the journeys taken through such sites. Using data from a large content delivery network, we explore session journeys within an adult website. We take two perspectives. We first… ▽ More Adult content constitutes a major source of Internet traffic. As with many other platforms, these sites are incentivized to engage users and maintain them on the site. This engagement (e.g., through recommendations) shapes the journeys taken through such sites. Using data from a large content delivery network, we explore session journeys within an adult website. We take two perspectives. We first inspect the corpus available on these platforms. Following this, we investigate the session access patterns. We make a number of observations that could be exploited for optimizing delivery, e.g., that users often skip within video streams. △ Less

Submitted 31 January, 2020; originally announced February 2020.

Comments: 14 pages, 7 figures, Accepted at PAM 2020

arXiv:1909.06192 [pdf, ps, other]

An Empirical Study of the Cost of DNS-over-HTTPS

Authors: Timm Boettger, Felix Cuadrado, Gianni Antichi, Eder Leao Fernandes, Gareth Tyson, Ignacio Castro, Steve Uhlig

Abstract: DNS is a vital component for almost every networked application. Originally it was designed as an unencrypted protocol, making user security a concern. DNS-over-HTTPS (DoH) is the latest proposal to make name resolution more secure. In this paper we study the current DNS-over-HTTPS ecosystem, especially the cost of the additional security. We start by surveying the current DoH landscape by assessi… ▽ More DNS is a vital component for almost every networked application. Originally it was designed as an unencrypted protocol, making user security a concern. DNS-over-HTTPS (DoH) is the latest proposal to make name resolution more secure. In this paper we study the current DNS-over-HTTPS ecosystem, especially the cost of the additional security. We start by surveying the current DoH landscape by assessing standard compliance and supported features of public DoH servers. We then compare different transports for secure DNS, to highlight the improvements DoH makes over its predecessor, DNS-over-TLS (DoT). These improvements explain in part the significantly larger take-up of DoH in comparison to DoT. Finally, we quantify the overhead incurred by the additional layers of the DoH transport and their impact on web page load times. We find that these overheads only have limited impact on page load times, suggesting that it is possible to obtain the improved security of DoH with only marginal performance impact. △ Less

Submitted 13 September, 2019; originally announced September 2019.

Journal ref: ACM Internet Measurement Conference 2019 (IMC)

arXiv:1909.05801 [pdf, other]

Challenges in the Decentralised Web: The Mastodon Case

Authors: Aravindh Raman, Sagar Joglekar, Emiliano De Cristofaro, Nishanth Sastry, Gareth Tyson

Abstract: The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, Peer-Tube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, m… ▽ More The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, Peer-Tube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to setup independent servers ("instances") that people can sign-up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards recentralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign-up to the platforms, also discussing a few techniques that may address some of the issues observed. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Journal ref: Proceedings of 19th ACM Internet Measurement Conference (IMC 2019)

arXiv:1903.06925 [pdf, other]

doi 10.1145/3308558.3313664

Pythia: a Framework for the Automated Analysis of Web Hosting Environments

Authors: Srdjan Matic, Gareth Tyson, Gianluca Stringhini

Abstract: A common approach when setting up a website is to utilize third party Web hosting and content delivery networks. Without taking this trend into account, any measurement study inspecting the deployment and operation of websites can be heavily skewed. Unfortunately, the research community lacks generalizable tools that can be used to identify how and where a given website is hosted. Instead, a numbe… ▽ More A common approach when setting up a website is to utilize third party Web hosting and content delivery networks. Without taking this trend into account, any measurement study inspecting the deployment and operation of websites can be heavily skewed. Unfortunately, the research community lacks generalizable tools that can be used to identify how and where a given website is hosted. Instead, a number of ad hoc techniques have emerged, e.g., using Autonomous System databases, domain prefixes for CNAME records. In this work we propose Pythia, a novel lightweight approach for identifying Web content hosted on third-party infrastructures, including both traditional Web hosts and content delivery networks. Our framework identifies the organization to which a given Web page belongs, and it detects which Web servers are self-hosted and which ones leverage third-party services to provide contents. To test our framework we run it on 40,000 URLs and evaluate its accuracy, both by comparing the results with similar services and with a manually validated groundtruth. Our tool achieves an accuracy of 90% and detects that under 11% of popular domains are self-hosted. We publicly release our tool to allow other researchers to reproduce our findings, and to apply it to their own studies. △ Less

Submitted 13 May, 2019; v1 submitted 16 March, 2019; originally announced March 2019.

arXiv:1903.01517 [pdf, other]

doi 10.1007/s11192-019-03086-z

A Bibliometric Analysis of Publications in Computer Networking Research

Authors: Waleed Iqbal, Junaid Qadir, Gareth Tyson, Adnan Noor Mian, Saeed Ul Hassan, Jon Crowcroft

Abstract: This study uses the article content and metadata of four important computer networking periodicals-IEEE Communications Surveys and Tutorials (COMST), IEEE/ACM Transactions on Networking (TON), ACM Special Interest Group on Data Communications (SIGCOMM), and IEEE International Conference on Computer Communications (INFOCOM)-obtained using ACM, IEEE Xplore, Scopus and CrossRef, for an 18-year period… ▽ More This study uses the article content and metadata of four important computer networking periodicals-IEEE Communications Surveys and Tutorials (COMST), IEEE/ACM Transactions on Networking (TON), ACM Special Interest Group on Data Communications (SIGCOMM), and IEEE International Conference on Computer Communications (INFOCOM)-obtained using ACM, IEEE Xplore, Scopus and CrossRef, for an 18-year period (2000-2017) to address important bibliometrics questions. All of the venues are prestigious, yet they publish quite different research. The first two of these periodicals (COMST and TON) are highly reputed journals of the fields while SIGCOMM and INFOCOM are considered top conferences of the field. SIGCOMM and INFOCOM publish new original research. TON has a similar genre and publishes new original research as well as the extended versions of different research published in the conferences such as SIGCOMM and INFOCOM, while COMST publishes surveys and reviews (which not only summarize previous works but highlight future research opportunities). In this study, we aim to track the co-evolution of trends in the COMST and TON journals and compare them to the publication trends in INFOCOM and SIGCOMM. Our analyses of the computer networking literature include: (a) metadata analysis; (b) content-based analysis; and (c) citation analysis. In addition, we identify the significant trends and the most influential authors, institutes and countries, based on the publication count as well as article citations. Through this study, we are proposing a methodology and framework for performing a comprehensive bibliometric analysis on computer networking research. To the best of our knowledge, no such study has been undertaken in computer networking until now. △ Less

Submitted 4 March, 2019; originally announced March 2019.

Comments: 36 Pages, Currently submitted to Springer Scientometrics Journal 2019

MSC Class: 62-07

Journal ref: Scientometrics 2019

arXiv:1902.05796 [pdf, other]

doi 10.1145/3308558.3313438

Who Watches the Watchmen: Exploring Complaints on the Web

Authors: Damilola Ibosiola, Ignacio Castro, Gianluca Stringhini, Steve Uhlig, Gareth Tyson

Abstract: Under increasing scrutiny, many web companies now offer bespoke mechanisms allowing any third party to file complaints (e.g., requesting the de-listing of a URL from a search engine). While this self-regulation might be a valuable web governance tool, it places huge responsibility within the hands of these organisations that demands close examination. We present the first large-scale study of web… ▽ More Under increasing scrutiny, many web companies now offer bespoke mechanisms allowing any third party to file complaints (e.g., requesting the de-listing of a URL from a search engine). While this self-regulation might be a valuable web governance tool, it places huge responsibility within the hands of these organisations that demands close examination. We present the first large-scale study of web complaints (over 1 billion URLs). We find a range of complainants, largely focused on copyright enforcement. Whereas the majority of organisations are occasional users of the complaint system, we find a number of bulk senders specialised in targeting specific types of domain. We identify a series of trends and patterns amongst both the domains and complainants. By inspecting the availability of the domains, we also observe that a sizeable portion go offline shortly after complaints are generated. This paper sheds critical light on how complaints are issued, who they pertain to and which domains go offline after complaints are issued. △ Less

Submitted 29 June, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

Comments: The Web Conference 2019

arXiv:1901.07699 [pdf, other]

The Chain of Implicit Trust: An Analysis of the Web Third-party Resources Loading

Authors: Muhammad Ikram, Rahat Masood, Gareth Tyson, Mohamed Ali Kaafar, Noha Loizon, Roya Ensafi

Abstract: The Web is a tangled mass of interconnected services, where websites import a range of external resources from various third-party domains. However, the latter can further load resources hosted on other domains. For each website, this creates a dependency chain underpinned by a form of implicit trust between the first-party and transitively connected third-parties. The chain can only be loosely co… ▽ More The Web is a tangled mass of interconnected services, where websites import a range of external resources from various third-party domains. However, the latter can further load resources hosted on other domains. For each website, this creates a dependency chain underpinned by a form of implicit trust between the first-party and transitively connected third-parties. The chain can only be loosely controlled as first-party websites often have little, if any, visibility of where these resources are loaded from. This paper performs a large-scale study of dependency chains in the Web, to find that around 50% of first-party websites render content that they did not directly load. Although the majority (84.91%) of websites have short dependency chains (below 3 levels), we find websites with dependency chains exceeding 30. Using VirusTotal, we show that 1.2% of these third-parties are classified as suspicious --- although seemingly small, this limited set of suspicious third-parties have remarkable reach into the wider ecosystem. By running sandboxed experiments, we observe a range of activities with the majority of suspicious JavaScript downloading malware; worryingly, we find this propensity is greater among implicitly trusted JavaScripts. △ Less

Submitted 18 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: 12 pages

arXiv:1812.06156 [pdf, other]

doi 10.1109/SNAMS.2018.8554898

Trollslayer: Crowdsourcing and Characterization of Abusive Birds in Twitter

Authors: Alvaro Garcia-Recuero, Aneta Morawin, Gareth Tyson

Abstract: As of today, abuse is a pressing issue to participants and administrators of Online Social Networks (OSN). Abuse in Twitter can spawn from arguments generated for influencing outcomes of a political election, the use of bots to automatically spread misinformation, and generally speaking, activities that deny, disrupt, degrade or deceive other participants and, or the network. Given the difficulty… ▽ More As of today, abuse is a pressing issue to participants and administrators of Online Social Networks (OSN). Abuse in Twitter can spawn from arguments generated for influencing outcomes of a political election, the use of bots to automatically spread misinformation, and generally speaking, activities that deny, disrupt, degrade or deceive other participants and, or the network. Given the difficulty in finding and accessing a large enough sample of abuse ground truth from the Twitter platform, we built and deployed a custom crawler that we use to judiciously collect a new dataset from the Twitter platform with the aim of characterizing the nature of abusive users, a.k.a abusive birds, in the wild. We provide a comprehensive set of features based on users' attributes, as well as social-graph metadata. The former includes metadata about the account itself, while the latter is computed from the social graph among the sender and the receiver of each message. Attribute-based features are useful to characterize user's accounts in OSN, while graph-based features can reveal the dynamics of information dissemination across the network. In particular, we derive the Jaccard index as a key feature to reveal the benign or malicious nature of directed messages in Twitter. To the best of our knowledge, we are the first to propose such a similarity metric to characterize abuse in Twitter. △ Less

Submitted 14 December, 2018; originally announced December 2018.

Comments: SNAMS 2018

arXiv:1810.10963 [pdf, other]

Shaping the Internet: 10 Years of IXP Growth

Authors: Timm Böttger, Gianni Antichi, Eder L. Fernandes, Roberto di Lallo, Marc Bruyere, Steve Uhlig, Gareth Tyson, Ignacio Castro

Abstract: Over the past decade, IXPs have been playing a key role in enabling interdomain connectivity. Their traffic volumes have grown dramatically and their physical presence has spread throughout the world. While the relevance of IXPs is undeniable, their long-term contribution to the shaping of the current Internet is not fully understood yet. In this paper, we look into the impact on Internet routes… ▽ More Over the past decade, IXPs have been playing a key role in enabling interdomain connectivity. Their traffic volumes have grown dramatically and their physical presence has spread throughout the world. While the relevance of IXPs is undeniable, their long-term contribution to the shaping of the current Internet is not fully understood yet. In this paper, we look into the impact on Internet routes of the intense IXP growth over the last decade. We observe that while in general IXPs only have a small effect in path shortening, very large networks do enjoy a clear IXP-enabled path reduction. We also observe a diversion of the routes, away from the central Tier-1 ASes supported by IXPs. Interestingly, we also find that whereas IXP membership has grown, large and central ASes have steadily moved away from public IXP peerings, whereas smaller ones have embraced them. Despite all this changes, we find though that a clear hierarchy remains, with a small group of highly central networks △ Less

Submitted 8 July, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Showing 1–50 of 66 results for author: Tyson, G