Search | arXiv e-print repository

Personality Analysis for Social Media Users using Arabic language and its Effect on Sentiment Analysis

Authors: Mokhaiber Dandash, Masoud Asadpour

Abstract: Social media is heading towards more and more personalization, where individuals reveal their beliefs, interests, habits, and activities, offering glimpses into their personality traits. This study explores the correlation between the use of Arabic language on Twitter, personality traits and its impact on sentiment analysis. We indicated the personality traits of users based on the information extracted from their profile activities and the content of their tweets. Our analysis incorporated linguistic features, profile statistics (including gender, age, bio, etc.), as well as additional features like emoticons. To obtain personality data, we crawled the timelines and profiles of users who took the 16personalities test in Arabic on 16personalities.com. Our dataset, "AraPers", comprised 3,250 users who shared their personality results on Twitter. We implemented various machine learning techniques to reveal personality traits and developed a dedicated model for this purpose, achieving a 74.86% accuracy rate with BERT. Analysis of this dataset proved that linguistic features, profile features and the derived model can be used to differentiate between different personality traits. Furthermore, our findings demonstrated that personality affects sentiment in social media. This research contributes to the ongoing efforts in developing robust understanding of the relation between human behaviour on social media and personality features for real-world applications, such as political discourse analysis, and public opinion tracking.

Submitted 22 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2207.02160 [pdf, other]

A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks

Authors: Israa Khalaf Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour

Abstract: Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions and emotions. Consequently, automated sentiment analysis (SA) is critical for recognising people's feelings in ways that other information sources cannot. The analysis of these feelings revealed various applications, including brand evaluations, YouTube film reviews and healthcare applications. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio and video. Thus, traditional SA algorithms have become limited, as they do not consider the expressiveness of other modalities. By including such characteristics from various material sources, these multimodal data streams provide new opportunities for optimising the expected results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to utilise this information to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we introduce a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also provide a brief introduction of the most frequently utilised data fusion strategies and a summary of existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.

Submitted 6 December, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2107.10326 [pdf]

COfEE: A Comprehensive Ontology for Event Extraction from text

Authors: Ali Balali, Masoud Asadpour, Seyed Hossein Jafari

Abstract: Data is published on the web over time in great volumes, but the majority of the data is unstructured, making it hard to understand and difficult to interpret. Information Extraction (IE) methods obtain structured information from unstructured data. One of the challenging IE tasks is Event Extraction (EE), which seeks to derive information about specific incidents and their actors from the text. EE is useful in many domains such as building a knowledge base, information retrieval and summarization. In the past decades, some event ontologies like ACE, CAMEO and ICEWS were developed to define event forms, actors and dimensions of events observed in the text. These event ontologies still have some shortcomings, such as covering only a few topics like political events, having an inflexible structure in defining argument roles, and insufficient gold-standard data. To address these concerns, we propose an event ontology, namely COfEE, that incorporates both expert domain knowledge and a data-driven approach for identifying events from text. COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace and criminal activity which need to be monitored instantly. Also, dynamic roles according to each event sub-type are defined to capture various dimensions of events. In a follow-up experiment, the proposed ontology is evaluated on Wikipedia events, and it is shown to be general and comprehensive. Moreover, in order to facilitate the preparation of gold-standard data for event extraction, a language-independent online tool is presented based on COfEE. A gold-standard dataset annotated by 10 human experts is also prepared, consisting of 24K news articles in the Persian language. Finally, we present a supervised method based on deep learning techniques to automatically extract relevant events and corresponding actors.

Submitted 10 November, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2008.03951 [pdf, other]

Behavioral Modeling of Persian Instagram Users to detect Bots

Authors: Muhammad Bazm, Masoud Asadpour

Abstract: Bots are user accounts in social media which are controlled by computer programs. Similar to many other things, they are used for both good and evil purposes. One nefarious use-case for them is to spread misinformation or biased data in the networks. There are many pieces of research being performed based on social media data, and the validity of their results is extremely threatened by the harmful data bots spread. Consequently, effective methods and tools are required for detecting bots and then removing misleading data spread by the bots. In the present research, a method for detecting Instagram bots is proposed. There is no data set including samples of Instagram bots and genuine accounts, thus the current research has begun with gathering such a data set with respect to generality concerns such that it includes 1,000 data points in each group. The main approach is supervised machine learning, and classic models are preferred compared to deep neural networks. The final model is evaluated using multiple methods, starting with 10-fold cross-validation. After that, confidence in classification is studied, followed by feature importance analysis and feature behavior against the target probability computed by the model. In the end, an experiment is designed to measure the model's effectiveness in an operational environment. Finally, it is strongly concluded that the model performs very well in all evaluation experiments.

Submitted 10 August, 2020; originally announced August 2020.
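
Note on the evaluation protocol named in this abstract: the 10-fold cross-validation step can be sketched as below. This is only an illustration of the general technique, not the authors' code. The function name `k_fold_indices`, the shuffling seed, and the assumption that the two groups of 1,000 accounts are pooled into 2,000 samples are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives folds whose sizes
    # differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed pooling: 1,000 bot accounts plus 1,000 genuine accounts.
splits = list(k_fold_indices(2000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each sample lands in exactly one test fold, so every account is scored once by a model that never saw it during training.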

arXiv:2003.08615 [pdf]

doi 10.1016/j.knosys.2020.106492

Joint Event Extraction along Shortest Dependency Paths using Graph Convolutional Networks

Authors: Ali Balali, Masoud Asadpour, Ricardo Campos, Adam Jatowt

Abstract: Event extraction (EE) is one of the core information extraction tasks, whose purpose is to automatically identify and extract information about incidents and their actors from texts. This may be beneficial to several domains such as knowledge bases, question answering, information retrieval and summarization tasks, to name a few. The problem of extracting event information from texts is longstanding and usually relies on elaborately designed lexical and syntactic features, which, however, take a large amount of human effort and lack generalization. More recently, deep neural network approaches have been adopted as a means to learn underlying features automatically. However, existing networks do not make full use of syntactic features, which play a fundamental role in capturing very long-range dependencies. Also, most approaches extract each argument of an event separately without considering associations between arguments which ultimately leads to low efficiency, especially in sentences with multiple events. To address the two above-referred problems, we propose a novel joint event extraction framework that aims to extract multiple event triggers and arguments simultaneously by introducing shortest dependency path (SDP) in the dependency graph. We do this by eliminating irrelevant words in the sentence, thus capturing long-range dependencies. Also, an attention-based graph convolutional network is proposed, to carry syntactically related information along the shortest paths between argument candidates that captures and aggregates the latent associations between arguments; a problem that has been overlooked by most of the literature. Our results show a substantial improvement over state-of-the-art methods.

Submitted 19 March, 2020; originally announced March 2020.

Journal ref: Knowledge-Based Systems, Volume 210, Year 2020, Page 106492
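
Note on the shortest dependency path (SDP) idea in this abstract: a minimal sketch, assuming the dependency parse is already available as (head, dependent) pairs and is treated as an undirected graph. The example sentence, its hand-written parse, and the function name `shortest_dependency_path` are hypothetical. The sketch covers only the path-finding step, not the graph convolutional network.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the shortest path between two tokens, or None if unreachable."""
    graph = {}
    for head, dependent in edges:
        # Treat dependency arcs as undirected for path finding.
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hand-annotated parse of "Police arrested a suspect who robbed the bank yesterday".
parse = [
    ("arrested", "Police"), ("arrested", "suspect"), ("suspect", "a"),
    ("suspect", "robbed"), ("robbed", "who"), ("robbed", "bank"),
    ("bank", "the"), ("robbed", "yesterday"),
]
path = shortest_dependency_path(parse, "Police", "bank")
print(path)
```

In this toy parse the path keeps five tokens and drops "a", "who", "the" and "yesterday", which illustrates how restricting attention to the path removes words that do not connect the two candidates.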

arXiv:1912.03496 [pdf]

Overlapping Communities and the Prediction of Missing Links in Multiplex Networks

Authors: Amir Mahdi Abdolhosseini-Qomi, Naser Yazdani, Masoud Asadpour

Abstract: Multiplex networks are a representation of real-world complex systems as a set of entities (i.e. nodes) connected via different types of connections (i.e. layers). The observed connections in these networks may not be complete, and the link prediction task is about locating the missing links across layers. Here, the main challenge is collecting relevant evidence from different layers to assist the link prediction task. It is known that co-membership in communities increases the likelihood of connectivity between nodes. We argue that co-membership in the communities of similar layers augments the chance of connectivity. The layers are considered similar if they show significant inter-layer community overlap. Moreover, we found that although the presence of links is correlated across layers, the extent of this correlation is not the same across different communities. Our proposed link prediction method for multiplex networks, ML-BNMTF, is devised based on these findings. ML-BNMTF outperforms baseline methods, specifically when the global link overlap is low.

Submitted 15 May, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

arXiv:1908.10053 [pdf]

SimBins: An information-theoretic approach to link prediction in real multiplex networks

Authors: Seyed Hossein Jafari, Amir Mahdi Abdolhosseini-Qomi, Maseud Rahgozar, Masoud Asadpour, Naser Yazdani

Abstract: The entities of real-world networks are connected via different types of connections (i.e. layers). The task of link prediction in multiplex networks is about finding missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based automatic general-purpose multiplex link prediction method -- SimBins -- is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applied to various datasets from different domains, SimBins proves to be robust and superior to the compared methods in the majority of the tested cases in terms of accuracy of link prediction. Furthermore, it is discussed that SimBins imposes minor computational overhead on the base similarity measures, making it a potentially fast method, suitable for large-scale multiplex networks.

Submitted 4 December, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: 22 pages, 3 figures, 2 tables
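
Note on "link overlap across layers": one simple way to quantify it is the Jaccard overlap of two layers' edge sets, sketched below. This is an assumed, generic measure chosen for illustration. The abstract does not state which overlap definition SimBins uses, and the layer names and node labels are made up.

```python
def edge_set(edges):
    """Normalize undirected edges so (u, v) and (v, u) compare equal."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def link_overlap(layer_a, layer_b):
    """Jaccard overlap of two edge lists (0 = disjoint, 1 = identical)."""
    a, b = edge_set(layer_a), edge_set(layer_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two toy layers over the same users (hypothetical data).
friendship = [("u1", "u2"), ("u2", "u3"), ("u3", "u4")]
coworking = [("u2", "u1"), ("u3", "u4"), ("u4", "u5")]
overlap = link_overlap(friendship, coworking)
print(overlap)
```

Here two of the four distinct links appear in both layers, so the overlap is 0.5. A high value suggests that a link observed in one layer is useful evidence for the same link in the other.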

arXiv:1906.09422 [pdf]

Link Prediction in Real-World Multiplex Networks via Layer Reconstruction Method

Authors: Amir Mahdi Abdolhosseini-Qomi, Seyed Hossein Jafari, Amirheckmat Taghizadeh, Naser Yazdani, Masoud Asadpour, Masoud Rahgozar

Abstract: A large body of research on the link prediction problem is devoted to finding missing links in single-layer (simplex) networks. The proposed link prediction methods compute a similarity measure between unconnected node pairs based on the observed structure of the network. However, extending the notion of similarity to multiplex networks is a two-fold challenge. The layers of real-world multiplex networks do not have the same organization, yet are not of totally different organizations. So, it should be determined how similar the layers of a multiplex network are. On the other hand, it needs to be known how similar layers can contribute to the link prediction task on a target layer with missing links. Eigenvectors are known to reflect the structural features of networks well. Therefore, two layers of a multiplex network are similar w.r.t. structural features if they share similar eigenvectors. Experiments show that layers of real-world multiplex networks are similar w.r.t. structural features and the value of similarity is far beyond their randomized counterparts. Furthermore, it is shown that missing links are highly predictable if their addition or removal does not significantly change the network structural features. Otherwise, if the change is significant, a similar copy of structural features may come to help. Based on this concept, Layer Reconstruction Method (LRM) finds the best reconstruction of the observed structure of the target layer with structural features of other similar layers. Experiments on real multiplex networks from different disciplines show that this method benefits from information redundancy in the networks and helps the performance of link prediction to stay robust even under a high fraction of missing links.

Submitted 22 June, 2019; originally announced June 2019.
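
Note on the single-layer baseline this abstract starts from ("compute a similarity measure between unconnected node pairs"): a minimal sketch using the common-neighbors score, a standard similarity measure picked here for illustration. It is not the Layer Reconstruction Method itself, and the function name and toy graph are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Rank unconnected node pairs by their number of shared neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:  # score only pairs without an observed link
            scores[(u, v)] = len(neighbors[u] & neighbors[v])
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranking = common_neighbor_scores(edges)
print(ranking)
```

The top-ranked pairs are the candidates a similarity-based method would propose as missing links. A multiplex method additionally has to decide how much evidence from other layers to mix into these scores.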

arXiv:1807.10494 [pdf]

DeepLink: A Novel Link Prediction Framework based on Deep Learning

Authors: Mohammad Mehdi Keikha, Maseud Rahgozar, Masoud Asadpour

Abstract: Recently, link prediction has attracted more attention from various disciplines such as computer science, bioinformatics and economics. In this problem, unknown links between nodes are discovered based on various information such as network topology, profile information and user-generated content. Most previous researchers have focused on the structural features of the networks. However, recent research indicates that contextual information can change the network topology. Although a number of valuable studies combine structural and content information, they face scalability issues due to feature engineering, because the majority of the extracted features are obtained by a supervised or semi-supervised algorithm. Moreover, the existing features are not general enough to indicate good performance on different networks with heterogeneous structures. Besides, most previous studies are presented for undirected and unweighted networks. In this paper, a novel link prediction framework called "DeepLink" is presented based on deep learning techniques. In contrast to previous studies, which fail to automatically extract the best features for link prediction, deep learning reduces manual feature engineering. In this framework, both the structural and content information of the nodes are employed. The framework can use different structural feature vectors, which are prepared by various link prediction methods. It considers all proximity orders that are present in a network during the structural feature learning. We have evaluated the performance of DeepLink on two real social network datasets including Telegram and irBlogs. On both datasets, the proposed framework outperforms several structural and hybrid approaches for the link prediction problem.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 19 pages, 9 figures

MSC Class: 91D30

arXiv:1710.05199 [pdf]

doi 10.1016/j.knosys.2018.02.028

Community Aware Random Walk for Network Embedding

Authors: Mohammad Mehdi Keikha, Maseud Rahgozar, Masoud Asadpour

Abstract: Social network analysis provides meaningful information about the behavior of network members that can be used for diverse applications such as classification and link prediction. However, network analysis is computationally expensive because of feature learning for different applications. In recent years, many studies have focused on feature learning methods in social networks. Network embedding represents the network in a lower dimensional representation space with the same properties, which presents a compressed representation of the network. In this paper, we introduce a novel algorithm named "CARE" for network embedding that can be used for different types of networks including weighted, directed and complex. Current methods try to preserve local neighborhood information of nodes, whereas the proposed method utilizes local neighborhood and community information of network nodes to cover both local and global structure of social networks. CARE builds customized paths, which consist of local and global structure of network nodes, as a basis for network embedding and uses the Skip-gram model to learn the representation vector of nodes. Subsequently, stochastic gradient descent is applied to optimize our objective function and learn the final representation of nodes. Our method can be scalable when new nodes are appended to the network without information loss. Parallelized generation of customized random walks is also used for speeding up CARE. We evaluate the performance of CARE on multi-label classification and link prediction tasks. Experimental results on various networks indicate that the proposed method outperforms others in both Micro-F1 and Macro-F1 measures for different sizes of training data.

Submitted 19 February, 2018; v1 submitted 14 October, 2017; originally announced October 2017.

Comments: 17 pages, 3 figures, 4 Tables

MSC Class: 68T30; ACM Class: H.3.4

Journal ref: Knowledge-Based Systems Volume 148, 15 May 2018, Pages 47-54

arXiv:1708.01891 [pdf]

A Behavioral Analysis on the Reselection of Seed Nodes in Independent Cascade Based Influence Maximization

Authors: Ali Vardasbi, Heshaam Faili, Masoud Asadpour

Abstract: Influence maximization serves as the main goal of a variety of social network activities such as viral marketing and campaign advertising. The independent cascade model for the influence spread assumes a one-time chance for each activated node to influence its neighbors. This reasonable assumption cannot be bypassed, since otherwise the influence probabilities of the nodes, modeled by the edge weights, would be altered. On the other hand, the manually activated seed set nodes can be reselected without violating the model parameters or assumptions. The reselection of a seed set node simply means paying extra budget to a previously paid node in order for it to retry its influential skills on its uninfluenced neighbors. This view divides the influence maximization process into two cases: the simple case, where the reselection of the nodes is not considered, and the reselection case. In this study we will analyze the behavior of real-world networks on the difference between these two influence maximization cases. First we will show that the difference between the simple and the reselection cases constitutes a wide spectrum of networks ranging from the reselection-independent ones, where the reselection case has no noticeable advantage over the simple case, to the reselection-friendly ones, where the influence spread in the reselection case is twice the one in the simple case. Then we will correlate this dynamic to other influence maximization dynamics of the network. Finally, a significant entanglement between this dynamic and the network structure is shown and verified by the experiments. In other words, a series of conditions on the network structure is specified whose fulfilment is a sign of a reselection-friendly network. As a result of this entanglement, reselection-friendly networks can be spotted without performing the time-consuming influence maximization algorithms.

Submitted 6 August, 2017; originally announced August 2017.
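
Note on the independent cascade model with seed reselection described in this abstract: a minimal simulation sketch under stated assumptions. Encoding a reselection as a repeated entry in `seed_sequence`, processing paid seeds one after another, and the toy star graph are all illustrative choices, not the authors' setup. Every paid selection grants the seed one fresh attempt on its still-inactive neighbors, while ordinary activated nodes get a single chance.

```python
import random


def cascade_with_reselection(graph, seed_sequence, draw):
    """Simulate one independent-cascade run and return the active set.

    graph: dict mapping node -> list of (neighbor, probability) pairs.
    seed_sequence: seeds in the order they are paid for; repeating a
        node models a reselection (hypothetical encoding).
    draw: zero-argument callable returning a float in [0, 1).
    """
    active = set()
    for seed in seed_sequence:
        active.add(seed)
        frontier = [seed]  # a (re)selected seed gets a fresh attempt
        while frontier:
            next_frontier = []
            for node in frontier:
                for neighbor, prob in graph.get(node, []):
                    # Only still-inactive neighbors are attempted.
                    if neighbor not in active and draw() < prob:
                        active.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
    return active


def expected_spread(graph, seed_sequence, runs=2000, seed=0):
    """Monte Carlo estimate of the mean number of activated nodes."""
    rng = random.Random(seed)
    total = sum(
        len(cascade_with_reselection(graph, seed_sequence, rng.random))
        for _ in range(runs)
    )
    return total / runs


# Toy star graph: hub "h" reaches each of four leaves with probability 0.3.
star = {"h": [(leaf, 0.3) for leaf in ("a", "b", "c", "d")]}
simple = expected_spread(star, ["h"])
reselected = expected_spread(star, ["h", "h"])
print(simple, reselected)
```

For this star graph the exact expectations are 1 + 4(0.3) = 2.2 in the simple case and 1 + 4(1 - 0.7^2) = 3.04 with one reselection, so the two estimates should land near those values. Low edge probabilities around a high-degree seed are where a second attempt pays off most in this toy setting.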

arXiv:1504.07361

Graph-based Method for Summarized Storyline Generation in Twitter

Authors: Nazanin Dehghani, Masoud Asadpour

Abstract: Twitter has become a leading source of real-time world-wide information and a great medium for exploring emerging events, breaking news and general topics which most matter to a broad audience. On the other hand, the explosive rate of incoming information in Twitter leads users to experience information overload. Since a significant fraction of tweets are about news events, summarizing the storyline of events can help users easily access the relevant and key information hidden among tweets and thereby draw high-level conclusions. Storytelling is the task of providing chronological summaries of the development of significant sub-events and sketching the relationship between sub-events. In this paper, we propose a novel framework to generate a summarized storyline of news events from a social point of view. Utilizing concepts from graph theory, we identify sub-events, summarize the evolution of sub-events and generate a coherent storyline of them. Our approach models a storyline as a directed tree of socially salient sub-events evolving over time. To overcome the enormous number of redundant tweets, we keep distilled information in super-tweets. Experiments were performed on a large-scale data set of tweets sent during the Iranian Presidential Election (#IranElection), and the results demonstrate the efficiency and effectiveness of our framework.

Submitted 1 April, 2017; v1 submitted 28 April, 2015; originally announced April 2015.

Comments: 19 pages, 11 figures. This paper has been withdrawn by the author because the method improved through some significant modifications and it will be submitted to another journal.

Showing 1–12 of 12 results for author: Asadpour, M