Search | arXiv e-print repository

Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl

Abstract: Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method, which is less susceptible to t… ▽ More Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method, which is less susceptible to the impact of missing modalities. It consists of a single-branch network sharing weights across multiple modalities to learn inter-modality representations to maximize performance as well as robustness to missing modalities. Extensive experiments are performed on four challenging datasets including textual-visual (UPMC Food-101, Hateful Memes, Ferramenta) and audio-visual modalities (VoxCeleb1). Our proposed method achieves superior performance when all modalities are present as well as in the case of missing modalities during training or testing compared to the existing state-of-the-art methods. △ Less

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2407.20910 [pdf, other]

Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation

Authors: Pujan Paudel, Mohammad Hammas Saeed, Rebecca Auger, Chris Wells, Gianluca Stringhini

Abstract: Automated soft moderation systems are unable to ascertain if a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example undermining trust in health experts by adding warnings to their posts or resorting to vague warnings instead of granular fact-checks, which result in desensitizing users. In this paper, we prop… ▽ More Automated soft moderation systems are unable to ascertain if a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example undermining trust in health experts by adding warnings to their posts or resorting to vague warnings instead of granular fact-checks, which result in desensitizing users. In this paper, we propose to incorporate stance detection into existing automated soft-moderation pipelines, with the goal of ruling out contextual false positives and providing more precise recommendations for social media content that should receive warnings. We develop a textual deviation task called Contrastive Textual Deviation (CTD) and show that it outperforms existing stance detection approaches when applied to soft moderation.We then integrate CTD into the stateof-the-art system for automated soft moderation Lambretta, showing that our approach can reduce contextual false positives from 20% to 2.1%, providing another important building block towards deploying reliable automated soft moderation tools on social media. △ Less

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.19970 [pdf]

From Flat to Spatial: Comparison of 4 methods constructing 3D, 2 and 1/2D Models from 2D Plans with neural networks

Authors: Jacob Sam, Karan Patel, Mike Saad

Abstract: In the field of architecture, the conversion of single images into 2 and 1/2D and 3D meshes is a promising technology that enhances design visualization and efficiency. This paper evaluates four innovative methods: "One-2-3-45," "CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model," "Instant Mesh," and "Image-to-Mesh." These methods are at the forefront of this technology… ▽ More In the field of architecture, the conversion of single images into 2 and 1/2D and 3D meshes is a promising technology that enhances design visualization and efficiency. This paper evaluates four innovative methods: "One-2-3-45," "CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model," "Instant Mesh," and "Image-to-Mesh." These methods are at the forefront of this technology, focusing on their applicability in architectural design and visualization. They streamline the creation of 3D architectural models, enabling rapid prototyping and detailed visualization from minimal initial inputs, such as photographs or simple sketches.One-2-3-45 leverages a diffusion-based approach to generate multi-view reconstructions, ensuring high geometric fidelity and texture quality. CRM utilizes a convolutional network to integrate geometric priors into its architecture, producing detailed and textured meshes quickly and efficiently. Instant Mesh combines the strengths of multi-view diffusion and sparse-view models to offer speed and scalability, suitable for diverse architectural projects. Image-to-Mesh leverages a generative adversarial network (GAN) to produce 3D meshes from single images, focusing on maintaining high texture fidelity and geometric accuracy by incorporating image and depth map data into its training process. It uses a hybrid approach that combines voxel-based representations with surface reconstruction techniques to ensure detailed and realistic 3D models.This comparative study highlights each method's contribution to reducing design cycle times, improving accuracy, and enabling flexible adaptations to various architectural styles and requirements. By providing architects with powerful tools for rapid visualization and iteration, these advancements in 3D mesh generation are set to revolutionize architectural practices. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.18098 [pdf, other]

Unraveling the Web of Disinformation: Exploring the Larger Context of State-Sponsored Influence Campaigns on Twitter

Authors: Mohammad Hammas Saeed, Shiza Ali, Pujan Paudel, Jeremy Blackburn, Gianluca Stringhini

Abstract: Social media platforms offer unprecedented opportunities for connectivity and exchange of ideas; however, they also serve as fertile grounds for the dissemination of disinformation. Over the years, there has been a rise in state-sponsored campaigns aiming to spread disinformation and sway public opinion on sensitive topics through designated accounts, known as troll accounts. Past works on detecti… ▽ More Social media platforms offer unprecedented opportunities for connectivity and exchange of ideas; however, they also serve as fertile grounds for the dissemination of disinformation. Over the years, there has been a rise in state-sponsored campaigns aiming to spread disinformation and sway public opinion on sensitive topics through designated accounts, known as troll accounts. Past works on detecting accounts belonging to state-backed operations focus on a single campaign. While campaign-specific detection techniques are easier to build, there is no work done on developing systems that are campaign-agnostic and offer generalized detection of troll accounts unaffected by the biases of the specific campaign they belong to. In this paper, we identify several strategies adopted across different state actors and present a system that leverages them to detect accounts from previously unseen campaigns. We study 19 state-sponsored disinformation campaigns that took place on Twitter, originating from various countries. The strategies include sending automated messages through popular scheduling services, retweeting and sharing selective content and using fake versions of verified applications for pushing content. By translating these traits into a feature set, we build a machine learning-based classifier that can correctly identify up to 94% of accounts from unseen campaigns. Additionally, we run our system in the wild and find more accounts that could potentially belong to state-backed operations. We also present case studies to highlight the similarity between the accounts found by our system and those identified by Twitter. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Journal ref: International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2024)

arXiv:2407.16243 [pdf, other]

Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities

Authors: Muhammad Irzam Liaqat, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Saad Saeed, Hassan Sajjad, Tom De Schepper, Karthik Nandakumar, Muhammad Haris Khan Markus Schedl

Abstract: Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit deteriorated performances if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific streams making the models reliant on the availability of a complete set of modalities. In… ▽ More Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit deteriorated performances if one or more modalities are missing. This may be attributed to the commonly used multi-branch design containing modality-specific streams making the models reliant on the availability of a complete set of modalities. In this work, we propose a robust textual-visual multimodal learning method, Chameleon, that completely deviates from the conventional multi-branch design. To enable this, we present the unification of input modalities into one format by encoding textual modality into visual representations. As a result, our approach does not require modality-specific branches to learn modality-independent multimodal representations making it robust to missing modalities. Extensive experiments are performed on four popular challenging datasets including Hateful Memes, UPMC Food-101, MM-IMDb, and Ferramenta. Chameleon not only achieves superior performance when all modalities are present at train/test time but also demonstrates notable resilience in the case of missing modalities. △ Less

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.04147 [pdf, other]

ALPINE: An adaptive language-agnostic pruning method for language models for code

Authors: Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma

Abstract: Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain as significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce th… ▽ More Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain as significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce these models' computational overhead. The proposed method offers a pluggable layer that can be integrated with all Transformer-based models. With ALPINE, input sequences undergo adaptive compression throughout the pipeline, reaching a size up to $\times 3$ less their initial size, resulting in significantly reduced computational load. Our experiments on two software engineering tasks, defect prediction and code clone detection across three language models CodeBERT, GraphCodeBERT and UniXCoder show that ALPINE achieves up to a 50% reduction in FLOPs, a 58.1% decrease in memory footprint, and a 28.1% improvement in throughput on average. This led to a reduction in CO2 by up to $44.85$%. Importantly, it achieves the reduction in computation resources while maintaining up to 98.1% of the original predictive performance. These findings highlight the potential of ALPINE in making language models of code more resource-efficient and accessible while preserving their performance, contributing to the overall sustainability of adopting language models in software development. Also, it sheds light on redundant and noisy information in source code analysis corpora, as shown by the substantial sequence compression achieved by ALPINE. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.18776 [pdf, other]

Implicit Discourse Relation Classification For Nigerian Pidgin

Authors: Muhammed Saeed, Peter Bourgonje, Vera Demberg

Abstract: Despite attempts to make Large Language Models multi-lingual, many of the world's languages are still severely under-resourced. This widens the performance gap between NLP and AI applications aimed at well-financed, and those aimed at less-resourced languages. In this paper, we focus on Nigerian Pidgin (NP), which is spoken by nearly 100 million people, but has comparatively very few NLP resources… ▽ More Despite attempts to make Large Language Models multi-lingual, many of the world's languages are still severely under-resourced. This widens the performance gap between NLP and AI applications aimed at well-financed, and those aimed at less-resourced languages. In this paper, we focus on Nigerian Pidgin (NP), which is spoken by nearly 100 million people, but has comparatively very few NLP resources and corpora. We address the task of Implicit Discourse Relation Classification (IDRC) and systematically compare an approach translating NP data to English and then using a well-resourced IDRC tool and back-projecting the labels versus creating a synthetic discourse corpus for NP, in which we translate PDTB and project PDTB labels, and then train an NP IDR classifier. The latter approach of learning a "native" NP classifier outperforms our baseline by 13.27\% and 33.98\% in f$_{1}$ score for 4-way and 11-way classification, respectively. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.09630 [pdf, other]

Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition

Authors: Mehreen Saeed, Adrian Chan, Anupam Mijar, Joseph Moukarzel, Georges Habchi, Carlos Younes, Amin Elias, Chau-Wai Wong, Akram Khater

Abstract: We present the Manuscripts of Handwritten Arabic~(Muharaf) dataset, which is a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten… ▽ More We present the Manuscripts of Handwritten Arabic~(Muharaf) dataset, which is a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten text recognition (HTR), not only for Arabic manuscripts but also for cursive text in general. The Muharaf dataset includes diverse handwriting styles and a wide range of document types, including personal letters, diaries, notes, poems, church records, and legal correspondences. In this paper, we describe the data acquisition pipeline, notable dataset features, and statistics. We also provide a preliminary baseline result achieved by training convolutional neural networks using this data. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.20987 [pdf, other]

Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging

Authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: Generative Adversarial Networks (GANs) have high computational costs to train their complex architectures. Throughout the training process, GANs' output is analyzed qualitatively based on the loss and synthetic images' diversity and quality. Based on this qualitative analysis, training is manually halted once the desired synthetic images are generated. By utilizing an early stopping criterion, the… ▽ More Generative Adversarial Networks (GANs) have high computational costs to train their complex architectures. Throughout the training process, GANs' output is analyzed qualitatively based on the loss and synthetic images' diversity and quality. Based on this qualitative analysis, training is manually halted once the desired synthetic images are generated. By utilizing an early stopping criterion, the computational cost and dependence on manual oversight can be reduced yet impacted by training problems such as mode collapse, non-convergence, and instability. This is particularly prevalent in biomedical imagery, where training problems degrade the diversity and quality of synthetic images, and the high computational cost associated with training makes complex architectures increasingly inaccessible. This work proposes a novel early stopping criteria to quantitatively detect training problems, halt training, and reduce the computational costs associated with synthesizing biomedical images. Firstly, the range of generator and discriminator loss values is investigated to assess whether mode collapse, non-convergence, and instability occur sequentially, concurrently, or interchangeably throughout the training of GANs. Secondly, utilizing these occurrences in conjunction with the Mean Structural Similarity Index (MS-SSIM) and Fréchet Inception Distance (FID) scores of synthetic images forms the basis of the proposed early stopping criteria. This work helps identify the occurrence of training problems in GANs using low-resource computational cost and reduces training time to generate diversified and high-quality synthetic images. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: This paper is accepted at the 35th IEEE Irish Signals and Systems Conference (ISSC 2024)

arXiv:2404.19238 [pdf, other]

Pilot Contamination in Massive MIMO Systems: Challenges and Future Prospects

Authors: Muhammad Kamran Saeed, Ashfaq Khokhar, Shakil Ahmed

Abstract: Massive multiple input multiple output (M-MIMO) technology plays a pivotal role in fifth-generation (5G) and beyond communication systems, offering a wide range of benefits, from increased spectral efficiency (SE) to enhanced energy efficiency and higher reliability. However, these advantages are contingent upon precise channel state information (CSI) availability at the base station (BS). Ensurin… ▽ More Massive multiple input multiple output (M-MIMO) technology plays a pivotal role in fifth-generation (5G) and beyond communication systems, offering a wide range of benefits, from increased spectral efficiency (SE) to enhanced energy efficiency and higher reliability. However, these advantages are contingent upon precise channel state information (CSI) availability at the base station (BS). Ensuring precise CSI is challenging due to the constrained size of the coherence interval and the resulting limitations on pilot sequence length. Therefore, reusing pilot sequences in adjacent cells introduces pilot contamination, hindering SE enhancement. This paper reviews recent advancements and addresses research challenges in mitigating pilot contamination and improving channel estimation, categorizing the existing research into three broader categories: pilot assignment schemes, advanced signal processing methods, and advanced channel estimation techniques. Salient representative pilot mitigation/assignment techniques are analyzed and compared in each category. Lastly, possible future research directions are discussed. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted At IWCMC 2024 Comm & SP Symposium

arXiv:2404.18264 [pdf, other]

Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

Authors: Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg

Abstract: Nigerian Pidgin is an English-derived contact language and is traditionally an oral language, spoken by approximately 100 million people. No orthographic standard has yet been adopted, and thus the few available Pidgin datasets that exist are characterised by noise in the form of orthographic variations. This contributes to under-performance of models in critical NLP tasks. The current work is the… ▽ More Nigerian Pidgin is an English-derived contact language and is traditionally an oral language, spoken by approximately 100 million people. No orthographic standard has yet been adopted, and thus the few available Pidgin datasets that exist are characterised by noise in the form of orthographic variations. This contributes to under-performance of models in critical NLP tasks. The current work is the first to describe various types of orthographic variations commonly found in Nigerian Pidgin texts, and model this orthographic variation. The variations identified in the dataset form the basis of a phonetic-theoretic framework for word editing, which is used to generate orthographic variations to augment training data. We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis. The proposed variation generation framework augments the training data with new orthographic variants which are relevant for the test set but did not occur in the training set originally. Our results demonstrate the positive effect of augmenting the training data with a combination of real texts from other corpora as well as synthesized orthographic variation, resulting in performance improvements of 2.1 points in sentiment analysis and 1.4 BLEU points in translation to English. △ Less

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted to LREC-COLING 2024 Main Conference

arXiv:2404.10188 [pdf, other]

Smart Pilot Assignment for IoT in Massive MIMO Systems: A Path Towards Scalable IoT Infrastructure

Authors: Muhammad Kamran Saeed, Ashfaq Khokhar

Abstract: 5G sets the foundation for an era of creativity with its faster speeds, increased data throughput, reduced latency, and enhanced IoT connectivity, all enabled by Massive MIMO (M-MIMO) technology. M-MIMO boosts network efficiency and enhances user experience by employing intelligent user scheduling. This paper presents a user scheduling scheme and pilot assignment strategy designed for IoT devices,… ▽ More 5G sets the foundation for an era of creativity with its faster speeds, increased data throughput, reduced latency, and enhanced IoT connectivity, all enabled by Massive MIMO (M-MIMO) technology. M-MIMO boosts network efficiency and enhances user experience by employing intelligent user scheduling. This paper presents a user scheduling scheme and pilot assignment strategy designed for IoT devices, emphasizing mitigating pilot contamination, a key obstacle to improving spectral efficiency (SE) and system scalability in M-MIMO networks. We utilize a user clustering-based pilot allocation scheme to boost IoT device scalability in M-MIMO systems. Additionally, our smart pilot allocation minimizes interference and enhances SE by treating pilot assignment as a graph coloring problem, optimizing it through integer linear programming (ILP). Recognizing the computational complexity of ILP, we introduced a binary search-based heuristic predicated on interference threshold to expedite the computation, while maintaining a near-optimal solution. The simulation results show a significant decrease in the required pilot overhead (about 17%), and substantial enhancement in SE (about 8-14%). △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted At ICC-2024

arXiv:2404.09342 [pdf, other]

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Abstract: The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2… ▽ More The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under a unique condition of multilingual scenario. This condition is inspired from the fact that half of the world's population is bilingual and most often people communicate under multilingual scenario. The challenge uses a dataset namely, Multilingual Audio-Visual (MAV-Celeb) for exploring face-voice association in multilingual environments. This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge. △ Less

Submitted 22 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: ACM Multimedia Conference - Grand Challenge

arXiv:2404.06144 [pdf, other]

Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability

Authors: Fatima Ezzeddine, Mirna Saad, Omran Ayoub, Davide Andreoletti, Martin Gjoreski, Ihab Sbeity, Marc Langheinrich, Silvia Giordano

Abstract: Anomaly detection (AD), also referred to as outlier detection, is a statistical process aimed at identifying observations within a dataset that significantly deviate from the expected pattern of the majority of the data. Such a process finds wide application in various fields, such as finance and healthcare. While the primary objective of AD is to yield high detection accuracy, the requirements of… ▽ More Anomaly detection (AD), also referred to as outlier detection, is a statistical process aimed at identifying observations within a dataset that significantly deviate from the expected pattern of the majority of the data. Such a process finds wide application in various fields, such as finance and healthcare. While the primary objective of AD is to yield high detection accuracy, the requirements of explainability and privacy are also paramount. The first ensures the transparency of the AD process, while the second guarantees that no sensitive information is leaked to untrusted parties. In this work, we exploit the trade-off of applying Explainable AI (XAI) through SHapley Additive exPlanations (SHAP) and differential privacy (DP). We perform AD with different models and on various datasets, and we thoroughly evaluate the cost of privacy in terms of decreased accuracy and explainability. Our results show that the enforcement of privacy through DP has a significant impact on detection accuracy and explainability, which depends on both the dataset and the considered AD model. We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2401.17967 [pdf, other]

CONCORD: Towards a DSL for Configurable Graph Code Representation

Authors: Mootez Saad, Tushar Sharma

Abstract: Deep learning is widely used to uncover hidden patterns in large code corpora. To achieve this, constructing a format that captures the relevant characteristics and features of source code is essential. Graph-based representations have gained attention for their ability to model structural and semantic information. However, existing tools lack flexibility in constructing graphs across different pr… ▽ More Deep learning is widely used to uncover hidden patterns in large code corpora. To achieve this, constructing a format that captures the relevant characteristics and features of source code is essential. Graph-based representations have gained attention for their ability to model structural and semantic information. However, existing tools lack flexibility in constructing graphs across different programming languages, limiting their use. Additionally, the output of these tools often lacks interoperability and results in excessively large graphs, making graph-based neural networks training slower and less scalable. We introduce CONCORD, a domain-specific language to build customizable graph representations. It implements reduction heuristics to reduce graphs' size complexity. We demonstrate its effectiveness in code smell detection as an illustrative use case and show that: first, CONCORD can produce code representations automatically per the specified configuration, and second, our heuristics can achieve comparable performance with significantly reduced size. CONCORD will help researchers a) create and experiment with customizable graph-based code representations for different software engineering tasks involving DL, b) reduce the engineering work to generate graph representations, c) address the issue of scalability in GNN models, and d) enhance the reproducibility of experiments in research through a standardized approach to code representation and analysis. △ Less

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.09824 [pdf, other]

Conning the Crypto Conman: End-to-End Analysis of Cryptocurrency-based Technical Support Scams

Authors: Bhupendra Acharya, Muhammad Saad, Antonio Emanuele Cinà, Lea Schönherr, Hoang Dai Nguyen, Adam Oest, Phani Vadrevu, Thorsten Holz

Abstract: The mainstream adoption of cryptocurrencies has led to a surge in wallet-related issues reported by ordinary users on social media platforms. In parallel, there is an increase in an emerging fraud trend called cryptocurrency-based technical support scam, in which fraudsters offer fake wallet recovery services and target users experiencing wallet-related issues. In this paper, we perform a compre… ▽ More The mainstream adoption of cryptocurrencies has led to a surge in wallet-related issues reported by ordinary users on social media platforms. In parallel, there is an increase in an emerging fraud trend called cryptocurrency-based technical support scam, in which fraudsters offer fake wallet recovery services and target users experiencing wallet-related issues. In this paper, we perform a comprehensive study of cryptocurrency-based technical support scams. We present an analysis apparatus called HoneyTweet to analyze this kind of scam. Through HoneyTweet, we lure over 9K scammers by posting 25K fake wallet support tweets (so-called honey tweets). We then deploy automated systems to interact with scammers to analyze their modus operandi. In our experiments, we observe that scammers use Twitter as a starting point for the scam, after which they pivot to other communication channels (eg email, Instagram, or Telegram) to complete the fraud activity. We track scammers across those communication channels and bait them into revealing their payment methods. Based on the modes of payment, we uncover two categories of scammers that either request secret key phrase submissions from their victims or direct payments to their digital wallets. Furthermore, we obtain scam confirmation by deploying honey wallet addresses and validating private key theft. We also collaborate with the prominent payment service provider by sharing scammer data collections. The payment service provider feedback was consistent with our findings, thereby supporting our methodology and results. By consolidating our analysis across various vantage points, we provide an end-to-end scam lifecycle analysis and propose recommendations for scam mitigation. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2312.01339 [pdf, other]

doi 10.18653/v1/2023.arabicnlp-1.23

ArabIcros: AI-Powered Arabic Crossword Puzzle Generation for Educational Applications

Authors: Kamyar Zeinalipour, Mohamed Zaky Saad, Marco Maggini, Marco Gori

Abstract: This paper presents the first Arabic crossword puzzle generator driven by advanced AI technology. Leveraging cutting-edge large language models including GPT4, GPT3-Davinci, GPT3-Curie, GPT3-Babbage, GPT3-Ada, and BERT, the system generates distinctive and challenging clues. Based on a dataset comprising over 50,000 clue-answer pairs, the generator employs fine-tuning, few/zero-shot learning strat… ▽ More This paper presents the first Arabic crossword puzzle generator driven by advanced AI technology. Leveraging cutting-edge large language models including GPT4, GPT3-Davinci, GPT3-Curie, GPT3-Babbage, GPT3-Ada, and BERT, the system generates distinctive and challenging clues. Based on a dataset comprising over 50,000 clue-answer pairs, the generator employs fine-tuning, few/zero-shot learning strategies, and rigorous quality-checking protocols to enforce the generation of high-quality clue-answer pairs. Importantly, educational crosswords contribute to enhancing memory, expanding vocabulary, and promoting problem-solving skills, thereby augmenting the learning experience through a fun and engaging approach, reshaping the landscape of traditional learning methods. The overall system can be exploited as a powerful educational tool that amalgamates AI and innovative learning techniques, heralding a transformative era for Arabic crossword puzzles and the intersection of technology and education. △ Less

Submitted 26 January, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted Paper for ArabicNLP 2023 - The First Arabic Natural Language Processing Conference - Co-located with EMNLP 2023 in Singapore

arXiv:2311.15024 [pdf]

A Comparative Study of Watering Hole Attack Detection Using Supervised Neural Network

Authors: Mst. Nishita Aktar, Sornali Akter, Md. Nusaim Islam Saad, Jakir Hosen Jisun, Kh. Mustafizur Rahman, Md. Nazmus Sakib

Abstract: The state of security demands innovative solutions to defend against targeted attacks due to the growing sophistication of cyber threats. This study explores the nefarious tactic known as "watering hole attacks using supervised neural networks to detect and prevent these attacks. The neural network identifies patterns in website behavior and network traffic associated with such attacks. Testing on… ▽ More The state of security demands innovative solutions to defend against targeted attacks due to the growing sophistication of cyber threats. This study explores the nefarious tactic known as "watering hole attacks using supervised neural networks to detect and prevent these attacks. The neural network identifies patterns in website behavior and network traffic associated with such attacks. Testing on a dataset of confirmed attacks shows a 99% detection rate with a mere 0.1% false positive rate, demonstrating the model's effectiveness. In terms of prevention, the model successfully stops 95% of attacks, providing robust user protection. The study also suggests mitigation strategies, including web filtering solutions, user education, and security controls. Overall, this research presents a promising solution for countering watering hole attacks, offering strong detection, prevention, and mitigation strategies. △ Less

Submitted 12 February, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.13508 [pdf, other]

Naturalness of Attention: Revisiting Attention in Code Language Models

Authors: Mootez Saad, Tushar Sharma

Abstract: Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial interpretability insights by focusing solely on attention weights rather than considering the wider context modeling of Transformers. This study aims to shed some ligh… ▽ More Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial interpretability insights by focusing solely on attention weights rather than considering the wider context modeling of Transformers. This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights. We conduct an initial empirical study analyzing both attention distributions and transformed representations in CodeBERT. Across two programming languages, Java and Python, we find that the scaled transformation norms of the input better capture syntactic structure compared to attention weights alone. Our analysis reveals characterization of how CodeBERT embeds syntactic code properties. The findings demonstrate the importance of incorporating factors beyond just attention weights for rigorously understanding neural code models. This lays the groundwork for developing more interpretable models and effective uses of attention mechanisms in program analysis. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted at ICSE-NIER (2024) track

arXiv:2310.11266 [pdf]

Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models

Authors: Khushboo Verma, Marina Moore, Stephanie Wottrich, Karla Robles López, Nishant Aggarwal, Zeel Bhatt, Aagamjit Singh, Bradford Unroe, Salah Basheer, Nitish Sachdeva, Prinka Arora, Harmanjeet Kaur, Tanupreet Kaur, Tevon Hood, Anahi Marquez, Tushar Varshney, Nanfu Deng, Azaan Ramani, Pawanraj Ishwara, Maimoona Saeed, Tatiana López Velarde Peña, Bryan Barksdale, Sushovan Guha, Satwant Kumar

Abstract: In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify… ▽ More In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify evidence strength. For clinical decision-making to be appropriately assessed, an evaluation metric that is clinically aligned and validated is required. As a solution, we present ExpertMedQA, a multispecialty clinical benchmark comprised of open-ended, expert-level clinical questions, and validated by a diverse group of medical professionals. By demanding an in-depth understanding and critical appraisal of up-to-date clinical literature, ExpertMedQA rigorously evaluates LLM performance. BooksMed outperforms existing state-of-the-art models Med-PaLM 2, Almanac, and ChatGPT in a variety of medical scenarios. Therefore, a framework that mimics human cognitive stages could be a useful tool for providing reliable and evidence-based responses to clinical inquiries. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.03278 [pdf, other]

Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems

Authors: Muhammad Kamran Saeed, Ahmed E. Kamal, Ashfaq Khokhar

Abstract: Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orth… ▽ More Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate. △ Less

Submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted At GLOBECOM 2023

arXiv:2310.02240 [pdf, other]

Spherical Rolling Robots Design, Modeling, and Control: A Systematic Literature Review

Authors: Aminata Diouf, Bruno Belzile, Maarouf Saad, David St-Onge

Abstract: Spherical robots have garnered increasing interest for their applications in exploration, tunnel inspection, and extraterrestrial missions. Diverse designs have emerged, including barycentric configurations, pendulum-based mechanisms, etc. In addition, a wide spectrum of control strategies has been proposed, ranging from traditional PID approaches to cutting-edge neural networks. Our systematic re… ▽ More Spherical robots have garnered increasing interest for their applications in exploration, tunnel inspection, and extraterrestrial missions. Diverse designs have emerged, including barycentric configurations, pendulum-based mechanisms, etc. In addition, a wide spectrum of control strategies has been proposed, ranging from traditional PID approaches to cutting-edge neural networks. Our systematic review aims to comprehensively identify and categorize locomotion systems and control schemes employed by spherical robots, spanning the years 1996 to 2023. A meticulous search across five databases yielded a dataset of 3189 records. As a result of our exhaustive analysis, we identified a collection of novel designs and control strategies. Leveraging the insights garnered, we provide valuable recommendations for optimizing the design and control aspects of spherical robots, supporting both novel design endeavors and the advancement of field deployments. Furthermore, we illuminate key research directions that hold the potential to unlock the full capabilities of spherical robots △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.12245 [pdf, other]

Adaptive Input-image Normalization for Solving the Mode Collapse Problem in GAN-based X-ray Images

Authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imag… ▽ More Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem impacts Generative Adversarial Networks' capacity to generate diversified images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, both varieties of the mode collapse problem are investigated, and their subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization with the Deep Convolutional GAN and Auxiliary Classifier GAN to alleviate the mode collapse problems. Synthetically generated images are utilized for data augmentation and training a Vision Transformer model. The classification performance of the model is evaluated using accuracy, recall, and precision scores. Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images as evidenced by the superior diversity scores and classification scores. △ Less

Submitted 29 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Submitted to the Elsevier Journal

arXiv:2308.05247 [pdf, other]

TUBERAIDER: Attributing Coordinated Hate Attacks on YouTube Videos to their Source Communities

Authors: Mohammad Hammas Saeed, Kostantinos Papadamou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini

Abstract: Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on a platform (e.g., 4chan) to target victims on another community (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the conte… ▽ More Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on a platform (e.g., 4chan) to target victims on another community (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the context (and potentially the motivation) of an attack into consideration. We present TUBERAIDER, an attribution system achieving over 75% accuracy in detecting and attributing coordinated hate attacks on YouTube videos. We instantiate it using links to YouTube videos shared on 4chan's /pol/ board, r/The_Donald, and 16 Incels-related subreddits. We use a peak detector to identify a rise in the comment activity of a YouTube video, which signals that an attack may be occurring. We then train a machine learning classifier based on the community language (i.e., TF-IDF scores of relevant keywords) to perform the attribution. We test TUBERAIDER in the wild and present a few case studies of actual aggression attacks identified by it to showcase its effectiveness. △ Less

Submitted 22 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted for publication at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024). Please cite accordingly

arXiv:2308.02505 [pdf, other]

Assessing Intra-class Diversity and Quality of Synthetically Generated Images in a Biomedical and Non-biomedical Setting

Authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: In biomedical image analysis, data imbalance is common across several imaging modalities. Data augmentation is one of the key solutions in addressing this limitation. Generative Adversarial Networks (GANs) are increasingly being relied upon for data augmentation tasks. Biomedical image features are sensitive to evaluating the efficacy of synthetic images. These features can have a significant impa… ▽ More In biomedical image analysis, data imbalance is common across several imaging modalities. Data augmentation is one of the key solutions in addressing this limitation. Generative Adversarial Networks (GANs) are increasingly being relied upon for data augmentation tasks. Biomedical image features are sensitive to evaluating the efficacy of synthetic images. These features can have a significant impact on metric scores when evaluating synthetic images across different biomedical imaging modalities. Synthetically generated images can be evaluated by comparing the diversity and quality of real images. Multi-scale Structural Similarity Index Measure and Cosine Distance are used to evaluate intra-class diversity, while Frechet Inception Distance is used to evaluate the quality of synthetic images. Assessing these metrics for biomedical and non-biomedical imaging is important to investigate an informed strategy in evaluating the diversity and quality of synthetic images. In this work, an empirical assessment of these metrics is conducted for the Deep Convolutional GAN in a biomedical and non-biomedical setting. The diversity and quality of synthetic images are evaluated using different sample sizes. This research intends to investigate the variance in diversity and quality across biomedical and non-biomedical imaging modalities. Results demonstrate that the metrics scores for diversity and quality vary significantly across biomedical-to-biomedical and biomedical-to-non-biomedical imaging modalities. △ Less

Submitted 23 July, 2023; originally announced August 2023.

Comments: This work is accepted in 25th Irish Machine Vision and Image Processing (IMVIP) Conference

arXiv:2307.00382 [pdf, other]

Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

Authors: Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel Scholman

Abstract: Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we target on improving upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus and further propose a framework of cross-lin… ▽ More Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we target on improving upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus and further propose a framework of cross-lingual adaptive training that includes both continual and task adaptive training so as to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks with up to 2.38 BLEU improvements; and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance. △ Less

Submitted 1 July, 2023; originally announced July 2023.

Comments: To appear in INTERSPEECH 2023

arXiv:2306.02630 [pdf, other]

Covariance Adaptive Best Arm Identification

Authors: El Mehdi Saad, Gilles Blanchard, Nicolas Verzelen

Abstract: We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $δ$, the goal is to identify the arm with the highest mean reward with a probability of at least 1 -- $δ$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the assumption of independent arms distributions, we pro… ▽ More We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $δ$, the goal is to identify the arm with the highest mean reward with a probability of at least 1 -- $δ$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the assumption of independent arms distributions, we propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously. This framework allows the learner to estimate the covariance among the arms distributions, enabling a more efficient identification of the best arm. The relaxed setting we propose is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations in the outcomes. We introduce new algorithms that adapt to the unknown covariance of the arms and demonstrate through theoretical guarantees that substantial improvement can be achieved over the standard setting. Additionally, we provide new lower bounds for the relaxed setting and present numerical simulations that support their theoretical findings. △ Less

Submitted 20 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: New version with some minor corrections

Journal ref: Neurips 2023

arXiv:2306.02628 [pdf, other]

Active Ranking of Experts Based on their Performances in Many Tasks

Authors: El Mehdi Saad, Nicolas Verzelen, Alexandra Carpentier

Abstract: We consider the problem of ranking n experts based on their performances on d tasks. We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks. We consider the sequential setting where in each round, the learner has access to noisy evaluations of actively chosen pair of expert-task, given the information available up to the actual round. Given… ▽ More We consider the problem of ranking n experts based on their performances on d tasks. We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks. We consider the sequential setting where in each round, the learner has access to noisy evaluations of actively chosen pair of expert-task, given the information available up to the actual round. Given a confidence parameter $δ$ $\in$ (0, 1), we provide strategies allowing to recover the correct ranking of experts and develop a bound on the total number of queries made by our algorithm that hold with probability at least 1 -- $δ$. We show that our strategy is adaptive to the complexity of the problem (our bounds are instance dependent), and develop matching lower bounds up to a poly-logarithmic factor. Finally, we adapt our strategy to the relaxed problem of best expert identification and provide numerical simulation consistent with our theoretical results. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2304.13253 [pdf, other]

Analyzing In-browser Cryptojacking

Authors: Muhammad Saad, David Mohaisen

Abstract: Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking, attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting the resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-br… ▽ More Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking, attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting the resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-browser cryptojacking. For static analysis, we perform content, currency, and code-based categorization of cryptojacking samples to 1) measure their distribution across websites, 2) highlight their platform affinities, and 3) study their code complexities. We apply machine learning techniques to distinguish cryptojacking scripts from benign and malicious JavaScript samples with 100\% accuracy. For dynamic analysis, we analyze the effect of cryptojacking on critical system resources, such as CPU and battery usage. We also perform web browser fingerprinting to analyze the information exchange between the victim node and the dropzone cryptojacking server. We also build an analytical model to empirically evaluate the feasibility of cryptojacking as an alternative to online advertisement. Our results show a sizeable negative profit and loss gap, indicating that the model is economically infeasible. Finally, leveraging insights from our analyses, we build countermeasures for in-browser cryptojacking that improve the existing remedies. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: 14 pages, 11 tables, 8 figures, and 69 references. arXiv admin note: substantial text overlap with arXiv:1809.02152

arXiv:2304.00472 [pdf, other]

Querying Large Language Models with SQL

Authors: Mohammed Saeed, Nicola De Cao, Paolo Papotti

Abstract: In many use-cases, information is stored in text but not available in structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. Thus, we env… ▽ More In many use-cases, information is stored in text but not available in structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. Thus, we envision the use of SQL queries to cover a broad range of data that is not captured by traditional databases by tapping the information in LLMs. To ground this vision, we present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM. The main idea is to execute some operators of the the query plan with prompts that retrieve data from the LLM. For a large class of SQL queries, querying LLMs returns well structured relations, with encouraging qualitative results. Preliminary experimental results make pre-trained LLMs a promising addition to the field of database systems, introducing a new direction for hybrid query processing. However, we pinpoint several research challenges that must be addressed to build a DBMS that exploits LLMs. While some of these challenges necessitate integrating concepts from the NLP literature, others offer novel research avenues for the DB community. △ Less

Submitted 25 October, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: Accepted for presentation at EDBT 2024 as Vision paper

arXiv:2303.13055 [pdf, other]

Reimagining Application User Interface (UI) Design using Deep Learning Methods: Challenges and Opportunities

Authors: Subtain Malik, Muhammad Tariq Saeed, Marya Jabeen Zia, Shahzad Rasool, Liaquat Ali Khan, Mian Ilyas Ahmed

Abstract: In this paper, we present a review of the recent work in deep learning methods for user interface design. The survey encompasses well known deep learning techniques (deep neural networks, convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks) and datasets widely used to design user interface applications. We highlight important problems and eme… ▽ More In this paper, we present a review of the recent work in deep learning methods for user interface design. The survey encompasses well known deep learning techniques (deep neural networks, convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks) and datasets widely used to design user interface applications. We highlight important problems and emerging research frontiers in this field. We believe that the use of deep learning for user interface design automation tasks could be one of the high potential fields for the advancement of the software development industry. △ Less

Submitted 23 March, 2023; originally announced March 2023.

Comments: A review paper on studies of UI design techniques and deep learning

arXiv:2303.08729 [pdf, other]

DACOS-A Manually Annotated Dataset of Code Smells

Authors: Himesh Nandani, Mootez Saad, Tushar Sharma

Abstract: Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and benchmarking. Existing literature offers a few datasets; however, they are small in size and, more importantly, do not focus on the subjective code snippets. In this paper, we present DACOS, a manually anno… ▽ More Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and benchmarking. Existing literature offers a few datasets; however, they are small in size and, more importantly, do not focus on the subjective code snippets. In this paper, we present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three kinds of code smells at different granularity: multifaceted abstraction, complex method, and long parameter list. The dataset is created in two phases. The first phase helps us identify the code snippets that are potentially subjective by determining the thresholds of metrics used to detect a smell. The second phase collects annotations for potentially subjective snippets. We also offer an extended dataset DACOSX that includes definitely benign and definitely smelly snippets by using the thresholds identified in the first phase. We have developed TagMan, a web application to help annotators view and mark the snippets one-by-one and record the provided annotations. We make the datasets and the web application accessible publicly. This dataset will help researchers working on smell detection techniques to build relevant and context-aware machine-learning models. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: 4 pages

arXiv:2303.06129 [pdf, other]

Single-branch Network for Multimodal Training

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Muhammad Zaigham Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood

Abstract: With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each mod… ▽ More With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous multimodal applications and has become a defacto standard to handle multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representation of unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained either using single or multiple modalities without sacrificing performance. We evaluated our proposed single-branch network on the challenging multimodal problem (face-voice association) for cross-modal verification and matching tasks with various loss formulations. Experimental results demonstrate the superiority of our proposed single-branch network over the existing methods in a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet △ Less

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted at ICASSP 2023

arXiv:2302.13033 [pdf, other]

Speaker Recognition in Realistic Scenario Using Multimodal Data

Authors: Saqlain Hussain Shah, Muhammad Saad Saeed, Shah Nawaz, Muhammad Haroon Yousaf

Abstract: In recent years, an association is established between faces and voices of celebrities leveraging large scale audio-visual information from YouTube. The availability of large scale audio-visual datasets is instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. Thus, the aim of this paper is to leverage large scale audio-visual information to improve… ▽ More In recent years, an association is established between faces and voices of celebrities leveraging large scale audio-visual information from YouTube. The availability of large scale audio-visual datasets is instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. Thus, the aim of this paper is to leverage large scale audio-visual information to improve speaker recognition task. To achieve this task, we proposed a two-branch network to learn joint representations of faces and voices in a multimodal system. Afterwards, features are extracted from the two-branch network to train a classifier for speaker recognition. We evaluated our proposed framework on a large scale audio-visual dataset named VoxCeleb$1$. Our results show that addition of facial information improved the performance of speaker recognition. Moreover, our results indicate that there is an overlap between face and voice. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: Accepted at the International Conference on Artificial Intelligence (ICAI'2023)

arXiv:2211.12009 [pdf]

Deep-Learning-Based Computer Vision Approach For The Segmentation Of Ball Deliveries And Tracking In Cricket

Authors: Kumail Abbas, Muhammad Saeed, M. Imad Khan, Khandakar Ahmed, Hua Wang

Abstract: There has been a significant increase in the adoption of technology in cricket recently. This trend has created the problem of duplicate work being done in similar computer vision-based research works. Our research tries to solve one of these problems by segmenting ball deliveries in a cricket broadcast using deep learning models, MobileNet and YOLO, thus enabling researchers to use our work as a… ▽ More There has been a significant increase in the adoption of technology in cricket recently. This trend has created the problem of duplicate work being done in similar computer vision-based research works. Our research tries to solve one of these problems by segmenting ball deliveries in a cricket broadcast using deep learning models, MobileNet and YOLO, thus enabling researchers to use our work as a dataset for their research. The output from our research can be used by cricket coaches and players to analyze ball deliveries which are played during the match. This paper presents an approach to segment and extract video shots in which only the ball is being delivered. The video shots are a series of continuous frames that make up the whole scene of the video. Object detection models are applied to reach a high level of accuracy in terms of correctly extracting video shots. The proof of concept for building large datasets of video shots for ball deliveries is proposed which paves the way for further processing on those shots for the extraction of semantics. Ball tracking in these video shots is also done using a separate RetinaNet model as a sample of the usefulness of the proposed dataset. The position on the cricket pitch where the ball lands is also extracted by tracking the ball along the y-axis. The video shot is then classified as a full-pitched, good-length or short-pitched delivery. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2210.06334 [pdf, other]

A Self-attention Guided Multi-scale Gradient GAN for Diversified X-ray Image Synthesis

Authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: Imbalanced image datasets are commonly available in the domain of biomedical image analysis. Biomedical images contain diversified features that are significant in predicting targeted diseases. Generative Adversarial Networks (GANs) are utilized to address the data limitation problem via the generation of synthetic images. Training challenges such as mode collapse, non-convergence, and instability… ▽ More Imbalanced image datasets are commonly available in the domain of biomedical image analysis. Biomedical images contain diversified features that are significant in predicting targeted diseases. Generative Adversarial Networks (GANs) are utilized to address the data limitation problem via the generation of synthetic images. Training challenges such as mode collapse, non-convergence, and instability degrade a GAN's performance in synthesizing diversified and high-quality images. In this work, MSG-SAGAN, an attention-guided multi-scale gradient GAN architecture is proposed to model the relationship between long-range dependencies of biomedical image features and improves the training performance using a flow of multi-scale gradients at multiple resolutions in the layers of generator and discriminator models. The intent is to reduce the impact of mode collapse and stabilize the training of GAN using an attention mechanism with multi-scale gradient learning for diversified X-ray image synthesis. Multi-scale Structural Similarity Index Measure (MS-SSIM) and Frechet Inception Distance (FID) are used to identify the occurrence of mode collapse and evaluate the diversity of synthetic images generated. The proposed architecture is compared with the multi-scale gradient GAN (MSG-GAN) to assess the diversity of generated synthetic images. Results indicate that the MSG-SAGAN outperforms MSG-GAN in synthesizing diversified images as evidenced by the MS-SSIM and FID scores. △ Less

Submitted 12 November, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: Accepted in AICS-2022 Conference

arXiv:2208.10238 [pdf, other]

Learning Branched Fusion and Orthogonal Projection for Face-Voice Association

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, Sajid Javed, Muhammad Haroon Yousaf, Alessio Del Bue

Abstract: Recent years have seen an increased interest in establishing association between faces and voices of celebrities leveraging audio-visual information from YouTube. Prior works adopt metric learning methods to learn an embedding space that is amenable for associated matching and verification tasks. Albeit showing some progress, such formulations are, however, restrictive due to dependency on distanc… ▽ More Recent years have seen an increased interest in establishing association between faces and voices of celebrities leveraging audio-visual information from YouTube. Prior works adopt metric learning methods to learn an embedding space that is amenable for associated matching and verification tasks. Albeit showing some progress, such formulations are, however, restrictive due to dependency on distance-dependent margin parameter, poor run-time training complexity, and reliance on carefully crafted negative mining procedures. In this work, we hypothesize that an enriched representation coupled with an effective yet efficient supervision is important towards realizing a discriminative joint embedding space for face-voice association tasks. To this end, we propose a light-weight, plug-and-play mechanism that exploits the complementary cues in both modalities to form enriched fused embeddings and clusters them based on their identity labels via orthogonality constraints. We coin our proposed mechanism as fusion and orthogonal projection (FOP) and instantiate in a two-stream network. The overall resulting framework is evaluated on VoxCeleb1 and MAV-Celeb datasets with a multitude of tasks, including cross-modal verification and matching. Results reveal that our method performs favourably against the current state-of-the-art methods and our proposed formulation of supervision is more effective and efficient than the ones employed by the contemporary methods. In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association. Code is available: \url{https://github.com/msaadsaeed/FOP} △ Less

Submitted 22 August, 2022; originally announced August 2022.

Comments: Submitted: IEEE Transactions on Multimedia. arXiv admin note: substantial text overlap with arXiv:2112.10483

arXiv:2208.09214 [pdf, other]

doi 10.1145/3511808.3557279

Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?

Authors: Mohammed Saeed, Nicolas Traub, Maelle Nicolas, Gianluca Demartini, Paolo Papotti

Abstract: Fact-checking is one of the effective solutions in fighting online misinformation. However, traditional fact-checking is a process requiring scarce expert human resources, and thus does not scale well on social media because of the continuous flow of new content to be checked. Methods based on crowdsourcing have been proposed to tackle this challenge, as they can scale with a smaller cost, but, wh… ▽ More Fact-checking is one of the effective solutions in fighting online misinformation. However, traditional fact-checking is a process requiring scarce expert human resources, and thus does not scale well on social media because of the continuous flow of new content to be checked. Methods based on crowdsourcing have been proposed to tackle this challenge, as they can scale with a smaller cost, but, while they have shown to be feasible, have always been studied in controlled environments. In this work, we study the first large-scale effort of crowdsourced fact-checking deployed in practice, started by Twitter with the Birdwatch program. Our analysis shows that crowdsourcing may be an effective fact-checking strategy in some settings, even comparable to results obtained by human experts, but does not lead to consistent, actionable results in others. We processed 11.9k tweets verified by the Birdwatch program and report empirical evidence of i) differences in how the crowd and experts select content to be fact-checked, ii) how the crowd and the experts retrieve different resources to fact-check, and iii) the edge the crowd shows in fact-checking scalability and efficiency as compared to expert checkers. △ Less

Submitted 19 August, 2022; originally announced August 2022.

Journal ref: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022)

arXiv:2208.08224 [pdf, other]

doi 10.3390/s22166088

Blind-Spot Collision Detection System for Commercial Vehicles Using Multi Deep CNN Architecture

Authors: Muhammad Muzammel, Mohd Zuki Yusoff, Mohamad Naufal Mohamad Saad, Faryal Sheikh, Muhammad Ahsan Awais

Abstract: Buses and heavy vehicles have more blind spots compared to cars and other road vehicles due to their large sizes. Therefore, accidents caused by these heavy vehicles are more fatal and result in severe injuries to other road users. These possible blind-spot collisions can be identified early using vision-based object detection approaches. Yet, the existing state-of-the-art vision-based object dete… ▽ More Buses and heavy vehicles have more blind spots compared to cars and other road vehicles due to their large sizes. Therefore, accidents caused by these heavy vehicles are more fatal and result in severe injuries to other road users. These possible blind-spot collisions can be identified early using vision-based object detection approaches. Yet, the existing state-of-the-art vision-based object detection models rely heavily on a single feature descriptor for making decisions. In this research, the design of two convolutional neural networks (CNNs) based on high-level feature descriptors and their integration with faster R-CNN is proposed to detect blind-spot collisions for heavy vehicles. Moreover, a fusion approach is proposed to integrate two pre-trained networks (i.e., Resnet 50 and Resnet 101) for extracting high level features for blind-spot vehicle detection. The fusion of features significantly improves the performance of faster R-CNN and outperformed the existing state-of-the-art methods. Both approaches are validated on a self-recorded blind-spot vehicle detection dataset for buses and an online LISA dataset for vehicle detection. For both proposed approaches, a false detection rate (FDR) of 3.05% and 3.49% are obtained for the self recorded dataset, making these approaches suitable for real time applications. △ Less

Submitted 19 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.05593 [pdf, other]

Evaluating the Quality and Diversity of DCGAN-based Generatively Synthesized Diabetic Retinopathy Imagery

Authors: Cristina-Madalina Dragan, Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: Publicly available diabetic retinopathy (DR) datasets are imbalanced, containing limited numbers of images with DR. This imbalance contributes to overfitting when training machine learning classifiers. The impact of this imbalance is exacerbated as the severity of the DR stage increases, affecting the classifiers' diagnostic capacity. The imbalance can be addressed using Generative Adversarial Net… ▽ More Publicly available diabetic retinopathy (DR) datasets are imbalanced, containing limited numbers of images with DR. This imbalance contributes to overfitting when training machine learning classifiers. The impact of this imbalance is exacerbated as the severity of the DR stage increases, affecting the classifiers' diagnostic capacity. The imbalance can be addressed using Generative Adversarial Networks (GANs) to augment the datasets with synthetic images. Generating synthetic images is advantageous if high-quality and diversified images are produced. To evaluate the quality and diversity of synthetic images, several evaluation metrics, such as Multi-Scale Structural Similarity Index (MS-SSIM), Cosine Distance (CD), and Fréchet Inception Distance (FID) are used. Understanding the effectiveness of each metric in evaluating the quality and diversity of GAN-based synthetic images is critical to select images for augmentation. To date, there has been limited analysis of the appropriateness of these metrics in the context of biomedical imagery. This work contributes an empirical assessment of these evaluation metrics as applied to synthetic Proliferative DR imagery generated by a Deep Convolutional GAN (DCGAN). Furthermore, the metrics' capacity to indicate the quality and diversity of synthetic images and a correlation with classifier performance is undertaken. This enables a quantitative selection of synthetic imagery and an informed augmentation strategy. Results indicate that FID is suitable for evaluating the quality, while MS-SSIM and CD are suitable for evaluating the diversity of synthetic imagery. Furthermore, the superior performance of Convolutional Neural Network (CNN) and EfficientNet classifiers, as indicated by the F1 and AUC scores, for the augmented datasets demonstrates the efficacy of synthetic imagery to augment the imbalanced dataset. △ Less

Submitted 30 August, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: 29 Pages, 8 Figures, submitted to MEDAL23: Advances in Deep Generative Models for Medical Artificial Intelligence (Springer Nature series)

arXiv:2208.04705 [pdf, other]

Classification of Stress via Ambulatory ECG and GSR Data

Authors: Zachary Dair, Muhammad Muneeb Saad, Urja Pawar, Samantha Dockray, Ruairi O'Reilly

Abstract: In healthcare, detecting stress and enabling individuals to monitor their mental health and wellbeing is challenging. Advancements in wearable technology now enable continuous physiological data collection. This data can provide insights into mental health and behavioural states through psychophysiological analysis. However, automated analysis is required to provide timely results due to the quant… ▽ More In healthcare, detecting stress and enabling individuals to monitor their mental health and wellbeing is challenging. Advancements in wearable technology now enable continuous physiological data collection. This data can provide insights into mental health and behavioural states through psychophysiological analysis. However, automated analysis is required to provide timely results due to the quantity of data collected. Machine learning has shown efficacy in providing an automated classification of physiological data for health applications in controlled laboratory environments. Ambulatory uncontrolled environments, however, provide additional challenges requiring further modelling to overcome. This work empirically assesses several approaches utilising machine learning classifiers to detect stress using physiological data recorded in an ambulatory setting with self-reported stress annotations. A subset of the training portion SMILE dataset enables the evaluation of approaches before submission. The optimal stress detection approach achieves 90.77% classification accuracy, 91.24 F1-Score, 90.42 Sensitivity and 91.08 Specificity, utilising an ExtraTrees classifier and feature imputation methods. Meanwhile, accuracy on the challenge data is much lower at 59.23% (submission #54 from BEaTS-MTU, username ZacDair). The cause of the performance disparity is explored in this work. △ Less

Submitted 8 June, 2023; v1 submitted 19 July, 2022; originally announced August 2022.

Comments: Associated Code to enable reproducible experimental work - https://github.com/ZacDair/EMBC_Release SMILE dataset provided by Computational Wellbeing Group (COMPWELL) https://compwell.rice.edu/workshops/embc2022/dataset - https://compwell.rice.edu/

ACM Class: I.2.m; J.3; J.4

Journal ref: EMBC 2022 Compwell Workshop

arXiv:2204.09660 [pdf]

doi 10.1016/j.dib.2022.108089

Medical Dataset Classification for Kurdish Short Text over Social Media

Authors: Ari M. Saeed, Shnya R. Hussein, Chro M. Ali, Tarik A. Rashid

Abstract: The Facebook application is used as a resource for collecting the comments of this dataset, The dataset consists of 6756 comments to create a Medical Kurdish Dataset (MKD). The samples are comments of users, which are gathered from different posts of pages (Medical, News, Economy, Education, and Sport). Six steps as a preprocessing technique are performed on the raw dataset to clean and remove noi… ▽ More The Facebook application is used as a resource for collecting the comments of this dataset, The dataset consists of 6756 comments to create a Medical Kurdish Dataset (MKD). The samples are comments of users, which are gathered from different posts of pages (Medical, News, Economy, Education, and Sport). Six steps as a preprocessing technique are performed on the raw dataset to clean and remove noise in the comments by replacing characters. The comments (short text) are labeled for positive class (medical comment) and negative class (non-medical comment) as text classification. The percentage ratio of the negative class is 55% while the positive class is 45%. △ Less

Submitted 26 March, 2022; originally announced April 2022.

Comments: 11 pages

Journal ref: DIB, 2020

arXiv:2202.02489 [pdf, other]

Investigating the Challenges of Class Imbalance and Scale Variation in Object Detection in Aerial Images

Authors: Ahmed Elhagry, Mohamed Saeed

Abstract: While object detection is a common problem in computer vision, it is even more challenging when dealing with aerial satellite images. The variety in object scales and orientations can make them difficult to identify. In addition, there can be large amounts of densely packed small objects such as cars. In this project, we propose a few changes to the Faster-RCNN architecture. First, we experiment w… ▽ More While object detection is a common problem in computer vision, it is even more challenging when dealing with aerial satellite images. The variety in object scales and orientations can make them difficult to identify. In addition, there can be large amounts of densely packed small objects such as cars. In this project, we propose a few changes to the Faster-RCNN architecture. First, we experiment with different backbones to extract better features. We also modify the data augmentations and generated anchor sizes for region proposals in order to better handle small objects. Finally, we investigate the effects of different loss functions. Our proposed design achieves an improvement of 4.7 mAP over the baseline which used a vanilla Faster R-CNN with a ResNet-101 FPN backbone. △ Less

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2201.12946 [pdf, ps, other]

Pauli Error Propagation-Based Gate Reschedulingfor Quantum Circuit Error Mitigation

Authors: Vedika Saravanan, Samah Mohamed Saeed

Abstract: Noisy Intermediate-Scale Quantum (NISQ) algorithms, which run on noisy quantum computers should be carefully designed to boost the output state fidelity. While several compilation approaches have been proposed to minimize circuit errors, they often omit the detailed circuit structure information that does not affect the circuit depth or the gate count. In the presence of spatial variation in the e… ▽ More Noisy Intermediate-Scale Quantum (NISQ) algorithms, which run on noisy quantum computers should be carefully designed to boost the output state fidelity. While several compilation approaches have been proposed to minimize circuit errors, they often omit the detailed circuit structure information that does not affect the circuit depth or the gate count. In the presence of spatial variation in the error rate of the quantum gates, adjusting the circuit structure can play a major role in mitigating errors. In this paper, we exploit the freedom of gate reordering based on the commutation rules to show the impact of gate error propagation paths on the output state fidelity of the quantum circuit, propose advanced predictive techniques to project the success rate of the circuit, and develop a new compilation phase post-quantum circuit mapping to improve its reliability. Our proposed approaches have been validated using a variety of quantum circuits with different success metrics, which are executed on IBM quantum computers. Our results show that rescheduling quantum gates based on their error propagation paths can significantly improve the fidelity of the quantum circuit in the presence of variable gate error rates. △ Less

Submitted 30 January, 2022; originally announced January 2022.

arXiv:2201.10324 [pdf, other]

Addressing the Intra-class Mode Collapse Problem using Adaptive Input Image Normalization in GAN-based X-ray Images

Authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly

Abstract: Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imag… ▽ More Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem can impact a Generative Adversarial Network's capacity to generate diversified images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, the intra-class mode collapse problem is investigated, and its subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization for the Deep Convolutional GAN to alleviate the intra-class mode collapse problem. Results demonstrate that the DCGAN with adaptive input-image normalization outperforms DCGAN with un-normalized X-ray images as evident by the superior diversity scores. △ Less

Submitted 12 April, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: Accepted to the IEEE EMBC22 Conference

arXiv:2201.07646 [pdf, other]

A Survey on Training Challenges in Generative Adversarial Networks for Biomedical Image Analysis

Authors: Muhammad Muneeb Saad, Ruairi O'Reilly, Mubashir Husain Rehmani

Abstract: In biomedical image analysis, the applicability of deep learning methods is directly impacted by the quantity of image data available. This is due to deep learning models requiring large image datasets to provide high-level performance. Generative Adversarial Networks (GANs) have been widely utilized to address data limitations through the generation of synthetic biomedical images. GANs consist of… ▽ More In biomedical image analysis, the applicability of deep learning methods is directly impacted by the quantity of image data available. This is due to deep learning models requiring large image datasets to provide high-level performance. Generative Adversarial Networks (GANs) have been widely utilized to address data limitations through the generation of synthetic biomedical images. GANs consist of two models. The generator, a model that learns how to produce synthetic images based on the feedback it receives. The discriminator, a model that classifies an image as synthetic or real and provides feedback to the generator. Throughout the training process, a GAN can experience several technical challenges that impede the generation of suitable synthetic imagery. First, the mode collapse problem whereby the generator either produces an identical image or produces a uniform image from distinct input features. Second, the non-convergence problem whereby the gradient descent optimizer fails to reach a Nash equilibrium. Thirdly, the vanishing gradient problem whereby unstable training behavior occurs due to the discriminator achieving optimal classification performance resulting in no meaningful feedback being provided to the generator. These problems result in the production of synthetic imagery that is blurry, unrealistic, and less diverse. To date, there has been no survey article outlining the impact of these technical challenges in the context of the biomedical imagery domain. This work presents a review and taxonomy based on solutions to the training problems of GANs in the biomedical imaging domain. This survey highlights important challenges and outlines future research directions about the training of GANs in the domain of biomedical imagery. △ Less

Submitted 10 August, 2023; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Submitted to the AI Review Journal

arXiv:2201.07219 [pdf, other]

Contrastive Pretraining for Echocardiography Segmentation with Limited Data

Authors: Mohamed Saeed, Rand Muhtaseb, Mohammad Yaqub

Abstract: Contrastive learning has proven useful in many applications where access to labelled data is limited. The lack of annotated data is particularly problematic in medical image segmentation as it is difficult to have clinical experts manually annotate large volumes of data such as cardiac structures in ultrasound images of the heart. In this paper, We propose a self supervised contrastive learning me… ▽ More Contrastive learning has proven useful in many applications where access to labelled data is limited. The lack of annotated data is particularly problematic in medical image segmentation as it is difficult to have clinical experts manually annotate large volumes of data such as cardiac structures in ultrasound images of the heart. In this paper, We propose a self supervised contrastive learning method to segment the left ventricle from echocardiography when limited annotated images exist. Furthermore, we study the effect of contrastive pretraining on two well-known segmentation networks, UNet and DeepLabV3. Our results show that contrastive pretraining helps improve the performance on left ventricle segmentation, particularly when annotated data is scarce. We show how to achieve comparable results to state-of-the-art fully supervised algorithms when we train our models in a self-supervised fashion followed by fine-tuning on just 5\% of the data. We show that our solution outperforms what is currently published on a large public dataset (EchoNet-Dynamic) achieving a Dice score of 0.9252. We also compare the performance of our solution on another smaller dataset (CAMUS) to demonstrate the generalizability of our proposed solution. The code is available at (https://github.com/BioMedIA-MBZUAI/contrastive-echo). △ Less

Submitted 14 July, 2022; v1 submitted 16 January, 2022; originally announced January 2022.

arXiv:2112.10483 [pdf, other]

Fusion and Orthogonal Projection for Improved Face-Voice Association

Authors: Muhammad Saad Saeed, Muhammad Haris Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue

Abstract: We study the problem of learning association between face and voice, which is gaining interest in the computer vision community lately. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable for associated matching and verification tasks. Albeit showing some progress, such loss formulations are, however, restrictive due to dependency on distance-dependent marg… ▽ More We study the problem of learning association between face and voice, which is gaining interest in the computer vision community lately. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable for associated matching and verification tasks. Albeit showing some progress, such loss formulations are, however, restrictive due to dependency on distance-dependent margin parameter, poor run-time training complexity, and reliance on carefully crafted negative mining procedures. In this work, we hypothesize that enriched feature representation coupled with an effective yet efficient supervision is necessary in realizing a discriminative joint embedding space for improved face-voice association. To this end, we propose a light-weight, plug-and-play mechanism that exploits the complementary cues in both modalities to form enriched fused embeddings and clusters them based on their identity labels via orthogonality constraints. We coin our proposed mechanism as fusion and orthogonal projection (FOP) and instantiate in a two-stream pipeline. The overall resulting framework is evaluated on a large-scale VoxCeleb dataset with a multitude of tasks, including cross-modal verification and matching. Results show that our method performs favourably against the current state-of-the-art methods and our proposed supervision formulation is more effective and efficient than the ones employed by the contemporary methods. △ Less

Submitted 20 December, 2021; originally announced December 2021.

arXiv:2112.00443 [pdf, other]

TROLLMAGNIFIER: Detecting State-Sponsored Troll Accounts on Reddit

Authors: Mohammad Hammas Saeed, Shiza Ali, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

Abstract: Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIF… ▽ More Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies more that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the same exact day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%. △ Less

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2110.14205 [pdf, other]

FedPrune: Towards Inclusive Federated Learning

Authors: Muhammad Tahir Munir, Muhammad Mustansar Saeed, Mahad Ali, Zafar Ayyub Qazi, Ihsan Ayyub Qazi

Abstract: Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example,… ▽ More Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example, slow clients get dropped in FL schemes, such as Federated Averaging (FedAvg), which not only limits overall learning but also biases results towards fast clients. We propose FedPrune; a system that tackles this challenge by pruning the global model for slow clients based on their device characteristics. By doing so, slow clients can train a small model quickly and participate in FL which increases test accuracy as well as fairness. By using insights from Central Limit Theorem, FedPrune incorporates a new aggregation technique that achieves robust performance over non-IID data. Experimental evaluation shows that Fed- Prune provides robust convergence and better fairness compared to Federated Averaging. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Showing 1–50 of 99 results for author: Saad, M