-
Natural Language Task-Oriented Dialog System 2.0
Authors:
Adib Mosharrof,
A. B. Siddique
Abstract:
Task-oriented dialog (TOD) systems play a crucial role in facilitating efficient interactions between users and machines by focusing on achieving specific goals through natural language communication. These systems traditionally rely on manually annotated metadata, such as dialog states and policy annotations, which is labor-intensive, expensive, inconsistent, and prone to errors, thereby limiting…
▽ More
Task-oriented dialog (TOD) systems play a crucial role in facilitating efficient interactions between users and machines by focusing on achieving specific goals through natural language communication. These systems traditionally rely on manually annotated metadata, such as dialog states and policy annotations, which is labor-intensive, expensive, inconsistent, and prone to errors, thereby limiting the potential to leverage the vast amounts of available conversational data. A critical aspect of TOD systems involves accessing and integrating information from external sources to effectively engage users. The process of determining when and how to query external resources represents a fundamental challenge in system design, however existing approaches expect this information to provided in the context. In this paper, we introduce Natural Language Task Oriented Dialog System (NL-ToD), a novel model that removes the dependency on manually annotated turn-wise data by utilizing dialog history and domain schemas to create a Zero Shot Generalizable TOD system. We also incorporate query generation as a core task of the system, where the output of the system could be a response to the user or an API query to communicate with an external resource. To achieve a more granular analysis of the system output, we classify the output into multiple categories: slot filling, retrieval, and query generation. Our analysis reveals that slot filling is the most challenging TOD task for all models. Experimental results on three popular TOD datasets (SGD, KETOD and BiToD) shows the effectiveness of our approach as NL-ToD outperforms state-of-the-art approaches, particularly with a \textbf{31.4\%} and \textbf{82.1\%} improvement in the BLEU-4 score on the SGD and KETOD dataset.
△ Less
Submitted 21 July, 2024;
originally announced July 2024.
-
Looking into Black Box Code Language Models
Authors:
Muhammad Umair Haider,
Umar Farooq,
A. B. Siddique,
Mark Marron
Abstract:
Language Models (LMs) have shown their application for tasks pertinent to code and several code~LMs have been proposed recently. The majority of the studies in this direction only focus on the improvements in performance of the LMs on different benchmarks, whereas LMs are considered black boxes. Besides this, a handful of works attempt to understand the role of attention layers in the code~LMs. No…
▽ More
Language Models (LMs) have shown their application for tasks pertinent to code and several code~LMs have been proposed recently. The majority of the studies in this direction only focus on the improvements in performance of the LMs on different benchmarks, whereas LMs are considered black boxes. Besides this, a handful of works attempt to understand the role of attention layers in the code~LMs. Nonetheless, feed-forward layers remain under-explored which consist of two-thirds of a typical transformer model's parameters.
In this work, we attempt to gain insights into the inner workings of code language models by examining the feed-forward layers. To conduct our investigations, we use two state-of-the-art code~LMs, Codegen-Mono and Ploycoder, and three widely used programming languages, Java, Go, and Python. We focus on examining the organization of stored concepts, the editability of these concepts, and the roles of different layers and input context size variations for output generation. Our empirical findings demonstrate that lower layers capture syntactic patterns while higher layers encode abstract concepts and semantics. We show concepts of interest can be edited within feed-forward layers without compromising code~LM performance. Additionally, we observe initial layers serve as ``thinking'' layers, while later layers are crucial for predicting subsequent code tokens. Furthermore, we discover earlier layers can accurately predict smaller contexts, but larger contexts need critical later layers' contributions. We anticipate these findings will facilitate better understanding, debugging, and testing of code~LMs.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
MobileConvRec: A Conversational Dataset for Mobile Apps Recommendations
Authors:
Srijata Maji,
Moghis Fereidouni,
Vinaik Chhetri,
Umar Farooq,
A. B. Siddique
Abstract:
Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due t…
▽ More
Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due to substantial advancements in natural language processing, conversational recommendation systems have gained prominence. Existing conversational recommendation datasets have greatly facilitated research in their respective domains. Despite the exponential growth in mobile users and apps in recent years, research in conversational mobile app recommender systems has faced substantial constraints. This limitation can primarily be attributed to the lack of high-quality benchmark datasets specifically tailored for mobile apps. To facilitate research for conversational mobile app recommendations, we introduce MobileConvRec. MobileConvRec simulates conversations by leveraging real user interactions with mobile apps on the Google Play store, originally captured in large-scale mobile app recommendation dataset MobileRec. The proposed conversational recommendation dataset synergizes sequential user-item interactions, which reflect implicit user preferences, with comprehensive multi-turn conversations to effectively grasp explicit user needs. MobileConvRec consists of over 12K multi-turn recommendation-related conversations spanning 45 app categories. Moreover, MobileConvRec presents rich metadata for each app such as permissions data, security and privacy-related information, and binary executables of apps, among others. We demonstrate that MobileConvRec can serve as an excellent testbed for conversational mobile app recommendation through a comparative study of several pre-trained large language models.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning
Authors:
Moghis Fereidouni,
A. B. Siddique
Abstract:
Traditional search systems focus on query formulation for effective results but face challenges in scenarios such as product searches where crucial product details (e.g., size, color) remain concealed until users visit specific product pages. This highlights the need for intelligent web navigation agents capable of formulating queries and navigating web pages according to users' high-level intents…
▽ More
Traditional search systems focus on query formulation for effective results but face challenges in scenarios such as product searches where crucial product details (e.g., size, color) remain concealed until users visit specific product pages. This highlights the need for intelligent web navigation agents capable of formulating queries and navigating web pages according to users' high-level intents. In response to this need, this work introduces a Grounded Language Agent for Intelligent Web Interactions, called GLAINTEL. Drawing upon advancements in language modeling and reinforcement learning, GLAINTEL investigates the efficacy of transformer-based models in enhancing the search capabilities of interactive web environments. Given the dynamic action space for each state in web navigation, GLAINTEL employs the Flan-T5 architecture and incorporates language modeling and value estimation heads. This work focuses on training smaller language models as agents across various scenarios, systematically evaluating the impact of human demonstrations on the training process. Specifically, we investigate scenarios where no human demonstrations are available and subsequently assess the effective utilization of such demonstrations. We also explore unsupervised domain adaptation for situations where demonstrations are confined to a specific domain. Experimental evaluations across diverse setups demonstrate the effectiveness of training agents in unsupervised settings, outperforming in-context learning-based approaches that employ larger models with up to 540 billion parameters. Surprisingly, behavioral cloning-based methods that straightforwardly use human demonstrations do not outperform unsupervised learning-based methods. Additionally, combining human demonstrations with Reinforcement Learning-based training yields results comparable to models utilizing GPT-4.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using Context Summarization and Domain Schema
Authors:
Adib Mosharrof,
M. H. Maqbool,
A. B. Siddique
Abstract:
Task-oriented dialog systems empower users to accomplish their goals by facilitating intuitive and expressive natural language interactions. State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in the supervised setting. This requires labeled training data for each new domain or…
▽ More
Task-oriented dialog systems empower users to accomplish their goals by facilitating intuitive and expressive natural language interactions. State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in the supervised setting. This requires labeled training data for each new domain or task, and acquiring such data is prohibitively laborious and expensive, thus making it a bottleneck for scaling systems to a wide range of domains. To overcome this challenge, we introduce a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system, ZS-ToD, that leverages domain schemas to allow for robust generalization to unseen domains and exploits effective summarization of the dialog history. We employ GPT-2 as a backbone model and introduce a two-step training process where the goal of the first step is to learn the general structure of the dialog data and the second step optimizes the response generation as well as intermediate outputs, such as dialog state and system actions. As opposed to state-of-the-art systems that are trained to fulfill certain intents in the given domains and memorize task-specific conversational patterns, ZS-ToD learns generic task-completion skills by comprehending domain semantics via domain schemas and generalizing to unseen domains seamlessly. We conduct an extensive experimental evaluation on SGD and SGD-X datasets that span up to 20 unique domains and ZS-ToD outperforms state-of-the-art systems on key metrics, with an improvement of +17% on joint goal accuracy and +5 on inform. Additionally, we present a detailed ablation study to demonstrate the effectiveness of the proposed components and training mechanism
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
-
Toward Open-domain Slot Filling via Self-supervised Co-training
Authors:
Adib Mosharrof,
Moghis Fereidouni,
A. B. Siddique
Abstract:
Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to super…
▽ More
Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called SCot, that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT, by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised cotraining mechanism, where both models automatically select highconfidence soft labels to further improve the performance of the other in an iterative fashion. Our thorough evaluations show that SCot outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework SCot achieves comparable performance when compared to state-of-the-art fully supervised models.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
Authors:
A. B. Siddique,
M. H. Maqbool,
Kshitija Taywade,
Hassan Foroosh
Abstract:
Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the ch…
▽ More
Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
MobileRec: A Large-Scale Dataset for Mobile Apps Recommendation
Authors:
M. H. Maqbool,
Umar Farooq,
Adib Mosharrof,
A. B. Siddique,
Hassan Foroosh
Abstract:
Recommender systems have become ubiquitous in our digital lives, from recommending products on e-commerce websites to suggesting movies and music on streaming platforms. Existing recommendation datasets, such as Amazon Product Reviews and MovieLens, greatly facilitated the research and development of recommender systems in their respective domains. While the number of mobile users and applications…
▽ More
Recommender systems have become ubiquitous in our digital lives, from recommending products on e-commerce websites to suggesting movies and music on streaming platforms. Existing recommendation datasets, such as Amazon Product Reviews and MovieLens, greatly facilitated the research and development of recommender systems in their respective domains. While the number of mobile users and applications (aka apps) has increased exponentially over the past decade, research in mobile app recommender systems has been significantly constrained, primarily due to the lack of high-quality benchmark datasets, as opposed to recommendations for products, movies, and news. To facilitate research for app recommendation systems, we introduce a large-scale dataset, called MobileRec. We constructed MobileRec from users' activity on the Google play store. MobileRec contains 19.3 million user interactions (i.e., user reviews on apps) with over 10K unique apps across 48 categories. MobileRec records the sequential activity of a total of 0.7 million distinct users. Each of these users has interacted with no fewer than five distinct apps, which stands in contrast to previous datasets on mobile apps that recorded only a single interaction per user. Furthermore, MobileRec presents users' ratings as well as sentiments on installed apps, and each app contains rich metadata such as app name, category, description, and overall rating, among others. We demonstrate that MobileRec can serve as an excellent testbed for app recommendation through a comparative study of several state-of-the-art recommendation approaches. The quantitative results can act as a baseline for other researchers to compare their results against. The MobileRec dataset is available at https://huggingface.co/datasets/recmeapp/mobilerec.
△ Less
Submitted 12 March, 2023;
originally announced March 2023.
-
Effect of Balancing Data Using Synthetic Data on the Performance of Machine Learning Classifiers for Intrusion Detection in Computer Networks
Authors:
Ayesha S. Dina,
A. B. Siddique,
D. Manivannan
Abstract:
Attacks on computer networks have increased significantly in recent days, due in part to the availability of sophisticated tools for launching such attacks as well as thriving underground cyber-crime economy to support it. Over the past several years, researchers in academia and industry used machine learning (ML) techniques to design and implement Intrusion Detection Systems (IDSes) for computer…
▽ More
Attacks on computer networks have increased significantly in recent days, due in part to the availability of sophisticated tools for launching such attacks as well as thriving underground cyber-crime economy to support it. Over the past several years, researchers in academia and industry used machine learning (ML) techniques to design and implement Intrusion Detection Systems (IDSes) for computer networks. Many of these researchers used datasets collected by various organizations to train ML models for predicting intrusions. In many of the datasets used in such systems, data are imbalanced (i.e., not all classes have equal amount of samples). With unbalanced data, the predictive models developed using ML algorithms may produce unsatisfactory classifiers which would affect accuracy in predicting intrusions. Traditionally, researchers used over-sampling and under-sampling for balancing data in datasets to overcome this problem. In this work, in addition to over-sampling, we also use a synthetic data generation method, called Conditional Generative Adversarial Network (CTGAN), to balance data and study their effect on various ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples to balance intrusion detection datasets. Based on extensive experiments using a widely used dataset NSL-KDD, we found that training ML models on dataset balanced with synthetic samples generated by CTGAN increased prediction accuracy by up to $8\%$, compared to training the same ML models over unbalanced data. Our experiments also show that the accuracy of some ML models trained over data balanced with random over-sampling decline compared to the same ML models trained over unbalanced data.
△ Less
Submitted 31 March, 2022;
originally announced April 2022.
-
Generalized Zero-shot Intent Detection via Commonsense Knowledge
Authors:
A. B. Siddique,
Fuad Jamour,
Luxun Xu,
Vagelis Hristidis
Abstract:
Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployme…
▽ More
Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployment and they do not have training data. The few existing models that target this setting rely heavily on the scarcely available training data and overfit to seen intents data, resulting in a bias to misclassify utterances with unseen intents into seen ones. We propose RIDE: an intent detection model that leverages commonsense knowledge in an unsupervised fashion to overcome the issue of training data scarcity. RIDE computes robust and generalizable relationship meta-features that capture deep semantic relationships between utterances and intent labels; these features are computed by considering how the concepts in an utterance are linked to those in an intent label via commonsense knowledge. Our extensive experimental analysis on three widely-used intent detection benchmarks shows that relationship meta-features significantly increase the accuracy of detecting both seen and unseen intents and that RIDE outperforms the state-of-the-art model for unseen intents.
△ Less
Submitted 4 February, 2021;
originally announced February 2021.
-
Linguistically-Enriched and Context-Aware Zero-shot Slot Filling
Authors:
A. B. Siddique,
Fuad Jamour,
Vagelis Hristidis
Abstract:
Slot filling is identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised learning approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain.…
▽ More
Slot filling is identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised learning approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain. However, new domains (i.e., unseen in training) may emerge after deployment. Thus, it is imperative that these models seamlessly adapt and fill slots from both seen and unseen domains -- unseen domains contain unseen slot types with no training data, and even seen slots in unseen domains are typically presented in different contexts. This setting is commonly referred to as zero-shot slot filling. Little work has focused on this setting, with limited experimental evaluation. Existing models that mainly rely on context-independent embedding-based similarity measures fail to detect slot values in unseen domains or do so only partially. We propose a new zero-shot slot filling neural model, LEONA, which works in three steps. Step one acquires domain-oblivious, context-aware representations of the utterance word by exploiting (a) linguistic features; (b) named entity recognition cues; (c) contextual embeddings from pre-trained language models. Step two fine-tunes these rich representations and produces slot-independent tags for each word. Step three exploits generalizable context-aware utterance-slot similarity features at the word level, uses slot-independent tags, and contextualizes them to produce slot-specific predictions for each word. Our thorough evaluation on four diverse public datasets demonstrates that our approach consistently outperforms the SOTA models by 17.52%, 22.15%, 17.42%, and 17.95% on average for unseen domains on SNIPS, ATIS, MultiWOZ, and SGD datasets, respectively.
△ Less
Submitted 16 January, 2021;
originally announced January 2021.
-
Deep Convolutional Neural Networks Model-based Brain Tumor Detection in Brain MRI Images
Authors:
Md. Abu Bakr Siddique,
Shadman Sakib,
Mohammad Mahmudur Rahman Khan,
Abyaz Kader Tanzeem,
Madiha Chowdhury,
Nowrin Yasmin
Abstract:
Diagnosing Brain Tumor with the aid of Magnetic Resonance Imaging (MRI) has gained enormous prominence over the years, primarily in the field of medical science. Detection and/or partitioning of brain tumors solely with the aid of MR imaging is achieved at the cost of immense time and effort and demands a lot of expertise from engaged personnel. This substantiates the necessity of fabricating an a…
▽ More
Diagnosing Brain Tumor with the aid of Magnetic Resonance Imaging (MRI) has gained enormous prominence over the years, primarily in the field of medical science. Detection and/or partitioning of brain tumors solely with the aid of MR imaging is achieved at the cost of immense time and effort and demands a lot of expertise from engaged personnel. This substantiates the necessity of fabricating an autonomous model brain tumor diagnosis. Our work involves implementing a deep convolutional neural network (DCNN) for diagnosing brain tumors from MR images. The dataset used in this paper consists of 253 brain MR images where 155 images are reported to have tumors. Our model can single out the MR images with tumors with an overall accuracy of 96%. The model outperformed the existing conventional methods for the diagnosis of brain tumor in the test dataset (Precision = 0.93, Sensitivity = 1.00, and F1-score = 0.97). Moreover, the proposed model's average precision-recall score is 0.93, Cohen's Kappa 0.91, and AUC 0.95. Therefore, the proposed model can help clinical experts verify whether the patient has a brain tumor and, consequently, accelerate the treatment procedure.
△ Less
Submitted 3 October, 2020;
originally announced October 2020.
-
Prediction of Temperature and Rainfall in Bangladesh using Long Short Term Memory Recurrent Neural Networks
Authors:
Mohammad Mahmudur Rahman Khan,
Md. Abu Bakr Siddique,
Shadman Sakib,
Anas Aziz,
Ihtyaz Kader Tasawar,
Ziad Hossain
Abstract:
Temperature and rainfall have a significant impact on economic growth as well as the outbreak of seasonal diseases in a region. In spite of that inadequate studies have been carried out for analyzing the weather pattern of Bangladesh implementing the artificial neural network. Therefore, in this study, we are implementing a Long Short-term Memory (LSTM) model to forecast the month-wise temperature…
▽ More
Temperature and rainfall have a significant impact on economic growth as well as the outbreak of seasonal diseases in a region. In spite of that inadequate studies have been carried out for analyzing the weather pattern of Bangladesh implementing the artificial neural network. Therefore, in this study, we are implementing a Long Short-term Memory (LSTM) model to forecast the month-wise temperature and rainfall by analyzing 115 years (1901-2015) of weather data of Bangladesh. The LSTM model has shown a mean error of -0.38oC in case of predicting the month-wise temperature for 2 years and -17.64mm in case of predicting the rainfall. This prediction model can help to understand the weather pattern changes as well as studying seasonal diseases of Bangladesh whose outbreaks are dependent on regional temperature and/or rainfall.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
App-Aware Response Synthesis for User Reviews
Authors:
Umar Farooq,
A. B. Siddique,
Fuad Jamour,
Zhijia Zhao,
Vagelis Hristidis
Abstract:
Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation. However, because the training review-response pa…
▽ More
Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation. However, because the training review-response pairs are aggregated from many different apps, it remains challenging for such models to generate app-specific responses, which, on the other hand, are often desirable as apps have different features and concerns. Solving the challenge by simply building a model per app (i.e., training with review-response pairs of a single app) may be insufficient because individual apps have limited review-response pairs, and such pairs typically lack the relevant information needed to respond to a new review. To enable app-specific response generation, this work proposes AARSynth: an app-aware response synthesis system. The key idea behind AARSynth is to augment the seq2seq model with information specific to a given app. Given a new user review, it first retrieves the top-K most relevant app reviews and the most relevant snippet from the app description. The retrieved information and the new user review are then fed into a fused machine learning model that integrates the seq2seq model with a machine reading comprehension model. The latter helps digest the retrieved reviews and app description. Finally, the fused model generates a response that is customized to the given app. We evaluated AARSynth using a large corpus of reviews and responses from Google Play. The results show that AARSynth outperforms the state-of-the-art system by 22.2% on BLEU-4 score. Furthermore, our human study shows that AARSynth produces a statistically significant improvement in response quality compared to the state-of-the-art system.
△ Less
Submitted 10 November, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Performance Evaluation of t-SNE and MDS Dimensionality Reduction Techniques with KNN, ENN and SVM Classifiers
Authors:
Shadman Sakib,
Md. Abu Bakr Siddique,
Md. Abdur Rahman
Abstract:
The central goal of this paper is to establish two commonly available dimensionality reduction (DR) methods i.e. t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS) in Matlab and to observe their application in several datasets. These DR techniques are applied to nine different datasets namely CNAE9, Segmentation, Seeds, Pima Indians diabetes, Parkinsons, Movemen…
▽ More
The central goal of this paper is to establish two commonly available dimensionality reduction (DR) methods i.e. t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS) in Matlab and to observe their application in several datasets. These DR techniques are applied to nine different datasets namely CNAE9, Segmentation, Seeds, Pima Indians diabetes, Parkinsons, Movement Libras, Mammographic Masses, Knowledge, and Ionosphere acquired from UCI machine learning repository. By applying t-SNE and MDS algorithms, each dataset is transformed to the half of its original dimension by eliminating unnecessary features from the datasets. Subsequently, these datasets with reduced dimensions are fed into three supervised classification algorithms for classification. These classification algorithms are K Nearest Neighbors (KNN), Extended Nearest Neighbors (ENN), and Support Vector Machine (SVM). Again, all these algorithms are implemented in Matlab. The training and test data ratios are maintained as ninety percent: ten percent for each dataset. Upon accuracy observation, the efficiency for every dimensionality technique with availed classification algorithms is analyzed and the performance of each classifier is evaluated.
△ Less
Submitted 20 June, 2020;
originally announced July 2020.
-
Unsupervised Paraphrasing via Deep Reinforcement Learning
Authors:
A. B. Siddique,
Samet Oymak,
Vagelis Hristidis
Abstract:
Paraphrasing is expressing the meaning of an input sentence in different wording while maintaining fluency (i.e., grammatical and syntactical correctness). Most existing work on paraphrasing use supervised models that are limited to specific domains (e.g., image captions). Such models can neither be straightforwardly transferred to other domains nor generalize well, and creating labeled training d…
▽ More
Paraphrasing is expressing the meaning of an input sentence in different wording while maintaining fluency (i.e., grammatical and syntactical correctness). Most existing work on paraphrasing use supervised models that are limited to specific domains (e.g., image captions). Such models can neither be straightforwardly transferred to other domains nor generalize well, and creating labeled training data for new domains is expensive and laborious. The need for paraphrasing across different domains and the scarcity of labeled training data in many such domains call for exploring unsupervised paraphrase generation methods. We propose Progressive Unsupervised Paraphrasing (PUP): a novel unsupervised paraphrase generation method based on deep reinforcement learning (DRL). PUP uses a variational autoencoder (trained using a non-parallel corpus) to generate a seed paraphrase that warm-starts the DRL model. Then, PUP progressively tunes the seed paraphrase guided by our novel reward function which combines semantic adequacy, language fluency, and expression diversity measures to quantify the quality of the generated paraphrases in each iteration without needing parallel sentences. Our extensive experimental evaluation shows that PUP outperforms unsupervised state-of-the-art paraphrasing techniques in terms of both automatic metrics and user studies on four real datasets. We also show that PUP outperforms domain-adapted supervised algorithms on several datasets. Our evaluation also shows that PUP achieves a great trade-off between semantic similarity and diversity of expression.
△ Less
Submitted 5 July, 2020;
originally announced July 2020.
-
Performance Analysis of Deep Autoencoder and NCA Dimensionality Reduction Techniques with KNN, ENN and SVM Classifiers
Authors:
Md. Abu Bakr Siddique,
Shadman Sakib,
Md. Abdur Rahman
Abstract:
The central aim of this paper is to implement Deep Autoencoder and Neighborhood Components Analysis (NCA) dimensionality reduction methods in Matlab and to observe the application of these algorithms on nine unlike datasets from UCI machine learning repository. These datasets are CNAE9, Movement Libras, Pima Indians diabetes, Parkinsons, Knowledge, Segmentation, Seeds, Mammographic Masses, and Ion…
▽ More
The central aim of this paper is to implement Deep Autoencoder and Neighborhood Components Analysis (NCA) dimensionality reduction methods in Matlab and to observe the application of these algorithms on nine unlike datasets from UCI machine learning repository. These datasets are CNAE9, Movement Libras, Pima Indians diabetes, Parkinsons, Knowledge, Segmentation, Seeds, Mammographic Masses, and Ionosphere. First of all, the dimension of these datasets has been reduced to fifty percent of their original dimension by selecting and extracting the most relevant and appropriate features or attributes using Deep Autoencoder and NCA dimensionality reduction techniques. Afterward, each dataset is classified applying K-Nearest Neighbors (KNN), Extended Nearest Neighbors (ENN) and Support Vector Machine (SVM) classification algorithms. All classification algorithms are developed in the Matlab environment. In each classification, the training test data ratio is always set to ninety percent: ten percent. Upon classification, variation between accuracies is observed and analyzed to find the degree of compatibility of each dimensionality reduction technique with each classifier and to evaluate each classifier performance on each dataset.
△ Less
Submitted 24 December, 2019; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Non-Intrusive Electrical Appliances Monitoring and Classification using K-Nearest Neighbors
Authors:
Mohammad Mahmudur Rahman Khan,
Md. Abu Bakr Siddique,
Shadman Sakib
Abstract:
Non-Intrusive Load Monitoring (NILM) is the method of detecting an individual device's energy signal from an aggregated energy consumption signature [1]. As existing energy meters provide very little to no information regarding the energy consumption of individual appliances apart from the aggregated power rating, the spotting of individual appliances' energy usages by NILM will not only provide c…
▽ More
Non-Intrusive Load Monitoring (NILM) is the method of detecting an individual device's energy signal from an aggregated energy consumption signature [1]. As existing energy meters provide very little to no information regarding the energy consumption of individual appliances apart from the aggregated power rating, the spotting of individual appliances' energy usages by NILM will not only provide consumers the feedback of appliance-specific energy usage but also lead to the changes of their consumption behavior which facilitate energy conservation. B Neenan et al. [2] have demonstrated that direct individual appliance-specific energy usage signals lead to consumers' behavioral changes which improves energy efficiency by as much as 15%. Upon disaggregation of an energy signal, the signal needs to be classified according to the appropriate appliance. Hence, the goal of this paper is to disaggregate total energy consumption data to individual appliance signature and then classify appliance-specific energy loads using a prominent supervised classification method known as K-Nearest Neighbors (KNN). To perform this operation we have used a publicly accessible dataset of power signals from several houses known as the REDD dataset. Before applying KNN, data is preprocessed for each device. Then KNN is applied to check whether their energy consumption signature is separable or not. KNN is applied with K=5.
△ Less
Submitted 22 November, 2019;
originally announced November 2019.
-
Recognition of Handwritten Digit using Convolutional Neural Network in Python with Tensorflow and Comparison of Performance for Various Hidden Layers
Authors:
Fathma Siddique,
Shadman Sakib,
Md. Abu Bakr Siddique
Abstract:
In recent times, with the increase of Artificial Neural Network (ANN), deep learning has brought a dramatic twist in the field of machine learning by making it more artificially intelligent. Deep learning is remarkably used in vast ranges of fields because of its diverse range of applications such as surveillance, health, medicine, sports, robotics, drones, etc. In deep learning, Convolutional Neu…
▽ More
In recent times, with the increase of Artificial Neural Network (ANN), deep learning has brought a dramatic twist in the field of machine learning by making it more artificially intelligent. Deep learning is remarkably used in vast ranges of fields because of its diverse range of applications such as surveillance, health, medicine, sports, robotics, drones, etc. In deep learning, Convolutional Neural Network (CNN) is at the center of spectacular advances that mixes Artificial Neural Network (ANN) and up to date deep learning strategies. It has been used broadly in pattern recognition, sentence classification, speech recognition, face recognition, text categorization, document analysis, scene, and handwritten digit recognition. The goal of this paper is to observe the variation of accuracies of CNN to classify handwritten digits using various numbers of hidden layers and epochs and to make the comparison between the accuracies. For this performance evaluation of CNN, we performed our experiment using Modified National Institute of Standards and Technology (MNIST) dataset. Further, the network is trained using stochastic gradient descent and the backpropagation algorithm.
△ Less
Submitted 12 September, 2019;
originally announced September 2019.
-
Implementation of Fruits Recognition Classifier using Convolutional Neural Network Algorithm for Observation of Accuracies for Various Hidden Layers
Authors:
Shadman Sakib,
Zahidun Ashrafi,
Md. Abu Bakr Siddique
Abstract:
Fruit recognition using Deep Convolutional Neural Network (CNN) is one of the most promising applications in computer vision. In recent times, deep learning based classifications are making it possible to recognize fruits from images. However, fruit recognition is still a problem for the stacked fruits on weighing scale because of the complexity and similarity. In this paper, a fruit recognition s…
▽ More
Fruit recognition using Deep Convolutional Neural Network (CNN) is one of the most promising applications in computer vision. In recent times, deep learning based classifications are making it possible to recognize fruits from images. However, fruit recognition is still a problem for the stacked fruits on weighing scale because of the complexity and similarity. In this paper, a fruit recognition system using CNN is proposed. The proposed method uses deep learning techniques for the classification. We have used Fruits-360 dataset for the evaluation purpose. From the dataset, we have established a dataset which contains 17,823 images from 25 different categories. The images are divided into training and test dataset. Moreover, for the classification accuracies, we have used various combinations of hidden layer and epochs for different cases and made a comparison between them. The overall performance losses of the network for different cases also observed. Finally, we have achieved the best test accuracy of 100% and a training accuracy of 99.79%.
△ Less
Submitted 25 January, 2020; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Unsupervised Segmentation Algorithms' Implementation in ITK for Tissue Classification via Human Head MRI Scans
Authors:
Shadman Sakib,
Md. Abu Bakr Siddique
Abstract:
Tissue classification is one of the significant tasks in the field of biomedical image analysis. Magnetic Resonance Imaging (MRI) is of great importance in tissue classification especially in the areas of brain tissue classification which is able to recognize anatomical areas of interest such as surgical planning, monitoring therapy, clinical drug trials, image registration, stereotactic neurosurg…
▽ More
Tissue classification is one of the significant tasks in the field of biomedical image analysis. Magnetic Resonance Imaging (MRI) is of great importance in tissue classification especially in the areas of brain tissue classification which is able to recognize anatomical areas of interest such as surgical planning, monitoring therapy, clinical drug trials, image registration, stereotactic neurosurgery, radiotherapy etc. The task of this paper is to implement different unsupervised classification algorithms in ITK and perform tissue classification (white matter, gray matter, cerebrospinal fluid (CSF) and background of the human brain). For this purpose, 5 grayscale head MRI scans are provided. In order of classifying brain tissues, three algorithms are used. These are: Otsu thresholding, Bayesian classification and Bayesian classification with Gaussian smoothing. The obtained classification results are analyzed in the results and discussion section.
△ Less
Submitted 25 January, 2020; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation
Authors:
Md. Abu Bakr Siddique,
Rezoana Bente Arif,
Mohammad Mahmudur Rahman Khan,
Zahidun Ashrafi
Abstract:
In this paper, several two-dimensional clustering scenarios are given. In those scenarios, soft partitioning clustering algorithms (Fuzzy C-means (FCM) and Possibilistic c-means (PCM)) are applied. Afterward, VAT is used to investigate the clustering tendency visually, and then in order of checking cluster validation, three types of indices (e.g., PC, DI, and DBI) were used. After observing the cl…
▽ More
In this paper, several two-dimensional clustering scenarios are given. In those scenarios, soft partitioning clustering algorithms (Fuzzy C-means (FCM) and Possibilistic c-means (PCM)) are applied. Afterward, VAT is used to investigate the clustering tendency visually, and then in order of checking cluster validation, three types of indices (e.g., PC, DI, and DBI) were used. After observing the clustering algorithms, it was evident that each of them has its limitations; however, PCM is more robust to noise than FCM as in case of FCM a noise point has to be considered as a member of any of the cluster.
△ Less
Submitted 12 May, 2019; v1 submitted 22 September, 2018;
originally announced September 2018.
-
ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities
Authors:
Mohammad Mahmudur Rahman Khan,
Md. Abu Bakr Siddique,
Rezoana Bente Arif,
Mahjabin Rahman Oishe
Abstract:
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm which has the high-performance rate for dataset where clusters have the constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN demonstrates reduced performances for clusters with different densities. Therefore, in this paper, an ad…
▽ More
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm which has the high-performance rate for dataset where clusters have the constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN demonstrates reduced performances for clusters with different densities. Therefore, in this paper, an adaptive DBSCAN is proposed which can work significantly well for identifying clusters with varying densities.
△ Less
Submitted 29 November, 2018; v1 submitted 17 September, 2018;
originally announced September 2018.
-
Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Neural Network Algorithm
Authors:
Md. Abu Bakr Siddique,
Mohammad Mahmudur Rahman Khan,
Rezoana Bente Arif,
Zahidun Ashrafi
Abstract:
In recent days, Artificial Neural Network (ANN) can be applied to a vast majority of fields including business, medicine, engineering, etc. The most popular areas where ANN is employed nowadays are pattern and sequence recognition, novelty detection, character recognition, regression analysis, speech recognition, image compression, stock market prediction, Electronic nose, security, loan applicati…
▽ More
In recent days, Artificial Neural Network (ANN) can be applied to a vast majority of fields including business, medicine, engineering, etc. The most popular areas where ANN is employed nowadays are pattern and sequence recognition, novelty detection, character recognition, regression analysis, speech recognition, image compression, stock market prediction, Electronic nose, security, loan applications, data processing, robotics, and control. The benefits associated with its broad applications leads to increasing popularity of ANN in the era of 21st Century. ANN confers many benefits such as organic learning, nonlinear data processing, fault tolerance, and self-repairing compared to other conventional approaches. The primary objective of this paper is to analyze the influence of the hidden layers of a neural network over the overall performance of the network. To demonstrate this influence, we applied neural network with different layers on the MNIST dataset. Also, another goal is to observe the variations of accuracies of ANN for different numbers of hidden layers and epochs and to compare and contrast among them.
△ Less
Submitted 29 November, 2018; v1 submitted 17 September, 2018;
originally announced September 2018.
-
Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network
Authors:
Rezoana Bente Arif,
Md. Abu Bakr Siddique,
Mohammad Mahmudur Rahman Khan,
Mahjabin Rahman Oishe
Abstract:
Nowadays, deep learning can be employed to a wide ranges of fields including medicine, engineering, etc. In deep learning, Convolutional Neural Network (CNN) is extensively used in the pattern and sequence recognition, video analysis, natural language processing, spam detection, topic categorization, regression analysis, speech recognition, image classification, object detection, segmentation, fac…
▽ More
Nowadays, deep learning can be employed to a wide ranges of fields including medicine, engineering, etc. In deep learning, Convolutional Neural Network (CNN) is extensively used in the pattern and sequence recognition, video analysis, natural language processing, spam detection, topic categorization, regression analysis, speech recognition, image classification, object detection, segmentation, face recognition, robotics, and control. The benefits associated with its near human level accuracies in large applications lead to the growing acceptance of CNN in recent years. The primary contribution of this paper is to analyze the impact of the pattern of the hidden layers of a CNN over the overall performance of the network. To demonstrate this influence, we applied neural network with different layers on the Modified National Institute of Standards and Technology (MNIST) dataset. Also, is to observe the variations of accuracies of the network for various numbers of hidden layers and epochs and to make comparison and contrast among them. The system is trained utilizing stochastic gradient and backpropagation algorithm and tested with feedforward algorithm.
△ Less
Submitted 29 November, 2018; v1 submitted 17 September, 2018;
originally announced September 2018.
-
Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository
Authors:
Mohammad Mahmudur Rahman Khan,
Rezoana Bente Arif,
Md. Abu Bakr Siddique,
Mahjabin Rahman Oishe
Abstract:
Machine learning qualifies computers to assimilate with data, without being solely programmed [1, 2]. Machine learning can be classified as supervised and unsupervised learning. In supervised learning, computers learn an objective that portrays an input to an output hinged on training input-output pairs [3]. Most efficient and widely used supervised learning algorithms are K-Nearest Neighbors (KNN…
▽ More
Machine learning qualifies computers to assimilate with data, without being solely programmed [1, 2]. Machine learning can be classified as supervised and unsupervised learning. In supervised learning, computers learn an objective that portrays an input to an output hinged on training input-output pairs [3]. Most efficient and widely used supervised learning algorithms are K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor (LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this paper is to implement these elegant learning algorithms on eleven different datasets from the UCI machine learning repository to observe the variation of accuracies for each of the algorithms on all datasets. Analyzing the accuracy of the algorithms will give us a brief idea about the relationship of the machine learning algorithms and the data dimensionality. All the algorithms are developed in Matlab. Upon such accuracy observation, the comparison can be built among KNN, SVM, LMNN, and ENN regarding their performances on each dataset.
△ Less
Submitted 29 November, 2018; v1 submitted 17 September, 2018;
originally announced September 2018.
-
Combinatorial Entropy Encoding
Authors:
Abu Bakar Siddique
Abstract:
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of the message. Commercial data compression standards make use of Huffman or arithmetic coding at some stage of the compression process. In the proposed method, like…
▽ More
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of the message. Commercial data compression standards make use of Huffman or arithmetic coding at some stage of the compression process. In the proposed method, like arithmetic coding entire string is mapped to an integer but is not based on fractional numbers. Unlike both arithmetic and Huffman coding no prior entropy model of the source is required. Simple intuitive algorithm based on multinomial coefficients is developed for entropy encoding that adoptively uses low number of bits for more frequent symbols. Correctness of the algorithm is demonstrated by an example.
△ Less
Submitted 23 March, 2017;
originally announced March 2017.