Design of a Rectangular Linear Microstrip Patch Antenna Array for 5G Communication

Authors: Muhammad Asfar Saeed, Augustine O. Nwajana

Abstract: This paper presents the design and characterization of a rectangular microstrip patch antenna array optimized for operation within the Ku-band frequency range. The antenna array is impedance-matched to 50 Ohms and utilizes a microstrip line feeding mechanism for excitation. The design maintains compact dimensions, with the overall antenna occupying an area of 29.5 x 7 mm. The antenna structure is modelled on an RO3003 substrate material, featuring a dielectric constant of 3, a low loss tangent of 0.0009, and a thickness of 1.574 mm. The substrate is backed by a conducting ground plane, and the array consists of six radiating patch elements positioned on top. Evaluation of the designed antenna array reveals a resonant frequency of 18 GHz, with a -10 dB impedance bandwidth extending over 700 MHz. The antenna demonstrates a high gain of 7.51 dBi, making it well-suited for applications in 5G and future communication systems. Its compact form factor, cost-effectiveness, and broad impedance and radiation coverage further underscore its potential in these domains.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 4 pages, 5 figures, 2 tables

arXiv:2401.14107 [pdf, other]

Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement

Authors: Aaqib Saeed, Dimitris Spathis, Jungwoo Oh, Edward Choi, Ali Etemad

Abstract: Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the users and usually require rich metadata. As a result, label noise can become an increasingly thorny issue when labeling such data. In this paper, we propose a novel solution to address noisy label learning, entitled Few-Shot Human-in-the-Loop Refinement (FHLR). Our method initially learns a seed model using weak labels. Next, it fine-tunes the seed model using a handful of expert corrections. Finally, it achieves better generalizability and robustness by merging the seed and fine-tuned models via weighted parameter averaging. We evaluate our approach on four challenging tasks and datasets, and compare it against eight competitive baselines designed to deal with noisy labels. We show that FHLR achieves significantly better performance when learning from noisy labels and achieves state-of-the-art by a large margin, with up to 19% accuracy improvement under symmetric and asymmetric noise. Notably, we find that FHLR is particularly robust to increased label noise, unlike prior works that suffer from severe performance degradation. Our work not only achieves better generalization in high-stakes health sensing benchmarks but also sheds light on how noise affects commonly-used models.

Submitted 25 January, 2024; originally announced January 2024.
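
The final step of the FHLR abstract above, merging the seed and fine-tuned models via weighted parameter averaging, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_parameters`, the dictionary-of-lists parameter layout, and the single scalar `weight` are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_parameters(seed: Params, finetuned: Params, weight: float) -> Params:
    """Return weight * seed + (1 - weight) * finetuned, parameter by parameter."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if seed.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")

    merged: Params = {}
    for name, seed_values in seed.items():
        tuned_values = finetuned[name]
        if len(seed_values) != len(tuned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [
            weight * s + (1.0 - weight) * t
            for s, t in zip(seed_values, tuned_values)
        ]
    return merged


# Toy models (assumed values) standing in for the seed and fine-tuned networks.
seed_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
finetuned_model = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged_model = merge_parameters(seed_model, finetuned_model, weight=0.5)
```

With `weight=0.5` the merged model is the plain average of the two; values closer to 1 keep the result nearer the seed model. How the weight is chosen is not stated in the abstract.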

arXiv:2312.07981 [pdf]

doi 10.1016/j.ymssp.2024.111481

Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation

Authors: Haiming Yi, Lei Hou, Yuhong Jin, Nasser A. Saeed, Ali Kandil, Hao Duan

Abstract: Diffusion models have demonstrated powerful data generation capabilities in various research fields such as image generation. However, in the field of vibration signal generation, the criteria for evaluating the quality of the generated signal differ from those of image generation and there is a fundamental difference between them. At present, there is no research on the ability of diffusion models to generate vibration signals. In this paper, a Time Series Diffusion Method (TSDM) is proposed for vibration signal generation, leveraging the foundational principles of diffusion models. The TSDM uses an improved U-net architecture with attention block, ResBlock and TimeEmbedding to effectively segment and extract features from one-dimensional time series data. It operates based on forward diffusion and reverse denoising processes for time-series generation. Experimental validation is conducted using single-frequency, multi-frequency, and bearing fault datasets. The results show that TSDM can accurately generate the single-frequency and multi-frequency features in the time series and retain the basic frequency features for the diffusion generation results of the bearing fault series. It is also found that the original DDPM could not generate high quality vibration signals, but the improved U-net in TSDM, which applied the combination of attention block and ResBlock, could effectively improve the quality of vibration signal generation. Finally, TSDM is applied to the small sample fault diagnosis of three public bearing fault datasets, and the results show that the accuracy of small sample fault diagnosis of the three datasets is improved by 32.380%, 18.355% and 9.298% at most, respectively.

Submitted 30 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Journal ref: Mechanical Systems and Signal Processing, 2024, 216: 111481

arXiv:2305.03058 [pdf, other]

Plug-and-Play Multilingual Few-shot Spoken Words Recognition

Authors: Aaqib Saeed, Vasileios Tsouvalas

Abstract: As technology advances and digital devices become prevalent, seamless human-machine communication is increasingly gaining significance. The growing adoption of mobile, wearable, and other Internet of Things (IoT) devices has changed how we interact with these smart devices, making accurate spoken words recognition a crucial component for effective interaction. However, building a robust spoken words detection system that can handle novel keywords remains challenging, especially for low-resource languages with limited training data. Here, we propose PLiX, a multilingual and plug-and-play keyword spotting system that leverages few-shot learning to harness massive real-world data and enable the recognition of unseen spoken words at test-time. Our few-shot deep models are learned with millions of one-second audio clips across 20 languages, achieving state-of-the-art performance while being highly efficient. Extensive evaluations show that PLiX can generalize to novel spoken words given as few as just one support example and performs well on unseen languages out of the box. We release models and inference code to serve as a foundation for future research and voice-enabled user interface development for emerging devices.

Submitted 3 May, 2023; originally announced May 2023.

Comments: Code: https://github.com/FewshotML/plix

arXiv:2211.00119 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096465

Active Learning of Non-semantic Speech Tasks with Pretrained Models

Authors: Harlin Lee, Aaqib Saeed, Andrea L. Bertozzi

Abstract: Pretraining neural networks with massive unlabeled datasets has become popular as it equips the deep models with a better prior to solve downstream tasks. However, this approach generally assumes that the downstream tasks have access to annotated data of sufficient size. In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning. ALOE uses pretrained models in conjunction with active learning to label data incrementally and learn classifiers for downstream tasks, thereby mitigating the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE is shown to achieve performance similar to several baselines that utilize the entire labeled data.

Submitted 25 February, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: Accepted at: ICASSP'23, Code: https://github.com/HarlinLee/ALOE

arXiv:2210.15283 [pdf, other]

On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors

Authors: Zaharah Bukhsh, Aaqib Saeed

Abstract: Out-of-distribution (OOD) detection is concerned with identifying data points that do not belong to the same distribution as the model's training data. For the safe deployment of predictive models in a real-world environment, it is critical to avoid making confident predictions on OOD inputs as it can lead to potentially dangerous consequences. However, OOD detection largely remains an under-explored area in the audio (and speech) domain. This is despite the fact that audio is a central modality for many tasks, such as speaker diarization, automatic speech recognition, and sound event detection. To address this, we propose to leverage the feature space of the model with deep k-nearest neighbors to detect OOD samples. We show that this simple and flexible method effectively detects OOD inputs across a broad category of audio (and speech) datasets. Specifically, it improves the false positive rate (FPR@TPR95) by 17% and the AUROC score by 7% compared with prior techniques.

Submitted 25 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted at ICASSP'23. Webpage: https://zaharah.github.io/ood_audio, Code: https://github.com/Zaharah/ood_audio
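
The idea in the abstract above, scoring a sample by its k-nearest neighbors in the model's feature space, can be sketched as follows. This is an illustrative sketch and not the authors' code: the L2 normalization of features, the Euclidean distance, the use of the k-th neighbor distance as the score, and the names `l2_normalize` and `knn_ood_score` are all assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean norm (assumed preprocessing)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def knn_ood_score(
    query: Sequence[float],
    train_features: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Distance from the query to its k-th nearest training feature.

    A larger score means the query lies further from the training data
    in feature space, i.e. it looks more out-of-distribution.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training features")
    q = l2_normalize(query)
    distances = sorted(
        math.dist(q, l2_normalize(feature)) for feature in train_features
    )
    return distances[k - 1]


# Toy 2-D "embeddings" (assumed values) standing in for encoder features.
train_features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
in_distribution_score = knn_ood_score([1.0, 0.05], train_features, k=1)
ood_score = knn_ood_score([0.0, 1.0], train_features, k=1)
```

In use, a sample would be flagged as OOD when its score exceeds a threshold chosen on held-out in-distribution data, for example the value that keeps 95% of in-distribution samples, matching the FPR@TPR95 metric the abstract reports.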

arXiv:2207.06921 [pdf, other]

Automatic Sleep Scoring from Large-scale Multi-channel Pediatric EEG

Authors: Harlin Lee, Aaqib Saeed

Abstract: Sleep is particularly important to the health of infants, children, and adolescents, and sleep scoring is the first step to accurate diagnosis and treatment of potentially life-threatening conditions. But pediatric sleep is severely under-researched compared to adult sleep in the context of machine learning for health, and sleep scoring algorithms developed for adults usually perform poorly on infants. Here, we present the first automated sleep scoring results on a recent large-scale pediatric sleep study dataset that was collected during standard clinical care. We develop a transformer-based model that learns to classify five sleep stages from millions of multi-channel electroencephalogram (EEG) sleep epochs with 78% overall accuracy. Further, we conduct an in-depth analysis of the model performance based on patient demographics and EEG channels. The results point to the growing need for machine learning research on pediatric sleep.

Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

Comments: Learning from Time Series for Health. Workshop at NeurIPS 2022

arXiv:2207.05784 [pdf, other]

doi 10.1016/j.patrec.2023.11.028

Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices

Authors: Harlin Lee, Aaqib Saeed

Abstract: This work introduces BRILLsson, a novel binary neural network-based representation learning model for a broad range of non-semantic speech tasks. We train the model with knowledge distillation from a large and real-valued TRILLsson model with only a fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2MB in size with a latency less than 8ms, making them suitable for deployment in low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including but not limited to spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting), and demonstrate that our proposed ultra-light and low-latency models perform as well as large-scale models.

Submitted 2 December, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Journal ref: Pattern Recognition Letters, vol. 177, pp. 15-19, 2024

arXiv:2108.12811 [pdf]

Airplane Type Identification Based on Mask RCNN and Drone Images

Authors: W. T Alshaibani, Mustafa Helvaci, Ibraheem Shayea, Sawsan A. Saad, Azizul Azizan, Fitri Yakub

Abstract: For dealing with traffic bottlenecks at airports, aircraft object detection is insufficient. Every airport generally has a variety of planes with various physical and technological requirements as well as diverse service requirements. Detecting the presence of new planes will not address all traffic congestion issues. Identifying the type of airplane, on the other hand, will entirely fix the problem because it will offer important information about the plane's technical specifications (i.e., the time it needs to be served and its appropriate place in the airport). Several studies have provided various contributions to address airport traffic jams; however, their ultimate goal was to determine the existence of airplane objects. This paper provides a practical approach to identify the type of airplane in airports depending on the results provided by the airplane detection process using a mask region-based convolutional neural network (Mask RCNN). The key feature employed to identify the type of airplane is the surface area calculated based on the results of airplane detection. The surface area is used to assess the estimated cabin length which is considered as an additional key feature for identifying the airplane type. The length of any detected plane may be calculated by measuring the distance between the detected plane's two furthest points. The suggested approach's performance is assessed using average accuracies and a confusion matrix. The findings show that this method is dependable. This method will greatly aid in the management of airport traffic congestion.

Submitted 29 August, 2021; originally announced August 2021.

Comments: 14 pages
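
The abstract above states that the length of a detected plane may be calculated as the distance between its two furthest points. A minimal sketch of that measurement follows. It is not the paper's code: the function name `max_pairwise_distance`, the brute-force search, and the use of pixel coordinates taken from a detection mask outline are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def max_pairwise_distance(points: Sequence[Point]) -> float:
    """Largest Euclidean distance between any two points (brute force, O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return max(math.dist(p, q) for p, q in combinations(points, 2))


# Hypothetical outline of a detected airplane mask, in pixel coordinates.
mask_outline = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
length_in_pixels = max_pairwise_distance(mask_outline)
```

For this rectangular toy outline the furthest pair is a diagonal, so the length is 5 pixels. Converting pixels to meters would need the image's ground resolution, which the abstract does not give.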

arXiv:2107.06877 [pdf, other]

Federated Self-Training for Semi-Supervised Audio Recognition

Authors: Vasileios Tsouvalas, Aaqib Saeed, Tanir Ozcelebi

Abstract: Federated Learning is a distributed machine learning paradigm dealing with decentralized and personal datasets. Since data reside on devices like smartphones and virtual assistants, labeling is entrusted to the clients, or labels are extracted in an automated way. Specifically, in the case of audio data, acquiring semantic annotations can be prohibitively expensive and time-consuming. As a result, an abundance of audio data remains unlabeled and unexploited on users' devices. Most existing federated learning approaches focus on supervised learning without harnessing the unlabeled data. In this work, we study the problem of semi-supervised learning of audio models via self-training in conjunction with federated learning. We propose FedSTAR to exploit large-scale on-device unlabeled data to improve the generalization of audio recognition models. We further demonstrate that self-supervised pre-trained models can accelerate the training of on-device models, significantly improving convergence within fewer training rounds. We conduct experiments on diverse public audio classification datasets and investigate the performance of our models under varying percentages of labeled and unlabeled data. Notably, we show that with as little as 3% labeled data available, FedSTAR on average can improve the recognition rate by 13.28% compared to the fully supervised federated model.

Submitted 25 February, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

arXiv:2105.11999 [pdf, other]

Throughput-Fairness Tradeoffs in Mobility Platforms

Authors: Arjun Balasingam, Karthik Gopalakrishnan, Radhika Mittal, Venkat Arun, Ahmed Saeed, Mohammad Alizadeh, Hamsa Balakrishnan, Hari Balakrishnan

Abstract: This paper studies the problem of allocating tasks from different customers to vehicles in mobility platforms, which are used for applications like food and package delivery, ridesharing, and mobile sensing. A mobility platform should allocate tasks to vehicles and schedule them in order to optimize both throughput and fairness across customers. However, existing approaches to scheduling tasks in mobility platforms ignore fairness. We introduce Mobius, a system that uses guided optimization to achieve both high throughput and fairness across customers. Mobius supports spatiotemporally diverse and dynamic customer demands. It provides a principled method to navigate inherent tradeoffs between fairness and throughput caused by shared mobility. Our evaluation demonstrates these properties, along with the versatility and scalability of Mobius, using traces gathered from ridesharing and aerial sensing applications. Our ridesharing case study shows that Mobius can schedule more than 16,000 tasks across 40 customers and 200 vehicles in an online manner.

Submitted 25 May, 2021; originally announced May 2021.

Comments: Technical report for paper to appear at ACM MobiSys 2021

arXiv:2102.09099 [pdf]

doi 10.1093/gigascience/giac037

NuCLS: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation

Authors: Mohamed Amgad, Lamees A. Atteya, Hagar Hussein, Kareem Hosny Mohammed, Ehab Hafiz, Maha A. T. Elsebaie, Ahmed M. Alhusseiny, Mohamed Atef AlMoslemany, Abdelmagid M. Elmatboly, Philip A. Pappalardo, Rokia Adel Sakr, Pooya Mobadersany, Ahmad Rachid, Anas M. Saad, Ahmad M. Alkashash, Inas A. Ruhban, Anas Alrefai, Nada M. Elgazar, Ali Abdulkarim, Abo-Alela Farag, Amira Etman, Ahmed G. Elsaeed, Yahya Alagha, Yomna A. Amer, Ahmed M. Raslan, et al. (12 additional authors not shown)

Abstract: High-resolution mapping of cells and tissue structures provides a foundation for developing interpretable machine-learning models for computational pathology. Deep learning algorithms can provide accurate mappings given large numbers of labeled instances for training and validation. Generating an adequate volume of quality labels has emerged as a critical barrier in computational pathology given the time and effort required from pathologists. In this paper we describe an approach for engaging crowds of medical students and pathologists that was used to produce a dataset of over 220,000 annotations of cell nuclei in breast cancers. We show how suggested annotations generated by a weak algorithm can improve the accuracy of annotations generated by non-experts and can yield useful data for training segmentation algorithms without laborious manual tracing. We systematically examine interrater agreement and describe modifications to the MaskRCNN model to improve cell mapping. We also describe a technique we call Decision Tree Approximation of Learned Embeddings (DTALE) that leverages nucleus segmentations and morphologic features to improve the transparency of nucleus classification models. The annotation data produced in this study are freely available for algorithm development and benchmarking at: https://sites.google.com/view/nucls.

Submitted 17 February, 2021; originally announced February 2021.

Journal ref: GigaScience, 11 (2022)

arXiv:2011.13131 [pdf]

A New Paradigm for Water Level Regulation using Three Pond Model with Fuzzy Inference System for Run of River Hydropower Plant

Authors: Ahmad Saeed, Ebrahim Shahzad, Laeeq Aslam, Ijaz Mansoor Qureshi, Adnan Umar Khan, Muhammad Iqbal

Abstract: The energy generation of a run of river hydropower plant depends upon the flow of the river, and variations in the water flow make the energy production unreliable. This problem is usually solved by constructing a small pond in front of the run of river hydropower plant. However, changes in the water level of the conventional single pond model result in sags, surges and unpredictable power fluctuations. This work proposes a three pond model instead of the traditional single pond model. The volume of water in the three ponds is equivalent to that of the traditional single pond, but the model reduces the dependency of the run of river power plant on the flow of the river. Moreover, the three pond model absorbs water surges and disturbances more efficiently. The three pond system, modeled as a non-linear hydraulic three tank system, is controlled with a fuzzy inference system and with standard PID based methods for smooth and efficient level regulation. The fuzzy inference system improves results across the board in terms of regulation and disturbance handling compared with the conventional PID controller.

Submitted 26 November, 2020; originally announced November 2020.

arXiv:2010.13694 [pdf, other]

Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

Authors: Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

Abstract: We propose CHARM, a method for training a single neural network across inconsistent input channels. Our work is motivated by Electroencephalography (EEG), where data collection protocols from different headsets result in varying channel ordering and number, which limits the feasibility of transferring trained systems across datasets. Our approach builds upon attention mechanisms to estimate a latent reordering matrix from each input signal and map input channels to a canonical order. CHARM is differentiable and can be composed further with architectures expecting a consistent channel ordering to build end-to-end trainable classifiers. We perform experiments on four EEG classification datasets and demonstrate the efficacy of CHARM via simulated shuffling and masking of input channels. Moreover, our method improves the transfer of pre-trained representations between datasets collected with different protocols.

Submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.13082 [pdf, other]

Context Aware 3D UNet for Brain Tumor Segmentation

Authors: Parvez Ahmad, Saqib Qamar, Linlin Shen, Adnan Saeed

Abstract: Deep convolutional neural networks (CNNs) achieve remarkable performance for medical image analysis. UNet is the primary contributor to the performance of 3D CNN architectures for medical imaging tasks, including brain tumor segmentation. The skip connection in the UNet architecture concatenates features from both encoder and decoder paths to extract multi-contextual information from image data. The multi-scaled features play an essential role in brain tumor segmentation. However, the limited use of features can degrade the performance of the UNet approach for segmentation. In this paper, we propose a modified UNet architecture for brain tumor segmentation. In the proposed architecture, we used densely connected blocks in both encoder and decoder paths to extract multi-contextual information from the concept of feature reusability. In addition, residual-inception blocks (RIB) are used to extract the local and global information by merging features of different kernel sizes. We validate the proposed architecture on the multi-modal brain tumor segmentation challenge (BRATS) 2020 testing dataset. The dice (DSC) scores of the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) are 89.12%, 84.74%, and 79.12%, respectively.

Submitted 27 November, 2020; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: Accepted for MICCAI 2020 Brain Lesions (BrainLes) Workshop

arXiv:2010.10915 [pdf, other]

Contrastive Learning of General-Purpose Audio Representations

Authors: Aaqib Saeed, David Grangier, Neil Zeghidour

Abstract: We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio. Our approach is based on contrastive learning: it learns a representation which assigns high similarity to audio segments extracted from the same recording while assigning lower similarity to segments from different recordings. We build on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio. We pre-train embeddings on the large-scale Audioset database and transfer these representations to 9 diverse classification tasks, including speech, music, animal sounds, and acoustic scenes. We show that despite its simplicity, our method significantly outperforms previous self-supervised systems. We furthermore conduct ablation studies to identify key design choices and release a library to pre-train and fine-tune COLA models.

Submitted 21 October, 2020; originally announced October 2020.

arXiv:2008.06971 [pdf]

Physical Action Categorization using Signal Analysis and Machine Learning

Authors: Asad Mansoor Khan, Ayesha Sadiq, Sajid Gul Khawaja, Norah Saleh Alghamdi, Muhammad Usman Akram, Ali Saeed

Abstract: The daily life of thousands of individuals around the globe suffers due to physical or mental disability related to limb movement. The quality of life for such individuals can be improved by the use of assistive applications and systems. In such a scenario, mapping physical actions from movement to a computer-aided application can lead the way to a solution. Surface Electromyography (sEMG) presents a non-invasive mechanism through which we can translate physical movement into signals for classification and use in applications. In this paper, we propose a machine learning based framework for the classification of 4 physical actions. The framework examines features from different modalities, with contributions from the time domain, the frequency domain, higher order statistics and inter-channel statistics. Next, we conduct a comparative analysis of k-NN, SVM and ELM classifiers using the feature set. The effect of different combinations of the feature set is also recorded. Finally, SVM and 1-NN based classifiers on a subset of features achieve accuracies of 95.21 and 95.83, respectively. Additionally, we observe that dimensionality reduction using PCA leads to only a minor drop of less than 5.55% in accuracy while using only 9.22% of the original feature set. These findings are useful for algorithm designers in choosing the best approach given the resources available for executing the algorithm.

Submitted 1 February, 2022; v1 submitted 16 August, 2020; originally announced August 2020.

arXiv:1911.02274 [pdf, other]

doi 10.1109/ICIP40778.2020.9191340

Where is the Fake? Patch-Wise Supervised GANs for Texture Inpainting

Authors: Ahmed Ben Saad, Youssef Tamaazousti, Josselin Kherroubi, Alexis He

Abstract: We tackle the problem of texture inpainting where the input images are textures with missing values along with masks that indicate the zones that should be generated. Many works have been done in image inpainting with the aim to achieve global and local consistency. But these works still suffer from limitations when dealing with textures. In fact, the local information in the image to be completed needs to be used in order to achieve local continuities and visually realistic texture inpainting. For this, we propose a new segmentor discriminator that performs a patch-wise real/fake classification and is supervised by input masks. During training, it aims to locate the fake and thus backpropagates consistent signal to the generator. We tested our approach on the publicly available DTD dataset and showed that it achieves state-of-the-art performances and better deals with local consistency than existing methods.

Submitted 9 March, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

arXiv:1910.07234 [pdf, other]

doi 10.3390/electronics10070820

Aerial Images Processing for Car Detection using Convolutional Neural Networks: Comparison between Faster R-CNN and YoloV3

Authors: Adel Ammar, Anis Koubaa, Mohanned Ahmed, Abdulrahman Saad, Bilel Benjdira

Abstract: In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV's altitude, camera resolution, and object size. A total of 39 training experiments were conducted to account for the effect of different hyperparameter values. The objective of this work is to conduct the most robust and exhaustive comparison between these two cutting-edge algorithms on the specific domain of aerial images. By using a variety of metrics, we show that YOLOv3 yields better performance in most configurations, except that it exhibits a lower recall and less confident detections when object sizes and scales in the testing dataset differ largely from those in the training dataset.

Submitted 22 December, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

arXiv:1910.00653 [pdf, other]

Smart Palm: An IoT Framework for Red Palm Weevil Early Detection

Authors: Anis Koubaa, Abdulrahman Aldawood, Bassel Saeed, Abdullatif Hadid, Mohanned Ahmed, Abdulrahman Saad, Hesham Alkhouja, Mohamed Alkanhal

Abstract: Smart agriculture is an evolving trend in the agriculture industry, where sensors are embedded into plants to collect vital data and help in decision making to ensure higher quality of crops and prevent pests, disease, and other possible threats. In Saudi Arabia, growing palms is the most important agricultural activity, and there is an increasing need to leverage smart agriculture technology to improve the production of dates and prevent diseases. One of the most critical diseases of palms is the red palm weevil, which is an insect that causes a lot of damage to palm trees and can devastate large areas of palm trees. The most challenging problem is that the effect of the weevil is not visible to humans until the palm reaches an advanced infestation state. For this reason, there is a need to use advanced technology for early detection and prevention of infestation propagation. In this project, we have developed an IoT-based smart palm monitoring prototype as a proof-of-concept that (1) allows palms to be monitored remotely using smart agriculture sensors, and (2) contributes to the early detection of red palm weevil. Users can use a web/mobile application to interact with their palm farms, which helps them detect possible infestations early. We used the Elm company IoT platform to interface between the sensor layer and the user layer. In addition, we have collected data using accelerometer sensors and we applied signal processing and statistical techniques to analyze collected data and determine a fingerprint of the infestation.

Submitted 21 September, 2019; originally announced October 2019.

arXiv:1405.1823 [pdf, other]

Up and Away: A Cheap UAV Cyber-Physical Testbed (Work in Progress)

Authors: Ahmed Saeed, Azin Neishaboori, Amr Mohamed, Khaled Harras

Abstract: Cyber-Physical Systems (CPS) have the promise of presenting the next evolution in computing with potential applications that include aerospace, transportation, robotics, and various automation systems. These applications motivate advances in the different sub-fields of CPS (e.g. mobile computing and communication, control, and vision). However, deploying and testing complete CPSs is known to be a complex and expensive task. In this paper, we present the design, implementation, and evaluation of Up and Away (UnA): a testbed for Cyber-Physical Systems that use UAVs as their physical component. UnA aims at abstracting the control of physical components of the system to reduce the complexity of UAV oriented Cyber-Physical Systems experiments. In addition, UnA provides an API to allow for converting CPS simulations into physical experiments using a few simple steps. We present a case study bringing a mobile-camera-based surveillance system simulation to life using UnA.

Submitted 8 May, 2014; originally announced May 2014.

Comments: 4 pages, 3 figures

arXiv:1110.5181 [pdf, other]

Paraglide: Interactive Parameter Space Partitioning for Computer Simulations

Authors: Steven Bergner, Michael Sedlmair, Sareh Nabi, Ahmed Saad, Torsten Möller

Abstract: In this paper we introduce paraglide, a visualization system designed for interactive exploration of parameter spaces of multi-variate simulation models. To get the right parameter configuration, model developers frequently have to go back and forth between setting parameters and qualitatively judging the outcomes of their model. During this process, they build up a grounded understanding of the parameter effects in order to pick the right setting. Current state-of-the-art tools and practices, however, fail to provide a systematic way of exploring these parameter spaces, making informed decisions about parameter settings a tedious and workload-intensive task. Paraglide endeavors to overcome this shortcoming by assisting the sampling of the parameter space and the discovery of qualitatively different model outcomes. This results in a decomposition of the model parameter space into regions of distinct behaviour. We developed paraglide in close collaboration with experts from three different domains, who all were involved in developing new models for their domain. We first analyzed current practices of six domain experts and derived a set of design requirements, then engaged in a longitudinal user-centered design process, and finally conducted three in-depth case studies underlining the usefulness of our approach.

Submitted 24 October, 2011; originally announced October 2011.

Report number: SFU-CMPT TR 2011-06

ACM Class: G.3; G.4; H.5.2; I.6; I.6.4; I.6.6

Showing 1–22 of 22 results for author: Saeed, A