Search | arXiv e-print repository

arXiv:2408.16126 [pdf, other]

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Authors: Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin

Abstract: Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effectively generalize across diverse real-world scenarios. In this paper, we present a novel data simulation pipeline that produces diverse training data… ▽ More Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effectively generalize across diverse real-world scenarios. In this paper, we present a novel data simulation pipeline that produces diverse training data from a range of acoustic environments and content, and propose new training paradigms to improve quality of a general speech separation model. Specifically, we first introduce AC-SIM, a data simulation pipeline that incorporates broad variations in both content and acoustics. Then we integrate multiple training objectives into the permutation invariant training (PIT) to enhance separation quality and generalization of the trained model. Finally, we conduct comprehensive objective and human listening experiments across separation architectures and benchmarks to validate our methods, demonstrating substantial improvement of generalization on both non-homologous and real-world test sets. △ Less

Submitted 28 August, 2024; originally announced August 2024.

Comments: In Proceedings of the 25th Annual Conference of the International Speech Communication Association, Interspeech 2024

arXiv:2408.08101 [pdf]

Stochastic Real-Time Economic Dispatch for Integrated Electric and Gas Systems Considering Uncertainty Propagation and Pipeline Leakage

Authors: eiyao Zhao, Zhengshuo Li, Jiahui Zhang, Xiang Bai, Jia Su

Abstract: Gas-fired units (GFUs) with rapid regulation capabilities are considered an effective tool to mitigate fluctuations in the generation of renewable energy sources and have coupled electricity power systems (EPSs) and natural gas systems (NGSs) more tightly. However, this tight coupling leads to uncertainty propagation, a challenge for the real-time dispatch of such integrated electric and gas syste… ▽ More Gas-fired units (GFUs) with rapid regulation capabilities are considered an effective tool to mitigate fluctuations in the generation of renewable energy sources and have coupled electricity power systems (EPSs) and natural gas systems (NGSs) more tightly. However, this tight coupling leads to uncertainty propagation, a challenge for the real-time dispatch of such integrated electric and gas systems (IEGSs). Moreover, pipeline leakage failures in the NGS may threaten the electricity supply reliability of the EPS through GFUs. To address these problems, this paper first establishes an operational model considering gas pipeline dynamic characteristics under uncertain leakage failures for the NGS and then presents a stochastic IEGS real-time economic dispatch (RTED) model considering both uncertainty propagation and pipeline leakage uncertainty. To quickly solve this complicated large-scale stochastic optimization problem, a novel notion of the coupling boundary dynamic adjustment region considering pipeline leakage failure (LCBDAR) is proposed to characterize the dynamic characteristics of the NGS boundary connecting GFUs. Based on the LCBDAR, a noniterative decentralized solution is proposed to decompose the original stochastic RTED model into two subproblems that are solved separately by the EPS and NGS operators, thus preserving their data privacy. In particular, only one-time data interaction from the NGS to the EPS is required. Case studies on several IEGSs at different scales demonstrate the effectiveness of the proposed method. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2405.09470 [pdf, other]

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of… ▽ More In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA) which combines style transfer and adversarial attack in sequential order. And then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness due to our user study. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng , et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.13673 [pdf]

doi 10.1016/j.jmsy.2024.04.013

In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review

Authors: Lequn Chen, Guijun Bi, Xiling Yao, Jinlong Su, Chaolin Tan, Wenhe Feng, Michalis Benakis, Youxiang Chew, Seung Ki Moon

Abstract: Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including… ▽ More Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including optical-based monitoring, acoustic-based sensing, laser line scanning, and operando X-ray monitoring. These techniques are evaluated for their capabilities and limitations in detecting defects within Laser Powder Bed Fusion (LPBF) and Laser Directed Energy Deposition (LDED) processes. Furthermore, the review discusses emerging multisensor monitoring and machine learning (ML)-assisted defect detection methods, benchmarking ML models tailored for in-situ defect detection. The paper also discusses in-situ adaptive defect remediation strategies that advance LAM towards zero-defect autonomous operations, focusing on real-time closed-loop feedback control and defect correction methods. Research gaps such as the need for standardization, improved reliability and sensitivity, and decision-making strategies beyond early stopping are highlighted. Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes. This paper aims to equip researchers and industry professionals with a holistic understanding of the current capabilities, limitations, and future directions in in-situ process monitoring and adaptive quality enhancement in LAM. △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: 107 Pages, 29 Figures. Paper Accepted At Journal of Manufacturing Systems

arXiv:2401.07041 [pdf, other]

An automated framework for brain vessel centerline extraction from CTA images

Authors: Sijie Liu, Ruisheng Su, Jianghang Su, Jingmin Xin, Jiayi Wu, Wim van Zwam, Pieter Jan van Doormaal, Aad van der Lugt, Wiro J. Niessen, Nanning Zheng, Theo van Walsum

Abstract: Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional an… ▽ More Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional annotation effort by physicians and more effective use of the generated lumen segmentation for improved centerline extraction performance. We propose an automated framework for brain vessel centerline extraction from CTA images. The framework consists of four major components: (1) pre-processing approaches that register CTA images with a CT atlas and divide these images into input patches, (2) lumen segmentation generation from annotated vessel centerlines using graph cuts and robust kernel regression, (3) a dual-branch topology-aware UNet (DTUNet) that can effectively utilize the annotated vessel centerlines and the generated lumen segmentation through a topology-aware loss (TAL) and its dual-branch design, and (4) post-processing approaches that skeletonize the predicted lumen segmentation. Extensive experiments on a multi-center dataset demonstrate that the proposed framework outperforms state-of-the-art methods in terms of average symmetric centerline distance (ASCD) and overlap (OV). Subgroup analyses further suggest that the proposed framework holds promise in clinical applications for stroke treatment. Code is publicly available at https://github.com/Liusj-gh/DTUNet. △ Less

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2312.13304 [pdf, other]

End-to-end Rain Streak Removal with RAW Images

Authors: GuoDong Du, HaoJian Deng, JiaHao Su, Yuan Huang

Abstract: In this work we address the problem of rain streak removal with RAW images. The general approach is firstly processing RAW data into RGB images and removing rain streak with RGB images. Actually the original information of rain in RAW images is affected by image signal processing (ISP) pipelines including none-linear algorithms, unexpected noise, artifacts and so on. It gains more benefit to direc… ▽ More In this work we address the problem of rain streak removal with RAW images. The general approach is firstly processing RAW data into RGB images and removing rain streak with RGB images. Actually the original information of rain in RAW images is affected by image signal processing (ISP) pipelines including none-linear algorithms, unexpected noise, artifacts and so on. It gains more benefit to directly remove rain in RAW data before being processed into RGB format. To solve this problem, we propose a joint solution for rain removal and RAW processing to obtain clean color images from rainy RAW image. To be specific, we generate rainy RAW data by converting color rain streak into RAW space and design simple but efficient RAW processing algorithms to synthesize both rainy and clean color images. The rainy color images are used as reference to help color corrections. Different backbones show that our method conduct a better result compared with several other state-of-the-art deraining methods focused on color image. In addition, the proposed network generalizes well to other cameras beyond our selected RAW dataset. Finally, we give the result tested on images processed by different ISP pipelines to show the generalization performance of our model is better compared with methods on color images. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures,4 tables, conference

arXiv:2310.17471 [pdf, other]

Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration

Authors: Xiang Chen, Zhiheng Guo, Xijun Wang, Howard H. Yang, Chenyuan Feng, Junshen Su, Sihui Zheng, Tony Q. S. Quek

Abstract: Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration… ▽ More Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration between devices and servers and constructing native intelligence libraries become critically important in 6G. In this paper, we analyze the challenges of achieving 6G native AI from the perspectives of data, intelligence, and networks. Then, we propose a 6G native AI framework based on foundation models, provide a customization approach for intent-aware PFM, present a construction of a task-oriented AI toolkit, and outline a novel cloud-edge-end collaboration paradigm. As a practical use case, we apply this framework for orchestration, achieving the maximum sum rate within a wireless communication system, and presenting preliminary evaluation results. Finally, we outline research directions for achieving native AI in 6G. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures, 1 table

arXiv:2308.14763 [pdf, other]

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Authors: Jia-Jyu Su, Pang-Chen Liao, Yen-Ting Lin, Wu-Hao Li, Guan-Ting Liou, Cheng-Che Kao, Wei-Cheng Chen, Jen-Chieh Chiang, Wen-Yang Chang, Pin-Han Lin, Chen-Yu Chiang

Abstract: Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of… ▽ More Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of the developed personalized TTS systems, for the VoiceBanking project. The developed corpus is named after the VoiceBank-2023 speech corpus because of its release year. The corpus contains 29.78 hours of utterances with prompts of short paragraphs and common phrases spoken by 111 native Mandarin speakers. The corpus is labeled with information about gender, degree of speech impairment, types of users, transcription, SNRs, and speaking rates. The VoiceBank-2023 is available by request for non-commercial use and welcomes all parties to join the VoiceBanking project to improve the services for the speech impaired. △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: submitted to 26th International Conference of the ORIENTAL-COCOSDA

arXiv:2306.10689 [pdf, other]

Realistic Restorer: artifact-free flow restorer(AF2R) for MRI motion artifact removal

Authors: Jiandong Su, Kun Shang, Dong Liang

Abstract: Motion artifact is a major challenge in magnetic resonance imaging (MRI) that severely degrades image quality, reduces examination efficiency, and makes accurate diagnosis difficult. However, previous methods often relied on implicit models for artifact correction, resulting in biases in modeling the artifact formation mechanism and characterizing the relationship between artifact information and… ▽ More Motion artifact is a major challenge in magnetic resonance imaging (MRI) that severely degrades image quality, reduces examination efficiency, and makes accurate diagnosis difficult. However, previous methods often relied on implicit models for artifact correction, resulting in biases in modeling the artifact formation mechanism and characterizing the relationship between artifact information and anatomical details. These limitations have hindered the ability to obtain high-quality MR images. In this work, we incorporate the artifact generation mechanism to reestablish the relationship between artifacts and anatomical content in the image domain, highlighting the superiority of explicit models over implicit models in medical problems. Based on this, we propose a novel end-to-end image domain model called AF2R, which addresses this problem using conditional normalization flow. Specifically, we first design a feature encoder to extract anatomical features from images with motion artifacts. Then, through a series of reversible transformations using the feature-to-image flow module, we progressively obtain MR images unaffected by motion artifacts. Experimental results on simulated and real datasets demonstrate that our method achieves better performance in both quantitative and qualitative results, preserving better anatomical details. △ Less

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.10520 [pdf, other]

RetinexFlow for CT metal artifact reduction

Authors: Jiandong Su, Ce Wang, Yinsheng Li, Kun Shang, Dong Liang

Abstract: Metal artifacts is a major challenge in computed tomography (CT) imaging, significantly degrading image quality and making accurate diagnosis difficult. However, previous methods either require prior knowledge of the location of metal implants, or have modeling deviations with the mechanism of artifact formation, which limits the ability to obtain high-quality CT images. In this work, we formulate… ▽ More Metal artifacts is a major challenge in computed tomography (CT) imaging, significantly degrading image quality and making accurate diagnosis difficult. However, previous methods either require prior knowledge of the location of metal implants, or have modeling deviations with the mechanism of artifact formation, which limits the ability to obtain high-quality CT images. In this work, we formulate metal artifacts reduction problem as a combination of decomposition and completion tasks. And we propose RetinexFlow, which is a novel end-to-end image domain model based on Retinex theory and conditional normalizing flow, to solve it. Specifically, we first design a feature decomposition encoder for decomposing the metal implant component and inherent component, and extracting the inherent feature. Then, it uses a feature-to-image flow module to complete the metal artifact-free CT image step by step through a series of invertible transformations. These designs are incorporated in our model with a coarse-to-fine strategy, enabling it to achieve superior performance. The experimental results on on simulation and clinical datasets show our method achieves better quantitative and qualitative results, exhibiting better visual performance in artifact removal and image fidelity △ Less

Submitted 18 June, 2023; originally announced June 2023.

arXiv:2306.03835 [pdf, other]

Atrial Septal Defect Detection in Children Based on Ultrasound Video Using Multiple Instances Learning

Authors: Yiman Liu, Qiming Huang, Xiaoxiang Han, Tongtong Liang, Zhifang Zhang, Lijun Chen, Jinfeng Wang, Angelos Stefanidis, Jionglong Su, Jiangang Chen, Qingli Li, Yuqi Zhang

Abstract: Purpose: Congenital heart defect (CHD) is the most common birth defect. Thoracic echocardiography (TTE) can provide sufficient cardiac structure information, evaluate hemodynamics and cardiac function, and is an effective method for atrial septal defect (ASD) examination. This paper aims to study a deep learning method based on cardiac ultrasound video to assist in ASD diagnosis. Materials and met… ▽ More Purpose: Congenital heart defect (CHD) is the most common birth defect. Thoracic echocardiography (TTE) can provide sufficient cardiac structure information, evaluate hemodynamics and cardiac function, and is an effective method for atrial septal defect (ASD) examination. This paper aims to study a deep learning method based on cardiac ultrasound video to assist in ASD diagnosis. Materials and methods: We select two standard views of the atrial septum (subAS) and low parasternal four-compartment view (LPS4C) as the two views to identify ASD. We enlist data from 300 children patients as part of a double-blind experiment for five-fold cross-validation to verify the performance of our model. In addition, data from 30 children patients (15 positives and 15 negatives) are collected for clinician testing and compared to our model test results (these 30 samples do not participate in model training). We propose an echocardiography video-based atrial septal defect diagnosis system. In our model, we present a block random selection, maximal agreement decision and frame sampling strategy for training and testing respectively, resNet18 and r3D networks are used to extract the frame features and aggregate them to build a rich video-level representation. Results: We validate our model using our private dataset by five-cross validation. For ASD detection, we achieve 89.33 AUC, 84.95 accuracy, 85.70 sensitivity, 81.51 specificity and 81.99 F1 score. Conclusion: The proposed model is multiple instances learning-based deep learning model for video atrial septal defect detection which effectively improves ASD detection accuracy when compared to the performances of previous networks and clinical doctors. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2304.11341 [pdf, ps, other]

Performance Analysis and Optimal Design of HARQ-IR-Aided Terahertz Communications

Authors: Ziyang Song, Zheng Shi, Jiaji Su, Qingping Dou, Guanghua Yang, Haichuan Ding, Shaodan Ma

Abstract: Terahertz (THz) communications are envisioned to be a promising technology for 6G thanks to its broad bandwidth. However, the large path loss, antenna misalignment, and atmospheric influence of THz communications severely deteriorate its reliability. To address this, hybrid automatic repeat request (HARQ) is recognized as an effective technique to ensure reliable THz communications. This paper del… ▽ More Terahertz (THz) communications are envisioned to be a promising technology for 6G thanks to its broad bandwidth. However, the large path loss, antenna misalignment, and atmospheric influence of THz communications severely deteriorate its reliability. To address this, hybrid automatic repeat request (HARQ) is recognized as an effective technique to ensure reliable THz communications. This paper delves into the performance analysis of HARQ with incremental redundancy (HARQ-IR)-aided THz communications in the presence/absence of blockage. More specifically, the analytical expression of the outage probability of HARQ-IR-aided THz communications is derived, with which the asymptotic outage analysis is enabled to gain meaningful insights, including diversity order, power allocation gain, modulation and coding gain, etc. Then the long term average throughput (LTAT) is expressed in terms of the outage probability based on renewal theory. Moreover, to combat the blockage effects, a multi-hop HARQ-IR-aided THz communication scheme is proposed and its performance is examined. To demonstrate the superiority of the proposed scheme, the other two HARQ-aided schemes, i.e., Type-I HARQ and HARQ with chase combining (HARQ-CC), are used for benchmarking in the simulations. In addition, a deep neural network (DNN) based outage evaluation framework with low computational complexity is devised to reap the benefits of using both asymptotic and simulation results in low and high outage regimes, respectively. This novel outage evaluation framework is finally employed for the optimal rate selection, which outperforms the asymptotic based optimization. △ Less

Submitted 22 April, 2023; originally announced April 2023.

Comments: Blockage, hybrid automatic repeat request (HARQ), outage probability, terahertz (THz) communications

arXiv:2304.09620 [pdf]

DCELANM-Net:Medical Image Segmentation based on Dual Channel Efficient Layer Aggregation Network with Learner

Authors: Chengzhun Lu, Zhangrun Xia, Krzysztof Przystupa, Orest Kochan, Jun Su

Abstract: The DCELANM-Net structure, which this article offers, is a model that ingeniously combines a Dual Channel Efficient Layer Aggregation Network (DCELAN) and a Micro Masked Autoencoder (Micro-MAE). On the one hand, for the DCELAN, the features are more effectively fitted by deepening the network structure; the deeper network can successfully learn and fuse the features, which can more accurately loca… ▽ More The DCELANM-Net structure, which this article offers, is a model that ingeniously combines a Dual Channel Efficient Layer Aggregation Network (DCELAN) and a Micro Masked Autoencoder (Micro-MAE). On the one hand, for the DCELAN, the features are more effectively fitted by deepening the network structure; the deeper network can successfully learn and fuse the features, which can more accurately locate the local feature information; and the utilization of each layer of channels is more effectively improved by widening the network structure and residual connections. We adopted Micro-MAE as the learner of the model. In addition to being straightforward in its methodology, it also offers a self-supervised learning method, which has the benefit of being incredibly scaleable for the model. △ Less

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.04598 [pdf]

doi 10.1016/j.addma.2023.103547

In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning

Authors: Lequn Chen, Xiling Yao, Chaolin Tan, Weiyang He, Jinlong Su, Fei Weng, Youxiang Chew, Nicholas Poh Huat Ng, Seung Ki Moon

Abstract: Cracks and keyhole pores are detrimental defects in alloys produced by laser directed energy deposition (LDED). Laser-material interaction sound may hold information about underlying complex physical events such as crack propagation and pores formation. However, due to the noisy environment and intricate signal content, acoustic-based monitoring in LDED has received little attention. This paper pr… ▽ More Cracks and keyhole pores are detrimental defects in alloys produced by laser directed energy deposition (LDED). Laser-material interaction sound may hold information about underlying complex physical events such as crack propagation and pores formation. However, due to the noisy environment and intricate signal content, acoustic-based monitoring in LDED has received little attention. This paper proposes a novel acoustic-based in-situ defect detection strategy in LDED. The key contribution of this study is to develop an in-situ acoustic signal denoising, feature extraction, and sound classification pipeline that incorporates convolutional neural networks (CNN) for online defect prediction. Microscope images are used to identify locations of the cracks and keyhole pores within a part. The defect locations are spatiotemporally registered with acoustic signal. Various acoustic features corresponding to defect-free regions, cracks, and keyhole pores are extracted and analysed in time-domain, frequency-domain, and time-frequency representations. The CNN model is trained to predict defect occurrences using the Mel-Frequency Cepstral Coefficients (MFCCs) of the lasermaterial interaction sound. The CNN model is compared to various classic machine learning models trained on the denoised acoustic dataset and raw acoustic dataset. The validation results shows that the CNN model trained on the denoised dataset outperforms others with the highest overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC score (98%). Furthermore, the trained CNN model can be deployed into an in-house developed software platform for online quality monitoring. The proposed strategy is the first study to use acoustic signals with deep learning for insitu defect detection in LDED process. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: 36 Pages, 16 Figures, accepted at journal Additive Manufacturing

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develope… ▽ More Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.06293 [pdf, other]

Two-stream Network for ECG Signal Classification

Authors: Xinyao Hou, Shengmei Qin, Jianbo Su

Abstract: Electrocardiogram (ECG), a technique for medical monitoring of cardiac activity, is an important method for identifying cardiovascular disease. However, analyzing the increasing quantity of ECG data consumes a lot of medical resources. This paper explores an effective algorithm for automatic classifications of multi-classes of heartbeat types based on ECG. Most neural network based methods target… ▽ More Electrocardiogram (ECG), a technique for medical monitoring of cardiac activity, is an important method for identifying cardiovascular disease. However, analyzing the increasing quantity of ECG data consumes a lot of medical resources. This paper explores an effective algorithm for automatic classifications of multi-classes of heartbeat types based on ECG. Most neural network based methods target the individual heartbeats, ignoring the secrets embedded in the temporal sequence. And the ECG signal has temporal variation and unique individual characteristics, which means that the same type of ECG signal varies among patients under different physical conditions. A two-stream architecture is used in this paper and presents an enhanced version of ECG recognition based on this. The architecture achieves classification of holistic ECG signal and individual heartbeat and incorporates identified and temporal stream networks. Identified networks are used to extract features of individual heartbeats, while temporal networks aim to extract temporal correlations between heartbeats. Results on the MIT-BIH Arrhythmia Database demonstrate that the proposed algorithm performs an accuracy of 99.38\%. In addition, the proposed algorithm reaches an 88.07\% positive accuracy on massive data in real life, showing that the proposed algorithm can efficiently categorize different classes of heartbeat with high diagnostic performance. △ Less

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.06353 [pdf, other]

Label Refinement Network from Synthetic Error Augmentation for Medical Image Segmentation

Authors: Shuai Chen, Antonio Garcia-Uceda, Jiahang Su, Gijs van Tulder, Lennard Wolff, Theo van Walsum, Marleen de Bruijne

Abstract: Deep convolutional neural networks for image segmentation do not learn the label structure explicitly and may produce segmentations with an incorrect structure, e.g., with disconnected cylindrical structures in the segmentation of tree-like structures such as airways or blood vessels. In this paper, we propose a novel label refinement method to correct such errors from an initial segmentation, imp… ▽ More Deep convolutional neural networks for image segmentation do not learn the label structure explicitly and may produce segmentations with an incorrect structure, e.g., with disconnected cylindrical structures in the segmentation of tree-like structures such as airways or blood vessels. In this paper, we propose a novel label refinement method to correct such errors from an initial segmentation, implicitly incorporating information about label structure. This method features two novel parts: 1) a model that generates synthetic structural errors, and 2) a label appearance simulation network that produces synthetic segmentations (with errors) that are similar in appearance to the real initial segmentations. Using these synthetic segmentations and the original images, the label refinement network is trained to correct errors and improve the initial segmentations. The proposed method is validated on two segmentation tasks: airway segmentation from chest computed tomography (CT) scans and brain vessel segmentation from 3D CT angiography (CTA) images of the brain. In both applications, our method significantly outperformed a standard 3D U-Net and other previous refinement approaches. Improvements are even larger when additional unlabeled data is used for model training. In an ablation study, we demonstrate the value of the different components of the proposed method. △ Less

Submitted 9 October, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

arXiv:2208.04360 [pdf, other]

SDWPF: A Dataset for Spatial Dynamic Wind Power Forecasting Challenge at KDD Cup 2022

Authors: Jingbo Zhou, Xinjiang Lu, Yixiong Xiao, Jiantao Su, Junfu Lyu, Yanjun Ma, Dejing Dou

Abstract: The variability of wind power supply can present substantial challenges to incorporating wind power into a grid system. Thus, Wind Power Forecasting (WPF) has been widely recognized as one of the most critical issues in wind power integration and operation. There has been an explosion of studies on wind power forecasting problems in the past decades. Nevertheless, how to well handle the WPF proble… ▽ More The variability of wind power supply can present substantial challenges to incorporating wind power into a grid system. Thus, Wind Power Forecasting (WPF) has been widely recognized as one of the most critical issues in wind power integration and operation. There has been an explosion of studies on wind power forecasting problems in the past decades. Nevertheless, how to well handle the WPF problem is still challenging, since high prediction accuracy is always demanded to ensure grid stability and security of supply. We present a unique Spatial Dynamic Wind Power Forecasting dataset: SDWPF, which includes the spatial distribution of wind turbines, as well as the dynamic context factors. Whereas, most of the existing datasets have only a small number of wind turbines without knowing the locations and context information of wind turbines at a fine-grained time scale. By contrast, SDWPF provides the wind power data of 134 wind turbines from a wind farm over half a year with their relative positions and internal statuses. We use this dataset to launch the Baidu KDD Cup 2022 to examine the limit of current WPF solutions. The dataset is released at https://aistudio.baidu.com/aistudio/competition/detail/152/0/datasets. △ Less

Submitted 8 August, 2022; originally announced August 2022.

arXiv:2206.06701 [pdf, other]

CNN-based Classification Framework for Lung Tissues with Auxiliary Information

Authors: Huafeng Hu, Ruijie Ye, Jeyarajan Thiyagalingam, Frans Coenen, Jionglong Su

Abstract: Interstitial lung diseases are a large group of heterogeneous diseases characterized by different degrees of alveolitis and pulmonary fibrosis. Accurately diagnosing these diseases has significant guiding value for formulating treatment plans. Although previous work has produced impressive results in classifying interstitial lung diseases, there is still room for improving the accuracy of these te… ▽ More Interstitial lung diseases are a large group of heterogeneous diseases characterized by different degrees of alveolitis and pulmonary fibrosis. Accurately diagnosing these diseases has significant guiding value for formulating treatment plans. Although previous work has produced impressive results in classifying interstitial lung diseases, there is still room for improving the accuracy of these techniques, mainly to enhance automated decision-making. In order to improve the classification precision, our study proposes a convolutional neural networks-based framework with auxiliary information. Firstly, ILD images are added with their medical information by re-scaling the original image in Hounsfield Units. Secondly, a modified CNN model is used to produce a vector of classification probability for each tissue. Thirdly, location information of the input image, consisting of the occurrence frequencies of different diseases in the CT scans on certain locations, is used to calculate a location weight vector. Finally, the Hadamard product between two vectors is used to produce a decision vector for the prediction. Compared to the state-of-the-art methods, the results using a publicly available ILD database show the potential of predicting these using different auxiliary information. △ Less

Submitted 18 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2203.03635 [pdf, ps, other]

Stepwise Feature Fusion: Local Guides Global

Authors: Jinfeng Wang, Qiming Huang, Feilong Tang, Jia Meng, Jionglong Su, Sifan Song

Abstract: Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for… ▽ More Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for accurate polyp segmentation tasks with excellent results. However, due to the structure of polyps image and the varying shapes of polyps, it easy for existing deep learning models to overfitting the current dataset. As a result, the model may not process unseen colonoscopy data. To address this, we propose a new State-Of-The-Art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. The SSFormer achieves statet-of-the-art performance in both learning and generalization assessment. △ Less

Submitted 27 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 10 pages, 5 figures

arXiv:2201.12605 [pdf]

Design of Outdoor Autonomous Moble Robot

Authors: I-Hsi Kao, Jian-An Su, Jau-Woei Perng

Abstract: This study presents the design of a six-wheeled outdoor autonomous mobile robot. The main design goal of our robot is to increase its adaptability and flexibility when moving outdoors. This six-wheeled robot platform was equipped with some sensors, such as a global positioning system (GPS), high definition (HD) webcam, light detection and ranging (LiDAR), and rotary encoders. A personal mobile com… ▽ More This study presents the design of a six-wheeled outdoor autonomous mobile robot. The main design goal of our robot is to increase its adaptability and flexibility when moving outdoors. This six-wheeled robot platform was equipped with some sensors, such as a global positioning system (GPS), high definition (HD) webcam, light detection and ranging (LiDAR), and rotary encoders. A personal mobile computer and 86Duino ONE microcontroller were used as the algorithm computing platform. In terms of control, the lateral offset and head angle offset of the robot were calculated using a differential GPS or a camera to detect structured and unstructured road boundaries. The lateral offset and head angle offset were fed to a fuzzy controller. The control input was designed by Q-learning of the differential speed between the left and right wheels. This made the robot track a reference route so that it could stay in its own lane. 2D LiDAR was also used to measure the relative distance from the front obstacle. The robot would immediately stop to avoid a collision when the distance between the robot and obstacle was less than a specific safety distance. A custom-designed rocker arm gave the robot the ability to climb a low step. Body balance could be maintained by controlling the angle of the rocker arm when the robot changed its pose. The autonomous mobile robot has been used for delivery service on our campus road by integrating the above system functionality. △ Less

Submitted 29 January, 2022; originally announced January 2022.

Comments: International Conference on Recent Innovations in Biotechnology, System Engineering, Applied Sciences, Space Environment & Aviation Technology

arXiv:2201.09208 [pdf]

Design of Sensor Fusion Driver Assistance System for Active Pedestrian Safety

Authors: I-Hsi Kao, Ya-Zhu Yian, Jian-An Su, Yi-Horng Lai, Jau-Woei Perng, Tung-Li Hsieh, Yi-Shueh Tsai, Min-Shiu Hsieh

Abstract: In this paper, we present a parallel architecture for a sensor fusion detection system that combines a camera and 1D light detection and ranging (lidar) sensor for object detection. The system contains two object detection methods, one based on an optical flow, and the other using lidar. The two sensors can effectively complement the defects of the other. The accurate longitudinal accuracy of the… ▽ More In this paper, we present a parallel architecture for a sensor fusion detection system that combines a camera and 1D light detection and ranging (lidar) sensor for object detection. The system contains two object detection methods, one based on an optical flow, and the other using lidar. The two sensors can effectively complement the defects of the other. The accurate longitudinal accuracy of the object's location and its lateral movement information can be achieved simultaneously. Using a spatio-temporal alignment and a policy of sensor fusion, we completed the development of a fusion detection system with high reliability at distances of up to 20 m. Test results show that the proposed system achieves a high level of accuracy for pedestrian or object detection in front of a vehicle, and has high robustness to special environments. △ Less

Submitted 23 January, 2022; originally announced January 2022.

Comments: The 14th International Conference on Automation Technology (Automation 2017), December 8-10, 2017, Kaohsiung, Taiwan

arXiv:2110.09860 [pdf, other]

Bilateral-ViT for Robust Fovea Localization

Authors: Sifan Song, Kang Dang, Qinji Yu, Zilong Wang, Frans Coenen, Jionglong Su, Xiaowei Ding

Abstract: The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retina diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates inform… ▽ More The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retina diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two network branches: a transformer-based main network branch for integrating global context across the entire fundus image and a vessel branch for explicitly incorporating the structure of blood vessels. The encoded features from both network branches are subsequently merged with a customized Multi-scale Feature Fusion (MFF) module. Our comprehensive experiments demonstrate that the proposed approach is significantly more robust for diseased images and establishes the new state of the arts using the Messidor and PALM datasets. △ Less

Submitted 3 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: This work has been accepted for oral presentation by ISBI2022

arXiv:2108.01553 [pdf, other]

Amortized Neural Networks for Low-Latency Speech Recognition

Authors: Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow

Abstract: We introduce Amortized Neural Networks (AmNets), a compute cost- and latency-aware network architecture particularly well-suited for sequence modeling tasks. We apply AmNets to the Recurrent Neural Network Transducer (RNN-T) to reduce compute cost and latency for an automatic speech recognition (ASR) task. The AmNets RNN-T architecture enables the network to dynamically switch between encoder bran… ▽ More We introduce Amortized Neural Networks (AmNets), a compute cost- and latency-aware network architecture particularly well-suited for sequence modeling tasks. We apply AmNets to the Recurrent Neural Network Transducer (RNN-T) to reduce compute cost and latency for an automatic speech recognition (ASR) task. The AmNets RNN-T architecture enables the network to dynamically switch between encoder branches on a frame-by-frame basis. Branches are constructed with variable levels of compute cost and model capacity. Here, we achieve variable compute for two well-known candidate techniques: one using sparse pruning and the other using matrix factorization. Frame-by-frame switching is determined by an arbitrator network that requires negligible compute overhead. We present results using both architectures on LibriSpeech data and show that our proposed architecture can reduce inference cost by up to 45\% and latency to nearly real-time without incurring a loss in accuracy. △ Less

Submitted 3 August, 2021; originally announced August 2021.

Comments: Accepted at Interspeech 2021

arXiv:2105.13302 [pdf, other]

Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie J. Su

Abstract: Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion… ▽ More Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I error and power. Assuming a regime of linear sparsity and working under Gaussian random designs, we obtain an upper bound on the optimal trade-off for SLOPE, showing its capability of breaking the Donoho-Tanner power limit. To put it into perspective, this limit is the highest possible power that the Lasso, which is perhaps the most popular l1-based method, can achieve even with arbitrarily strong effect sizes. Next, we derive a tight lower bound that delineates the fundamental limit of sorted l1 regularization in optimally trading the FDP off for the TPP. Finally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms the Lasso, in the sense of having a smaller FDP, larger TPP and smaller l2 estimation risk simultaneously. Our proofs are based on a novel technique that reduces a calculus of variations problem to a class of infinite-dimensional convex optimization problems and a very recent result from approximate message passing theory. △ Less

Submitted 5 June, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

Journal ref: Annals of Statistics 2022

arXiv:2102.12173 [pdf]

Deep learning-based framework for cardiac function assessment in embryonic zebrafish from heart beating videos

Authors: Amir Mohammad Naderi, Haisong Bu, Jingcheng Su, Mao-Hsiang Huang, Khuong Vo, Ramses Seferino Trigo Torres, J. -C. Chiao, Juhyun Lee, Michael P. H. Lau, Xiaolei Xu, Hung Cao

Abstract: Zebrafish is a powerful and widely-used model system for a host of biological investigations including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validate… ▽ More Zebrafish is a powerful and widely-used model system for a host of biological investigations including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validated a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) based on a U-net deep learning model for automated assessment of cardiovascular indices, such as ejection fraction (EF) and fractional shortening (FS) from microscopic videos of wildtype and cardiomyopathy mutant zebrafish embryos. Our approach yielded favorable performance with accuracy above 90% compared with manual processing. We used only black and white regular microscopic recordings with frame rates of 5-20 frames per second (fps); thus, the framework could be widely applicable with any laboratory resources and infrastructure. Most importantly, the automatic feature holds promise to enable efficient, consistent and reliable processing and analysis capacity for large amounts of videos, which can be generated by diverse collaborating teams. △ Less

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2101.10444 [pdf, ps, other]

GnetSeg: Semantic Segmentation Model Optimized on a 224mW CNN Accelerator Chip at the Speed of 318FPS

Authors: Baohua Sun, Weixiong Lin, Hao Sha, Jiapeng Su

Abstract: Semantic segmentation is the task to cluster pixels on an image belonging to the same class. It is widely used in the real-world applications including autonomous driving, medical imaging analysis, industrial inspection, smartphone camera for person segmentation and so on. Accelerating the semantic segmentation models on the mobile and edge devices are practical needs for the industry. Recent year… ▽ More Semantic segmentation is the task to cluster pixels on an image belonging to the same class. It is widely used in the real-world applications including autonomous driving, medical imaging analysis, industrial inspection, smartphone camera for person segmentation and so on. Accelerating the semantic segmentation models on the mobile and edge devices are practical needs for the industry. Recent years have witnessed the wide availability of CNN (Convolutional Neural Networks) accelerators. They have the advantages on power efficiency, inference speed, which are ideal for accelerating the semantic segmentation models on the edge devices. However, the CNN accelerator chips also have the limitations on flexibility and memory. In addition, the CPU load is very critical because the CNN accelerator chip works as a co-processor with a host CPU. In this paper, we optimize the semantic segmentation model in order to fully utilize the limited memory and the supported operators on the CNN accelerator chips, and at the same time reduce the CPU load of the CNN model to zero. The resulting model is called GnetSeg. Furthermore, we propose the integer encoding for the mask of the GnetSeg model, which minimizes the latency of data transfer between the CNN accelerator and the host CPU. The experimental result shows that the model running on the 224mW chip achieves the speed of 318FPS with excellent accuracy for applications such as person segmentation. △ Less

Submitted 9 January, 2021; originally announced January 2021.

Comments: 7 pages, 3 figures, and 2 tables

arXiv:2101.04928 [pdf, ps, other]

Distributed Multi-Building Coordination for Demand Response

Authors: Junyan Su, Yuning Jiang, Altug Bitlislioglu, Colin N. Jones, Boris Houska

Abstract: This paper presents a distributed optimization algorithm tailored for solving optimal control problems arising in multi-building coordination. The buildings coordinated by a grid operator, join a demand response program to balance the voltage surge by using an energy cost defined criterion. In order to model the hierarchical structure of the building network, we formulate a distributed convex opti… ▽ More This paper presents a distributed optimization algorithm tailored for solving optimal control problems arising in multi-building coordination. The buildings coordinated by a grid operator, join a demand response program to balance the voltage surge by using an energy cost defined criterion. In order to model the hierarchical structure of the building network, we formulate a distributed convex optimization problem with separable objectives and coupled affine equality constraints. A variant of the Augmented Lagrangian based Alternating Direction Inexact Newton (ALADIN) method for solving the considered class of problems is then presented along with a convergence guarantee. To illustrate the effectiveness of the proposed method, we compare it to the Alternating Direction Method of Multipliers (ADMM) by running both an ALADIN and an ADMM based model predictive controller on a benchmark case study. △ Less

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2009.02130 [pdf]

doi 10.1109/TGRS.2021.3093977

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

Authors: Rui Li, Shunyi Zheng, Chenxi Duan, Ce Zhang, Jianlin Su, P. M. Atkinson

Abstract: Semantic segmentation of remote sensing images plays an important role in a wide range of applications including land resource management, biosphere monitoring and urban planning. Although the accuracy of semantic segmentation in remote sensing images has been increased significantly by deep convolutional neural networks, several limitations exist in standard models. First, for encoder-decoder arc… ▽ More Semantic segmentation of remote sensing images plays an important role in a wide range of applications including land resource management, biosphere monitoring and urban planning. Although the accuracy of semantic segmentation in remote sensing images has been increased significantly by deep convolutional neural networks, several limitations exist in standard models. First, for encoder-decoder architectures such as U-Net, the utilization of multi-scale features causes the underuse of information, where low-level features and high-level features are concatenated directly without any refinement. Second, long-range dependencies of feature maps are insufficiently explored, resulting in sub-optimal feature representations associated with each semantic class. Third, even though the dot-product attention mechanism has been introduced and utilized in semantic segmentation to model long-range dependencies, the large time and space demands of attention impede the actual usage of attention in application scenarios with large-scale input. This paper proposed a Multi-Attention-Network (MANet) to address these issues by extracting contextual dependencies through multiple efficient attention modules. A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention. Based on kernel attention and channel attention, we integrate local feature maps extracted by ResNeXt-101 with their corresponding global dependencies and reweight interdependent channel maps adaptively. Numerical experiments on three large-scale fine resolution remote sensing images captured by different satellite sensors demonstrate the superior performance of the proposed MANet, outperforming the DeepLab V3+, PSPNet, FastFCN, DANet, OCRNet, and other benchmark approaches. △ Less

Submitted 23 November, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.14902

arXiv:2006.15228 [pdf, other]

HypervolGAN: An efficient approach for GAN with multi-objective training function

Authors: Jingwen Su, Hujun Yin

Abstract: Since the advent of generative adversarial networks (GANs), various loss functions have been developed and combined to constitute the overall training objective function, in order to improve model performance or for specific learning tasks. For instance, in image enhancement or restoration, there are often several criteria to consider such as signal-noise ratio, smoothness, structures and details.… ▽ More Since the advent of generative adversarial networks (GANs), various loss functions have been developed and combined to constitute the overall training objective function, in order to improve model performance or for specific learning tasks. For instance, in image enhancement or restoration, there are often several criteria to consider such as signal-noise ratio, smoothness, structures and details. However, when the optimization goal has more than one adversarial loss, balancing multiple losses in the overall function becomes a challenging, critical and time-consuming problem. In this paper, we propose to tackle the problem by means of efficient multi-objective optimization. The proposed HypervolGAN adopts an adapted version of hypervolume maximization method to effectively define the multi-objective training function for GAN. We tested our proposed method on solving single image super-resolution problem. Experiments show that the proposed HypervolGAN is efficient in saving computational time and efforts for fine-tuning weights of various losses, and can generate enhanced samples that have better quality than results given by baseline GANs. The work explores the integration of adversarial learning and optimization techniques, which can benefit not only image processing but also a wide range of applications. △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.05694 [pdf, other]

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Authors: Jiaqi Su, Zeyu Jin, Adam Finkelstein

Abstract: Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-f… ▽ More Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments. △ Less

Submitted 21 September, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted by INTERSPEECH 2020

arXiv:2006.05637 [pdf, ps, other]

Distributed Optimization for Massive Connectivity

Authors: Yuning Jiang, Junyan Su, Yuanming Shi, Boris Houska

Abstract: Massive device connectivity in Internet of Thing (IoT) networks with sporadic traffic poses significant communication challenges. To overcome this challenge, the serving base station is required to detect the active devices and estimate the corresponding channel state information during each coherence block. The corresponding joint activity detection and channel estimation problem can be formulate… ▽ More Massive device connectivity in Internet of Thing (IoT) networks with sporadic traffic poses significant communication challenges. To overcome this challenge, the serving base station is required to detect the active devices and estimate the corresponding channel state information during each coherence block. The corresponding joint activity detection and channel estimation problem can be formulated as a group sparse estimation problem, also known under the name "Group Lasso". This letter presents a fast and efficient distributed algorithm to solve such Group Lasso problems, which alternates between solving small-scaled problems in parallel and dealing with a linear equation for consensus. Numerical results demonstrate the speedup of this algorithm compared with the state-of-the-art methods in terms of convergence speed and computation time. △ Less

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2004.09754 [pdf, other]

The 1st Agriculture-Vision Challenge: Methods and Results

Authors: Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennifer Hobbs, Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Ivan Dozier, Wyatt Dozier, Karen Ghandilyan, David Wilson, Hyunseong Park, Junhee Kim, Sungho Kim, Qinghui Liu, Michael C. Kampffmeyer, Robert Jenssen, Arnt B. Salberg, Alexandre Barbosa, Rodrigo Trevisan, Bingchen Zhao , et al. (17 additional authors not shown)

Abstract: The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries compete to achieve state-of-the-art in aerial agriculture semantic segmentation. The Agricultu… ▽ More The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries compete to achieve state-of-the-art in aerial agriculture semantic segmentation. The Agriculture-Vision Challenge Dataset was employed, which comprises of 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will continue to open for researchers that are interested in this challenge dataset and task; the link can be found here. △ Less

Submitted 23 April, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

Comments: CVPR 2020 Workshop

arXiv:2002.09334 [pdf]

doi 10.1016/j.eng.2020.04.010

Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia

Authors: Xiaowei Xu, Xiangao Jiang, Chunlian Ma, Peng Du, Xukun Li, Shuangzhi Lv, Liang Yu, Yanfei Chen, Junwei Su, Guanjing Lang, Yongtao Li, Hong Zhao, Kaijin Xu, Lingxiang Ruan, Wei Wu

Abstract: We found that the real time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swab has a relatively low positive rate in the early stage to determine COVID-19 (named by the World Health Organization). The manifestations of computed tomography (CT) imaging of COVID-19 had their own characteristics, which are different from other types of v… ▽ More We found that the real time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swab has a relatively low positive rate in the early stage to determine COVID-19 (named by the World Health Organization). The manifestations of computed tomography (CT) imaging of COVID-19 had their own characteristics, which are different from other types of viral pneumonia, such as Influenza-A viral pneumonia. Therefore, clinical doctors call for another early diagnostic criteria for this new type of pneumonia as soon as possible.This study aimed to establish an early screening model to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases with pulmonary CT images using deep learning techniques. The candidate infection regions were first segmented out using a 3-dimensional deep learning model from pulmonary CT image set. These separated images were then categorized into COVID-19, Influenza-A viral pneumonia and irrelevant to infection groups, together with the corresponding confidence scores using a location-attention classification model. Finally the infection type and total confidence score of this CT case were calculated with Noisy-or Bayesian function.The experiments result of benchmark dataset showed that the overall accuracy was 86.7 % from the perspective of CT cases as a whole.The deep learning models established in this study were effective for the early screening of COVID-19 patients and demonstrated to be a promising supplementary diagnostic method for frontline clinical doctors. △ Less

Submitted 21 February, 2020; originally announced February 2020.

Journal ref: Engineering, Volume 6, Issue 10, October 2020, Pages 1122-1129

arXiv:2001.08383 [pdf, other]

A Multi-site Study of a Breast Density Deep Learning Model for Full-field Digital Mammography Images and Synthetic Mammography Images

Authors: Thomas P. Matthews, Sadanand Singh, Brent Mombourquette, Jason Su, Meet P. Shah, Stefano Pedemonte, Aaron Long, David Maffit, Jenny Gurney, Rodrigo Morales Hoil, Nikita Ghare, Douglas Smith, Stephen M. Moore, Susan C. Marks, Richard L. Wahl

Abstract: Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multi-site setting for synthetic two-dimensional mammography (SM) images derived from digital breast tomosynthesis exams using full-field digital mammography (FFDM) images and limited SM data. Materials and Methods: A DL model was trained to predict BI-RADS breast density using F… ▽ More Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multi-site setting for synthetic two-dimensional mammography (SM) images derived from digital breast tomosynthesis exams using full-field digital mammography (FFDM) images and limited SM data. Materials and Methods: A DL model was trained to predict BI-RADS breast density using FFDM images acquired from 2008 to 2017 (Site 1: 57492 patients, 187627 exams, 750752 images) for this retrospective study. The FFDM model was evaluated using SM datasets from two institutions (Site 1: 3842 patients, 3866 exams, 14472 images, acquired from 2016 to 2017; Site 2: 7557 patients, 16283 exams, 63973 images, 2015 to 2019). Each of the three datasets were then split into training, validation, and test datasets. Adaptation methods were investigated to improve performance on the SM datasets and the effect of dataset size on each adaptation method is considered. Statistical significance was assessed using confidence intervals (CI), estimated by bootstrapping. Results: Without adaptation, the model demonstrated substantial agreement with the original reporting radiologists for all three datasets (Site 1 FFDM: linearly-weighted $κ_w$ = 0.75 [95% CI: 0.74, 0.76]; Site 1 SM: $κ_w$ = 0.71 [95% CI: 0.64, 0.78]; Site 2 SM: $κ_w$ = 0.72 [95% CI: 0.70, 0.75]). With adaptation, performance improved for Site 2 (Site 1: $κ_w$ = 0.72 [95% CI: 0.66, 0.79], 0.71 vs 0.72, P = .80; Site 2: $κ_w$ = 0.79 [95% CI: 0.76, 0.81], 0.72 vs 0.79, P $<$ .001) using only 500 SM images from that site. Conclusion: A BI-RADS breast density DL model demonstrated strong performance on FFDM and SM images from two institutions without training on SM images and improved using few SM images. △ Less

Submitted 2 October, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

MSC Class: 68T45 ACM Class: I.5.4; J.3; I.2.10; I.4.8

arXiv:2001.08382 [pdf, other]

A Hypersensitive Breast Cancer Detector

Authors: Stefano Pedemonte, Brent Mombourquette, Alexis Goh, Trevor Tsue, Aaron Long, Sadanand Singh, Thomas Paul Matthews, Meet Shah, Jason Su

Abstract: Early detection of breast cancer through screening mammography yields a 20-35% increase in survival rate; however, there are not enough radiologists to serve the growing population of women seeking screening mammography. Although commercial computer aided detection (CADe) software has been available to radiologists for decades, it has failed to improve the interpretation of full-field digital mamm… ▽ More Early detection of breast cancer through screening mammography yields a 20-35% increase in survival rate; however, there are not enough radiologists to serve the growing population of women seeking screening mammography. Although commercial computer aided detection (CADe) software has been available to radiologists for decades, it has failed to improve the interpretation of full-field digital mammography (FFDM) images due to its low sensitivity over the spectrum of findings. In this work, we leverage a large set of FFDM images with loose bounding boxes of mammographically significant findings to train a deep learning detector with extreme sensitivity. Building upon work from the Hourglass architecture, we train a model that produces segmentation-like images with high spatial resolution, with the aim of producing 2D Gaussian blobs centered on ground-truth boxes. We replace the pixel-wise $L_2$ norm with a weak-supervision loss designed to achieve high sensitivity, asymmetrically penalizing false positives and false negatives while softening the noise of the loose bounding boxes by permitting a tolerance in misaligned predictions. The resulting system achieves a sensitivity for malignant findings of 0.99 with only 4.8 false positive markers per image. When utilized in a CADe system, this model could enable a novel workflow where radiologists can focus their attention with trust on only the locations proposed by the model, expediting the interpretation process and bringing attention to potential findings that could otherwise have been missed. Due to its nearly perfect sensitivity, the proposed detector can also be used as a high-performance proposal generator in two-stage detection systems. △ Less

Submitted 23 January, 2020; originally announced January 2020.

Comments: SPIE Medical Imaging 2020

arXiv:2001.08381 [pdf, other]

Adaptation of a deep learning malignancy model from full-field digital mammography to digital breast tomosynthesis

Authors: Sadanand Singh, Thomas Paul Matthews, Meet Shah, Brent Mombourquette, Trevor Tsue, Aaron Long, Ranya Almohsen, Stefano Pedemonte, Jason Su

Abstract: Mammography-based screening has helped reduce the breast cancer mortality rate, but has also been associated with potential harms due to low specificity, leading to unnecessary exams or procedures, and low sensitivity. Digital breast tomosynthesis (DBT) improves on conventional mammography by increasing both sensitivity and specificity and is becoming common in clinical settings. However, deep lea… ▽ More Mammography-based screening has helped reduce the breast cancer mortality rate, but has also been associated with potential harms due to low specificity, leading to unnecessary exams or procedures, and low sensitivity. Digital breast tomosynthesis (DBT) improves on conventional mammography by increasing both sensitivity and specificity and is becoming common in clinical settings. However, deep learning (DL) models have been developed mainly on conventional 2D full-field digital mammography (FFDM) or scanned film images. Due to a lack of large annotated DBT datasets, it is difficult to train a model on DBT from scratch. In this work, we present methods to generalize a model trained on FFDM images to DBT images. In particular, we use average histogram matching (HM) and DL fine-tuning methods to generalize a FFDM model to the 2D maximum intensity projection (MIP) of DBT images. In the proposed approach, the differences between the FFDM and DBT domains are reduced via HM and then the base model, which was trained on abundant FFDM images, is fine-tuned. When evaluating on image patches extracted around identified findings, we are able to achieve similar areas under the receiver operating characteristic curve (ROC AUC) of $\sim 0.9$ for FFDM and $\sim 0.85$ for MIP images, as compared to a ROC AUC of $\sim 0.75$ when tested directly on MIP images. △ Less

Submitted 23 January, 2020; originally announced January 2020.

Comments: SPIE Medical Imaging 2020

arXiv:1911.09837 [pdf, other]

Graph Convolution Networks for Probabilistic Modeling of Driving Acceleration

Authors: Jianyu Su, Peter A. Beling, Rui Guo, Kyungtae Han

Abstract: The ability to model and predict ego-vehicle's surrounding traffic is crucial for autonomous pilots and intelligent driver-assistance systems. Acceleration prediction is important as one of the major components of traffic prediction. This paper proposes novel approaches to the acceleration prediction problem. By representing spatial relationships between vehicles with a graph model, we build a gen… ▽ More The ability to model and predict ego-vehicle's surrounding traffic is crucial for autonomous pilots and intelligent driver-assistance systems. Acceleration prediction is important as one of the major components of traffic prediction. This paper proposes novel approaches to the acceleration prediction problem. By representing spatial relationships between vehicles with a graph model, we build a generalized acceleration prediction framework. This paper studies the effectiveness of proposed Graph Convolution Networks, which operate on graphs predicting the acceleration distribution for vehicles driving on highways. We further investigate prediction improvement through integrating of Recurrent Neural Networks to disentangle the temporal complexity inherent in the traffic data. Results from simulation studies using comprehensive performance metrics support the conclusion that our proposed networks outperform state-of-the-art methods in generating realistic trajectories over a prediction horizon. △ Less

Submitted 7 May, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: Accepted by ITSC 2020

arXiv:1910.04099 [pdf, other]

Manhattan Room Layout Reconstruction from a Single 360 image: A Comparative Study of State-of-the-art Methods

Authors: Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem

Abstract: Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different d… ▽ More Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g. SegNet or ResNet), type of elements predicted (e.g. corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset [3], and introduce two depth-based evaluation metrics. △ Less

Submitted 25 December, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted by International Journal of Computer Vision (IJCV), 2021

arXiv:1909.04142 [pdf, other]

DaTscan SPECT Image Classification for Parkinson's Disease

Authors: Justin Quan, Lin Xu, Rene Xu, Tyrael Tong, Jean Su

Abstract: Parkinson's Disease (PD) is a neurodegenerative disease that currently does not have a cure. In order to facilitate disease management and reduce the speed of symptom progression, early diagnosis is essential. The current clinical, diagnostic approach is to have radiologists perform human visual analysis of the degeneration of dopaminergic neurons in the substantia nigra region of the brain. Clini… ▽ More Parkinson's Disease (PD) is a neurodegenerative disease that currently does not have a cure. In order to facilitate disease management and reduce the speed of symptom progression, early diagnosis is essential. The current clinical, diagnostic approach is to have radiologists perform human visual analysis of the degeneration of dopaminergic neurons in the substantia nigra region of the brain. Clinically, dopamine levels are monitored through observing dopamine transporter (DaT) activity. One method of DaT activity analysis is performed with the injection of an Iodine-123 fluoropropyl (123I-FP-CIT) tracer combined with single photon emission computerized tomography (SPECT) imaging. The tracer illustrates the region of interest in the resulting DaTscan SPECT images. Human visual analysis is slow and vulnerable to subjectivity between radiologists, so the goal was to develop an introductory implementation of a deep convolutional neural network that can objectively and accurately classify DaTscan SPECT images as Parkinson's Disease or normal. This study illustrates the approach of using a deep convolutional neural network and evaluates its performance on DaTscan SPECT image classification. The data used in this study was obtained through a database provided by the Parkinson's Progression Markers Initiative (PPMI). The deep neural network in this study utilizes the InceptionV3 architecture, 1st runner up in the 2015 ImageNet Large Scale Visual Recognition Competition (ILSVRC), as a base model. A custom, binary classifier block was added on top of this base. In order to account for the small dataset size, a ten fold cross validation was implemented to evaluate the model's performance. △ Less

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1907.07502 [pdf, other]

Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing

Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su

Abstract: SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted l1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this paper, we develop an asymptotically exact characterization of the SLOPE solution u… ▽ More SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted l1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this paper, we develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP). This algorithmic approach allows us to approximate the SLOPE solution via the much more amenable AMP iterates. Explicitly, we characterize the asymptotic dynamics of the AMP iterates relying on a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted l1 penalty. Moreover, we prove that the AMP iterates converge to the SLOPE solution in an asymptotic sense, and numerical simulations show that the convergence is surprisingly fast. Our proof rests on a novel technique that specifically leverages the SLOPE problem. In contrast to prior literature, our work not only yields an asymptotically sharp analysis but also offers an algorithmic, flexible, and constructive approach to understanding the SLOPE problem. △ Less

Submitted 17 July, 2019; originally announced July 2019.

arXiv:1905.11045 [pdf, other]

Attention Based Image Compression Post-Processing Convolutional Neural Network

Authors: Yuyang Xue, Jiannan Su

Abstract: The traditional image compressors, e.g., BPG and H.266, have achieved great image and video compression quality. Recently, Convolutional Neural Network has been used widely in image compression. We proposed an attention-based convolutional neural network for low bit-rate compression to post-process the output of traditional image compression decoder. Across the experimental results on validation s… ▽ More The traditional image compressors, e.g., BPG and H.266, have achieved great image and video compression quality. Recently, Convolutional Neural Network has been used widely in image compression. We proposed an attention-based convolutional neural network for low bit-rate compression to post-process the output of traditional image compression decoder. Across the experimental results on validation sets, the post-processing module trained by MAE and MS-SSIM losses yields the highest PSNR of 32.10 on average at the bit-rate of 0.15. △ Less

Submitted 27 May, 2019; originally announced May 2019.

Comments: 4 pages, 2 figures, CVPR Compression Workshop

arXiv:1901.05049 [pdf, other]

doi 10.1145/3403572

Bonseyes AI Pipeline -- bringing AI to you. End-to-end integration of data, algorithms and deployment tools

Authors: Miguel de Prado, Jing Su, Rabia Saeed, Lorenzo Keller, Noelia Vallez, Andrew Anderson, David Gregg, Luca Benini, Tim Llewellynn, Nabil Ouerhani, Rozenn Dahyot and, Nuria Pazos

Abstract: Next generation of embedded Information and Communication Technology (ICT) systems are collaborative systems able to perform autonomous tasks. The remarkable expansion of the embedded ICT market, together with the rise and breakthroughs of Artificial Intelligence (AI), have put the focus on the Edge as it stands as one of the keys for the next technological revolution: the seamless integration of… ▽ More Next generation of embedded Information and Communication Technology (ICT) systems are collaborative systems able to perform autonomous tasks. The remarkable expansion of the embedded ICT market, together with the rise and breakthroughs of Artificial Intelligence (AI), have put the focus on the Edge as it stands as one of the keys for the next technological revolution: the seamless integration of AI in our daily life. However, training and deployment of custom AI solutions on embedded devices require a fine-grained integration of data, algorithms, and tools to achieve high accuracy. Such integration requires a high level of expertise that becomes a real bottleneck for small and medium enterprises wanting to deploy AI solutions on the Edge which, ultimately, slows down the adoption of AI on daily-life applications. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms, and deployment tools together. By removing the integration barriers and lowering the required expertise, we can interconnect the different stages of tools and provide a modular end-to-end development of AI products for embedded devices. Our AI pipeline consists of four modular main steps: i) data ingestion, ii) model training, iii) deployment optimization and, iv) the IoT hub integration. To show the effectiveness of our pipeline, we provide examples of different AI applications during each of the steps. Besides, we integrate our deployment framework, LPDNN, into the AI pipeline and present its lightweight architecture and deployment capabilities for embedded devices. Finally, we demonstrate the results of the AI pipeline by showing the deployment of several AI applications such as keyword spotting, image classification and object detection on a set of well-known embedded platforms, where LPDNN consistently outperforms all other popular deployment frameworks. △ Less

Submitted 11 June, 2020; v1 submitted 15 January, 2019; originally announced January 2019.

arXiv:1812.10199 [pdf, other]

A Multiversion Programming Inspired Approach to Detecting Audio Adversarial Examples

Authors: Qiang Zeng, Jianhai Su, Chenglong Fu, Golam Kayas, Lannan Luo

Abstract: Adversarial examples (AEs) are crafted by adding human-imperceptible perturbations to inputs such that a machine-learning based classifier incorrectly labels them. They have become a severe threat to the trustworthiness of machine learning. While AEs in the image domain have been well studied, audio AEs are less investigated. Recently, multiple techniques are proposed to generate audio AEs, which… ▽ More Adversarial examples (AEs) are crafted by adding human-imperceptible perturbations to inputs such that a machine-learning based classifier incorrectly labels them. They have become a severe threat to the trustworthiness of machine learning. While AEs in the image domain have been well studied, audio AEs are less investigated. Recently, multiple techniques are proposed to generate audio AEs, which makes countermeasures against them an urgent task. Our experiments show that, given an AE, the transcription results by different Automatic Speech Recognition (ASR) systems differ significantly, as they use different architectures, parameters, and training datasets. Inspired by Multiversion Programming, we propose a novel audio AE detection approach, which utilizes multiple off-the-shelf ASR systems to determine whether an audio input is an AE. The evaluation shows that the detection achieves accuracies over 98.6%. △ Less

Submitted 3 December, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

Comments: 8 pages, 4 figures, AICS 2019, The AAAI-19 Workshop on Artificial Intelligence for Cyber Security (AICS), 2019

Report number: AICS/2019/06

Journal ref: The AAAI-19 Workshop on Artificial Intelligence for Cyber Security (AICS), 2019

Showing 1–45 of 45 results for author: Su, J