Search | arXiv e-print repository

arXiv:2407.20622 [pdf, other]

Decoding Linguistic Representations of Human Brain

Authors: Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang

Abstract: Language, as an information medium created by advanced organisms, has always been a concern of neuroscience regarding how it is represented in the brain. Decoding linguistic representations in the evoked brain has shown groundbreaking achievements, thanks to the rapid improvement of neuroimaging, medical technology, life sciences and artificial intelligence. In this work, we present a taxonomy of brain-to-language decoding of both textual and speech formats. This work integrates two types of research: neuroscience focusing on language understanding and deep learning-based brain decoding. Generating discernible language information from brain activity could not only help those with limited articulation, especially amyotrophic lateral sclerosis (ALS) patients but also open up a new way for the next generation's brain-computer interface (BCI). This article will help brain scientists and deep-learning researchers to gain a bird's eye view of fine-grained language perception, and thus facilitate their further investigation and research of neural process and language decoding.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.09972 [pdf, other]

Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Authors: Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, Marius George Linguraru, Y. Thomas Hou, Syed Muhammad Anwar, Wenjing Lou

Abstract: Federated learning (FL) allows a set of clients to collaboratively train a machine-learning model without exposing local training samples. In this context, it is considered to be privacy-preserving and hence has been adopted by medical centers to train machine-learning models over private data. However, in this paper, we propose a novel attack named MediLeak that enables a malicious parameter server to recover high-fidelity patient images from the model updates uploaded by the clients. MediLeak requires the server to generate an adversarial model by adding a crafted module in front of the original model architecture. It is published to the clients in the regular FL training process and each client conducts local training on it to generate corresponding model updates. Then, based on the FL protocol, the model updates are sent back to the server and our proposed analytical method recovers private data from the parameter updates of the crafted module. We provide a comprehensive analysis for MediLeak and show that it can successfully break the state-of-the-art cryptographic secure aggregation protocols, designed to protect the FL systems from privacy inference attacks. We implement MediLeak on the MedMNIST and COVIDx CXR-4 datasets. The results show that MediLeak can nearly perfectly recover private images with high recovery rates and quantitative scores. We further perform downstream tasks such as disease classification with the recovered data, where our results show no significant performance degradation compared to using the original training samples.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.05558 [pdf]

Hidden Convexity-Based Distributed Operation of Integrated Electricity-Gas Systems

Authors: Rong-Peng Liu, Yue Song, Junhong Liu, Xiaozhe Wang, Jinpeng Guo, Yunhe Hou

Abstract: We propose a hidden convexity-based method to address distributed optimal energy flow (OEF) problems for transmission-level integrated electricity-gas systems. First, we develop a node-wise decoupling method to decompose an OEF problem into multiple OEF subproblems. Then, we propose a hidden convexity-based method to equivalently reformulate nonconvex OEF subproblems as semi-definite programs. This method differs from any approximation and convexification methods that may incur infeasible solutions. Since all OEF subproblems are originally convex or equivalently convexified, we adopt an ADMM to solve the hidden convexity-based distributed OEF problem with convergence analysis. Test results validate the effectiveness of the proposed method, especially in handling a large number of agents.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 7 pages

arXiv:2406.16933 [pdf, other]

SGSM: A Foundation-model-like Semi-generalist Sensing Model

Authors: Tianjian Yang, Hao Zhou, Shuo Liu, Kaiwen Guo, Yiwen Hou, Haohua Du, Zhi Liu, Xiang-Yang Li

Abstract: The significance of intelligent sensing systems is growing in the realm of smart services. These systems extract relevant signal features and generate informative representations for particular tasks. However, building the feature extraction component for such systems requires extensive domain-specific expertise or data. The exceptionally rapid development of foundation models is likely to usher in newfound abilities in such intelligent sensing. We propose a new scheme for sensing model, which we refer to as semi-generalist sensing model (SGSM). SGSM is able to semiautomatically solve various tasks using relatively less task-specific labeled data compared to traditional systems. Built through the analysis of the common theoretical model, SGSM can depict different modalities, such as the acoustic and Wi-Fi signal. Experimental results on such two heterogeneous sensors illustrate that SGSM functions across a wide range of scenarios, thereby establishing its broad applicability. In some cases, SGSM even achieves better performance than sensor-specific specialized solutions. Wi-Fi evaluations indicate a 20% accuracy improvement when applying SGSM to an existing sensing model.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.05914 [pdf, other]

Soundscape Captioning using Sound Affective Quality Network and Large Language Model

Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship between sounds and the emotions they evoke within a context. To fill this gap and to automate soundscape analysis, which traditionally relies on labour-intensive subjective ratings and surveys, we propose the soundscape captioning (SoundSCap) task. SoundSCap generates context-aware soundscape descriptions by capturing the acoustic scene, event information, and the corresponding human affective qualities. To this end, we propose an automatic soundscape captioner (SoundSCaper) composed of an acoustic model, SoundAQnet, and a general large language model (LLM). SoundAQnet simultaneously models multi-scale information about acoustic scenes, events, and perceived affective qualities, while LLM generates soundscape captions by parsing the information captured by SoundAQnet to a common language. The soundscape caption's quality is assessed by a jury of 16 audio/soundscape experts. The average score (out of 5) of SoundSCaper-generated captions is lower than the score of captions generated by two soundscape experts by 0.21 and 0.25, respectively, on the evaluation set and the model-unknown mixed external dataset with varying lengths and acoustic properties, but the differences are not statistically significant. Overall, SoundSCaper-generated captions show promising performance compared to captions annotated by soundscape experts. The models' code, LLM scripts, human assessment data and instructions, and expert evaluation statistics are all publicly available.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

arXiv:2405.08838 [pdf, other]

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Abstract: With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, as a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modal, and the few that are multimodal employ outdated techniques, and their audio content is limited to a single language, thereby failing to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures

MSC Class: 68T45; ACM Class: I.4.9

arXiv:2405.00885 [pdf, other]

WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan

Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training based on its full computing and communications capacity. Although such fixed size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements of local training contributions, both of which may cause very long FL training delay. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to the fixed size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.

Submitted 19 August, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.00989 [pdf, other]

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Authors: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we presented 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset could broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.

Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/

Journal ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

arXiv:2401.07163 [pdf]

A New Method of Pixel-level In-situ U-value Measurement for Building Envelopes Based on Infrared Thermography

Authors: Zihao Wang, Yu Hou, Lucio Soibelman

Abstract: The potential energy loss of aging buildings traps building owners in a cycle of underfunding operations and overpaying maintenance costs. Energy auditors intending to generate an energy model of a target building for performance assessment may struggle to obtain accurate results as the spatial distribution of temperatures is not considered when calculating the U-value of the building envelope. This paper proposes a pixel-level method based on infrared thermography (IRT) that considers two-dimensional (2D) spatial temperature distributions of the outdoor and indoor surfaces of the target wall to generate a 2D U-value map of the wall. The result supports that the proposed method can better reflect the actual thermal insulation performance of the target wall compared to the current IRT-based methods that use a single-point room temperature as input.

Submitted 13 January, 2024; originally announced January 2024.

Comments: Accepted and presented at 2023 ASCE International Conference on Computing in Civil Engineering (i3CE 2023)

arXiv:2401.05639 [pdf, other]

Full-State Prescribed Performance-Based Consensus of Double-Integrator Multi-Agent Systems with Jointly Connected Topologies

Authors: Yahui Hou, Bin Cheng

Abstract: This paper addresses the full-state prescribed performance-based consensus problem for double-integrator multi-agent systems with jointly connected topologies. To improve the transient performance, a distributed prescribed performance control protocol consisting of the transformed relative position and the transformed relative velocity is proposed, where the communication topology satisfies the jointly connected assumption. Different from the existing literature, two independent transient performance specifications imposed on relative positions and relative velocities can be guaranteed simultaneously. A numerical example is ultimately used to validate the effectiveness of the proposed protocol.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 5 pages, 3 figures

arXiv:2401.00269 [pdf]

doi 10.1109/TPWRS.2021.3081557

Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty

Authors: Rong-Peng Liu, Yunhe Hou, Yujia Li, Shunbo Lei, Wei Wei, Xiaozhe Wang

Abstract: This paper adopts a two-stage sample robust optimization (SRO) model to address the wind power penetrated unit commitment optimal energy flow (UC-OEF) problem for IEGSs. The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and, accordingly, develop a solution method.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 10 pages

Journal ref: IEEE Trans. Power Syst., vol. 36, no. 6, pp. 5889-5900, Nov. 2021

arXiv:2312.09952 [pdf, other]

Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

Authors: Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

Abstract: WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape, and explore relations between diverse AEs and human-perceived annoyance rating (AR). Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP). Experiments show that: 1) MLGL with 4.1 M parameters improves AEC and ARP results by using semantic node information in local and global context aware graphs; 2) MLGL captures relations between coarse and fine-grained AEs and AR well; 3) Statistical analysis of MLGL results shows that some AEs from different sources significantly correlate with AR, which is consistent with previous research on human perception of these sound sources.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2311.15153 [pdf, other]

Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture

Authors: Weijie Li, Yang Wei, Tianpeng Liu, Yuenan Hou, Yuxuan Li, Zhen Liu, Yongxiang Liu, Li Liu

Abstract: The growing Synthetic Aperture Radar (SAR) data has the potential to build a foundation model through Self-Supervised Learning (SSL) methods, which can achieve various SAR Automatic Target Recognition (ATR) tasks with pre-training in large-scale unlabeled data and fine-tuning in small labeled samples. SSL aims to construct supervision signals directly from the data, which minimizes the need for expensive expert annotation and maximizes the use of the expanding data pool for a foundational model. This study investigates an effective SSL method for SAR ATR, which can pave the way for a foundation model in SAR ATR. The primary obstacles faced in SSL for SAR ATR are the small targets in remote sensing and speckle noise in SAR images, corresponding to the SSL approach and signals. To overcome these challenges, we present a novel Joint-Embedding Predictive Architecture for SAR ATR (SAR-JEPA), which leverages local masked patches to predict the multi-scale SAR gradient representations of unseen context. The key aspect of SAR-JEPA is integrating SAR domain features to ensure high-quality self-supervised signals as target features. Besides, we employ local masks and multi-scale features to accommodate the various small targets in remote sensing. By fine-tuning and evaluating our framework on three target recognition datasets (vehicle, ship, and aircraft) with four other datasets as pre-training, we demonstrate its outperformance over other SSL methods and its effectiveness with increasing SAR data. This study showcases the potential of SSL for SAR target recognition across diverse targets, scenes, and sensors. Our codes and weights are available at https://github.com/waterdisappear/SAR-JEPA.

Submitted 21 August, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

Comments: 15 pages, 7 figures

arXiv:2311.09030 [pdf]

doi 10.1121/10.0022408

AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

Authors: Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang, Andrew Mitchell, Francesco Aletta, Jian Kang, Dick Botteldooren

Abstract: Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) to analyze automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.

Submitted 15 November, 2023; originally announced November 2023.

Comments: The Journal of the Acoustical Society of America, 154 (5), 3145

Journal ref: The Journal of the Acoustical Society of America, 154, 3145 (2023)

arXiv:2310.09673 [pdf, other]

Robust Quickest Change Detection in Non-Stationary Processes

Authors: Yingze Hou, Yousef Oleyaeimotlagh, Rahul Mishra, Hoda Bidkhori, Taposh Banerjee

Abstract: Optimal algorithms are developed for robust detection of changes in non-stationary processes. These are processes in which the distribution of the data after change varies with time. The decision-maker does not have access to precise information on the post-change distribution. It is shown that if the post-change non-stationary family has a distribution that is least favorable in a well-defined sense, then the algorithms designed using the least favorable distributions are robust and optimal. Non-stationary processes are encountered in public health monitoring and space and military applications. The robust algorithms are applied to real and simulated data to show their effectiveness.

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2310.08445 [pdf, other]

Risk-informed Resilience Planning of Transmission Systems Against Ice Storms

Authors: Chenxi Hu, Yujia Li, Yunhe Hou

Abstract: Ice storms, known for their severity and predictability, necessitate proactive resilience enhancement in power systems. Traditional approaches often overlook the endogenous uncertainties inherent in human decisions and underutilize predictive information like forecast accuracy and preparation time. To bridge these gaps, we proposed a two-stage risk-informed decision-dependent resilience planning (RIDDRP) for transmission systems against ice storms. The model leverages predictive information to optimize resource allocation, considering decision-dependent line failure uncertainties introduced by planning decisions and exogenous ice storm-related uncertainties. We adopt a dual-objective approach to balance economic efficiency and system resilience across both normal and emergent conditions. The first stage of the RIDDRP model makes line hardening decisions, as well as the optimal siting and sizing of energy storage. The second stage evaluates the risk-informed operation costs, considering both pre-event preparation and emergency operations. Case studies demonstrate the model's ability to leverage predictive information, leading to more judicious investment decisions and optimized utilization of dispatchable resources. We also quantified the impact of different properties of predictive information on resilience enhancement. The RIDDRP model provides grid operators and planners valuable insights for making risk-informed infrastructure investments and operational strategy decisions, thereby improving preparedness and response to future extreme weather events.

Submitted 22 January, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07352 [pdf, other]

Adaptive Distributionally Robust Planning for Renewable-Powered Fast Charging Stations Under Decision-Dependent EV Diffusion Uncertainty

Authors: Yujia Li, Feng Qiu, Yixuan Chen, Yunhe Hou

Abstract: When deploying fast charging stations (FCSs) to support long-distance trips of electric vehicles (EVs), there exist indirect network effects: while the gradual diffusion of EVs directly influences the timing and capacities of FCS allocation, the decisions for FCS allocations, in turn, impact the drivers' willingness to adopt EVs. This interplay, if neglected, can result in uncovered EVs and security issues on the grid side and even hinder the effective diffusion of EVs. In this paper, we explicitly incorporate this interdependence by quantifying EV adoption rates as decision-dependent uncertainties (DDUs) using decision-dependent ambiguity sets (DDASs). Then, a two-stage decision-dependent distributionally robust FCS planning (D$^3$R-FCSP) model is developed for adaptively deploying FCSs with on-site sources and expanding the coupled distribution network. A multi-period capacitated arc cover-path cover (MCACPC) model is incorporated to capture the EVs' recharging patterns to ensure the feasibility of FCS locations and capacities. To resolve the nonlinearity and nonconvexity, the D$^3$R-FCSP model is equivalently reformulated into a single-level mixed-integer linear programming by exploiting its strong duality and applying the McCormick envelope. Finally, case studies highlight the superior out-of-sample performances of our model in terms of security and cost-efficiency. Furthermore, the byproduct of accelerated EV adoption through an implicit positive feedback loop is highlighted.

Submitted 11 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.
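
Note on the McCormick envelope named in the abstract above: the listing does not show how the paper applies it, so the following is only the generic textbook form, with symbols chosen here for illustration. For a bilinear term $w = xy$ with assumed bounds $x^L \le x \le x^U$ and $y^L \le y \le y^U$, the envelope replaces the product by four linear inequalities:

```latex
\begin{aligned}
w &\ge x^L y + x\,y^L - x^L y^L, \\
w &\ge x^U y + x\,y^U - x^U y^U, \\
w &\le x^U y + x\,y^L - x^U y^L, \\
w &\le x^L y + x\,y^U - x^L y^U .
\end{aligned}
```

When one factor is binary (for example $x \in \{0, 1\}$ with $x^L = 0$, $x^U = 1$), these inequalities force $w = 0$ for $x = 0$ and $w = y$ for $x = 1$, so the relaxation is exact. That is the usual reason the technique can appear in an equivalent mixed-integer linear reformulation.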

arXiv:2310.03889 [pdf, ps, other]

doi 10.1109/LSP.2023.3319233

Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren

Abstract: Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This paper conducts the first study on disclosing the relationship between real-life acoustic scenes and semantic embeddings from the most relevant AEs. Specifically, we propose an event-relational graph representation learning (ERGL) framework for ASC to classify scenes, and simultaneously answer clearly and directly which cues are used in classifying. In the event-relational graph, embeddings of each event are treated as nodes, while relationship cues derived from each pair of nodes are described by multi-dimensional edge features. Experiments on a real-life ASC dataset show that the proposed ERGL achieves competitive performance on ASC by learning embeddings of only a limited number of AEs. The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph. Visualizations of graph representations learned by ERGL are available here (https://github.com/Yuanbo2020/ERGL).

Submitted 5 October, 2023; originally announced October 2023.

Comments: IEEE Signal Processing Letters, doi: 10.1109/LSP.2023.3319233

arXiv:2308.16545 [pdf, other]

Distributed Nonblocking Supervisory Control of Timed Discrete-Event Systems with Communication Delays and Losses

Authors: Yunfeng Hou, Qingdu Li

Abstract: This paper investigates the problem of distributed nonblocking supervisory control for timed discrete-event systems (DESs). The distributed supervisors communicate with each other over networks subject to nondeterministic communication delays and losses. Given that the delays are counted by time, techniques have been developed to model the dynamics of the communication channels. By incorporating the dynamics of the communication channels into the system model, we construct a communication automaton to model the interaction process between the supervisors. Based on the communication automaton, we define the observation mappings for the supervisors, which consider delays and losses occurring in the communication channels. Then, we derive the necessary and sufficient conditions for the existence of a set of supervisors for distributed nonblocking supervisory control. These conditions are expressed as network controllability, network joint observability, and system language closure. Finally, an example of intelligent manufacturing is provided to show the application of the proposed framework.

Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.11980 [pdf, other]

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.

Submitted 23 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

arXiv:2306.02886 [pdf]

Image Reconstruction for Accelerated MR Scan with Faster Fourier Convolutional Neural Networks

Authors: Xiaohan Liu, Yanwei Pang, Xuebin Sun, Yiming Liu, Yonghong Hou, Zhenchang Wang, Xuelong Li

Abstract: Partial scan is a common approach to accelerate Magnetic Resonance Imaging (MRI) data acquisition in both 2D and 3D settings. However, accurately reconstructing images from partial scan data (i.e., incomplete k-space matrices) remains challenging due to lack of an effectively global receptive field in both spatial and k-space domains. To address this problem, we propose the following: (1) a novel convolutional operator called Faster Fourier Convolution (FasterFC) to replace the two consecutive convolution operations typically used in convolutional neural networks (e.g., U-Net, ResNet). Based on the spectral convolution theorem in Fourier theory, FasterFC employs alternating kernels of size 1 in 3D case) in different domains to extend the dual-domain receptive field to the global and achieves faster calculation speed than traditional Fast Fourier Convolution (FFC). (2) A 2D accelerated MRI method, FasterFC-End-to-End-VarNet, which uses FasterFC to improve the sensitivity maps and reconstruction quality. (3) A multi-stage 3D accelerated MRI method called FasterFC-based Single-to-group Network (FAS-Net) that utilizes a single-to-group algorithm to guide k-space domain reconstruction, followed by FasterFC-based cascaded convolutional neural networks to expand the effective receptive field in the dual-domain. Experimental results on the fastMRI and Stanford MRI Data datasets demonstrate that FasterFC improves the quality of both 2D and 3D reconstruction.
Moreover, FAS-Net, as a 3D high-resolution multi-coil (eight) accelerated MRI method, achieves superior reconstruction performance in both qualitative and quantitative results compared with state-of-the-art 2D and 3D methods.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.17629 [pdf, other]

Multi-Modal Wireless Flexible Gel-Free Sensors with Edge Deep Learning for Detecting and Alerting Freezing of Gait in Parkinson's Patients

Authors: Yuhan Hou, Jack Ji, Yi Zhu, Thomas Dell, Xilin Liu

Abstract: Freezing of gait (FoG) is a debilitating symptom of Parkinson's disease (PD). This work develops flexible wearable sensors that can detect FoG and alert patients and companions to help prevent falls. FoG is detected on the sensors using a deep learning (DL) model with multi-modal sensory inputs collected from distributed wireless sensors. Two types of wireless sensors are developed, including: (1) a C-shape central node placed around the patient's ears, which collects electroencephalogram (EEG), detects FoG using an on-device DL model, and generates auditory alerts when FoG is detected; (2) a stretchable patch-type sensor attached to the patient's legs, which collects electromyography (EMG) and movement information from accelerometers. The patch-type sensors wirelessly send collected data to the central node through low-power ultra-wideband (UWB) transceivers. All sensors are fabricated on flexible printed circuit boards. Adhesive gel-free acetylene carbon black and polydimethylsiloxane electrodes are fabricated on the flexible substrate to allow conformal wear over the long term. Custom integrated circuits (IC) are developed in 180 nm CMOS technology and used in both types of sensors for signal acquisition, digitization, and wireless communication. A novel lightweight DL model is trained using multi-modal sensory data. The inference of the DL model is performed on a low-power microcontroller in the central node. The DL model achieves a high detection sensitivity of 0.81 and a specificity of 0.88.
The developed wearable sensors are ready for clinical experiments and hold great promise in improving the quality of life of patients with PD. The proposed design methodologies can be used in wearable medical devices for the monitoring and treatment of a wide range of neurodegenerative diseases.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2304.11670 [pdf, other]

Evading DeepFake Detectors via Adversarial Statistical Consistency

Authors: Yang Hou, Qing Guo, Yihao Huang, Xiaofei Xie, Lei Ma, Jianjun Zhao

Abstract: In recent years, as various realistic face forgery techniques known as DeepFake improve by leaps and bounds, more and more DeepFake detection techniques have been proposed. These methods typically rely on detecting statistical differences between natural (i.e., real) and DeepFake-generated images in both spatial and frequency domains. In this work, we propose to explicitly minimize the statistical differences to evade state-of-the-art DeepFake detectors. To this end, we propose a statistical consistency attack (StatAttack) against DeepFake detectors, which contains two main parts. First, we select several statistical-sensitive natural degradations (i.e., exposure, blur, and noise) and add them to the fake images in an adversarial way. Second, we find that the statistical differences between natural and DeepFake images are positively associated with the distribution shifting between the two kinds of images, and we propose to use a distribution-aware loss to guide the optimization of different degradations. As a result, the feature distributions of generated adversarial examples are close to those of the natural images. Furthermore, we extend the StatAttack to a more powerful version, MStatAttack, where we extend the single-layer degradation to multi-layer degradations sequentially and use the loss to tune the combination weights jointly. Comprehensive experimental results on four spatial-based detectors and two frequency-based detectors with four datasets demonstrate the effectiveness of our proposed attack method in both white-box and black-box settings.

Submitted 23 April, 2023; originally announced April 2023.

Comments: Accepted by CVPR 2023

arXiv:2303.15706 [pdf, other]

Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays

Authors: Yunfeng Hou, Ching-Yen Weng, Peng Li

Abstract: In discrete-event systems, to save sensor resources, the agent continuously adjusts sensor activation decisions according to a sensor activation policy based on the changing observations. However, new challenges arise for sensor activations in networked discrete-event systems, where observation delays and control delays exist between the sensor systems and the agent. In this paper, a new framework for activating sensors in networked discrete-event systems is established. In this framework, we construct a communication automaton that explicitly expresses the interaction process between the agent and the sensor systems over the observation channel and the control channel. Based on the communication automaton, we can define dynamic observations of a communicated string. To guarantee that a sensor activation policy is physically implementable and insensitive to non-deterministic control delays and observation delays, we further introduce the definition of delay feasibility. We show that a delay feasible sensor activation policy can be used to dynamically activate sensors even if control delays and observation delays exist. A set of algorithms are developed to minimize sensor activations in a transition-based domain while ensuring a given specification condition is satisfied. A practical example is also provided to show the application of the proposed framework. Finally, we briefly discuss how to extend the proposed framework to a decentralized observation setting.

Submitted 5 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.12801 [pdf, ps, other]

A Data Augmentation Method and the Embedding Mechanism for Detection and Classification of Pulmonary Nodules on Small Samples

Authors: Yang Liu, Yue-Jie Hou, Chen-Xin Qin, Xin-Hui Li, Si-Jing Li, Bin Wang, Chi-Chun Zhou

Abstract: Detection of pulmonary nodules by CT is used for screening lung cancer in early stages. Computer-aided diagnosis (CAD) based on deep-learning method can identify the suspected areas of pulmonary nodules in CT images, thus improving the accuracy and efficiency of CT diagnosis. The accuracy and robustness of deep learning models. Method: In this paper, we explore (1) the data augmentation method based on the generation model and (2) the model structure improvement method based on the embedding mechanism. Two strategies have been introduced in this study: a new data augmentation method and an embedding mechanism. In the augmentation method, a 3D pixel-level statistics algorithm is proposed to generate pulmonary nodules, and by combining the faked pulmonary nodule and healthy lung, we generate new pulmonary nodule samples. The embedding mechanism is designed to better understand the meaning of pixels of the pulmonary nodule samples by introducing hidden variables.
Result: The result of the 3DVNET model with the augmentation method for pulmonary nodule detection shows that the proposed data augmentation method outperforms the method based on generative adversarial network (GAN) framework, with training accuracy improved by 1.5%. The result with the embedding mechanism for pulmonary nodule classification shows that the embedding mechanism clearly improves the accuracy and robustness of the classification of pulmonary nodules: the model training accuracy is close to 1 and the model testing F1-score is 0.90. Conclusion: The proposed data augmentation method and embedding mechanism are beneficial to improve the accuracy and robustness of the model, and can be further applied in other common diagnostic imaging tasks.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.07597 [pdf, other]

Preventive-Corrective Cyber-Defense: Attack-Induced Region Minimization and Cybersecurity Margin Maximization

Authors: Jiazuo Hou, Fei Teng, Wenqian Yin, Yue Song, Yunhe Hou

Abstract: False data injection (FDI) cyber-attacks on power systems can be prevented by strategically selecting and protecting a sufficiently large measurement subset, which, however, requires adequate cyber-defense resources for measurement protection. With any given cyber-defense resource, this paper proposes a preventive-corrective cyber-defense strategy, which minimizes the FDI attack-induced region in a preventive manner, followed by maximizing the cybersecurity margin in a corrective manner. First, this paper proposes a preventive cyber-defense strategy that minimizes the volume of the FDI attack-induced region via preventive allocation of any given measurement protection resource. Particularly, a sufficient condition for constructing the FDI unattackable lines is proposed, indicating that the FDI cyber-attack could be locally rather than globally prevented. Then, given a non-empty FDI attack-induced region, this paper proposes a corrective cyber-defense strategy that maximizes the cybersecurity margin, leading to a trade-off between the safest-but-expensive operation point (i.e., Euclidean Chebyshev center) and the cheapest-but-dangerous operation point. Simulation results on a modified IEEE 14 bus system verify the effectiveness and cost-effectiveness of the proposed preventive-corrective cyber-defense strategy.

Submitted 13 November, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.02168 [pdf, other]

Stability Constrained OPF in Microgrids: A Chance Constrained Optimization Framework with Non-Gaussian Uncertainty

Authors: Jun Wang, Yue Song, David John Hill, Yunhe Hou, Feilong Fan

Abstract: To figure out the stability issues brought by renewable energy sources (RES) with non-Gaussian uncertainties in isolated microgrids, this paper proposes a chance constrained stability constrained optimal power flow (CC-SC-OPF) model. Firstly, we propose a bi-level optimization problem, of which the upper level aims to minimize the expected generation cost without violating the stability chance constraint; the lower level concerns the stability index given by a semi-definite program (SDP). Secondly, we apply the Gaussian mixture model (GMM) to handle the non-Gaussian RES uncertainties and introduce analytical sensitivity analysis to reformulate chance constraints with respect to stability index and operational variables into linear deterministic versions. By incorporating linearized constraints, the bi-level model can be efficiently solved by a Benders decomposition-based approach. Thirdly, we design a supplementary corrective countermeasure to compensate for the possible control error caused by the linear approximation. Simulation results on the 33-bus microgrid reveal that compared to benchmarking approaches, the proposed model converges 30 times faster with more accurate solutions.

Submitted 4 February, 2023; originally announced February 2023.

arXiv:2301.11089 [pdf, ps, other]

An Analytical Formula for Stability Sensitivity Using SDP Dual

Authors: Jun Wang, Yue Song, David John Hill, Yunhe Hou

Abstract: In this letter, we analytically investigate the sensitivity of stability index to its dependent variables in general power systems. Firstly, given a small-signal model, the stability index is defined as the solution to a semidefinite program (SDP) based on the related Lyapunov equation. In case of stability, the stability index also characterizes the convergence rate of the system after disturbances. Then, by leveraging the duality of SDP, we deduce an analytical formula of the stability sensitivity to any entries of the system Jacobian matrix in terms of the SDP primal and dual variables. Unlike the traditional numerical perturbation method, the proposed sensitivity evaluation method is more accurate with a much lower computational burden. This letter applies a modified microgrid for comparative case studies. The results reveal the significant improvements on the accuracy and computational efficiency of stability sensitivity evaluation.

Submitted 26 January, 2023; originally announced January 2023.

arXiv:2301.00554 [pdf]

In-situ monitoring additive manufacturing process with AI edge computing

Authors: Wenkang Zhu, Hui Li, Yikai Zhang, Yuqing Hou, Liwei Chen

Abstract: In-situ monitoring system can be used to monitor the quality of additive manufacturing (AM) processes. In the case of digital image correlation (DIC) based in-situ monitoring systems, high-speed cameras were used to capture images of high resolutions. This paper proposed a novel in-situ monitoring system to accelerate the process of digital images using artificial intelligence (AI) edge computing board. It built a visual transformer based video super resolution (ViTSR) network to reconstruct high resolution (HR) video frames. Fully convolutional network (FCN) was used to simultaneously extract the geometric characteristics of molten pool and plasma arc during the AM processes. Compared with 6 state-of-the-art super resolution methods, ViTSR ranks first in terms of peak signal to noise ratio (PSNR). The PSNR of ViTSR for 4x super resolution reached 38.16 dB on test data with input size of 75 pixels x 75 pixels. Inference time of ViTSR and FCN was optimized to 50.97 ms and 67.86 ms on AI edge board after operator fusion and model pruning. The total inference time of the proposed system was 118.83 ms, which meets the requirement of real-time quality monitoring with low cost in-situ monitoring equipment during AM processes. The proposed system achieved an accuracy of 96.34% on the multi-objects extraction task and can be applied to different AM processes.

Submitted 2 January, 2023; originally announced January 2023.
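
Note on the PSNR metric quoted in the abstract above: the listing does not say how the authors computed it, so the sketch below only shows the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the flat-list input format, and the default peak value of 255 (8-bit images) are assumptions made here for illustration.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat sequences of pixel values (hypothetical helper, not the paper's code)."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the reconstruction is closer to the reference; because the scale is logarithmic, a fixed gain in dB corresponds to a fixed ratio of reduction in mean squared error.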

arXiv:2212.07560 [pdf, other]

Multi-level and multi-modal feature fusion for accurate 3D object detection in Connected and Automated Vehicles

Authors: Yiming Hou, Mahdi Rezaei, Richard Romano

Abstract: Aiming at highly accurate object detection for connected and automated vehicles (CAVs), this paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor by developing a novel LIDAR-Camera fusion scheme. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process. The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features. The fused features are shared by a two-stage network: the region proposal network (RPN) and the detection head (DH). The RPN generates high-recall proposals, and the DH produces final detection results. The experimental results show the proposed model outperforms more recent research on the KITTI 2D and 3D detection benchmark, particularly for distant and highly occluded instances.

Submitted 19 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2212.00729 [pdf, other]

Edge Deep Learning Enabled Freezing of Gait Detection in Parkinson's Patients

Authors: Ourong Lin, Tian Yu, Yuhan Hou, Yi Zhu, Xilin Liu

Abstract: This paper presents the design of a wireless sensor network for detecting and alerting the freezing of gait (FoG) symptoms in patients with Parkinson's disease. Three sensor nodes, each integrating a 3-axis accelerometer, can be placed on a patient at ankle, thigh, and trunk. Each sensor node can independently detect FoG using an on-device deep learning (DL) model, featuring a squeeze and excitation convolutional neural network (CNN). In a validation using a public dataset, the prototype developed achieved a FoG detection sensitivity of 88.8% and an F1 score of 85.34%, using less than 20 k trainable parameters per sensor node. Once FoG is detected, an auditory signal will be generated to alert users, and the alarm signal will also be sent to mobile phones for further actions if needed. The sensor node can be easily recharged wirelessly by inductive coupling. The system is self-contained and processes all user data locally without streaming data to external devices or the cloud, thus eliminating the cybersecurity risks and power penalty associated with wireless data transmission. The developed methodology can be used in a wide range of applications.

Submitted 27 November, 2022; originally announced December 2022.

arXiv:2211.13128 [pdf, other]

A Closed-loop Sleep Modulation System with FPGA-Accelerated Deep Learning

Authors: Mingzhe Sun, Aaron Zhou, Naize Yang, Yaqian Xu, Yuhan Hou, Xilin Liu

Abstract: Closed-loop sleep modulation is an emerging research paradigm to treat sleep disorders and enhance sleep benefits. However, two major barriers hinder the widespread application of this research paradigm. First, subjects often need to be wire-connected to rack-mount instrumentation for data acquisition, which negatively affects sleep quality. Second, conventional real-time sleep stage classification algorithms give limited performance. In this work, we conquer these two limitations by developing a sleep modulation system that supports closed-loop operations on the device. Sleep stage classification is performed using a lightweight deep learning (DL) model accelerated by a low-power field-programmable gate array (FPGA) device. The DL model uses a single channel electroencephalogram (EEG) as input. Two convolutional neural networks (CNNs) are used to capture general and detailed features, and a bidirectional long-short-term memory (LSTM) network is used to capture time-variant sequence features. An 8-bit quantization is used to reduce the computational cost without compromising performance. The DL model has been validated using a public sleep database containing 81 subjects, achieving a state-of-the-art classification accuracy of 85.8% and an F1-score of 79%. The developed model has also shown the potential to be generalized to different channels and input data lengths. Closed-loop in-phase auditory stimulation has been demonstrated on the test bench.

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2210.15366 [pdf, other]

Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren

Abstract: Most existing deep learning-based acoustic scene classification (ASC) approaches directly utilize representations extracted from spectrograms to identify target scenes. However, these approaches pay little attention to the audio events occurring in the scene even though they provide crucial semantic information. This paper conducts the first study that investigates whether real-life acoustic scenes can be reliably recognized based only on the features that describe a limited number of audio events. To model the task-specific relationships between coarse-grained acoustic scenes and fine-grained audio events, we propose an event relational graph representation learning (ERGL) framework for ASC. Specifically, ERGL learns a graph representation of an acoustic scene from the input audio, where the embedding of each event is treated as a node, while the relationship cues derived from each pair of event embeddings are described by a learned multidimensional edge feature. Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC by using only a limited number of embeddings of audio events without any data augmentations. The validity of the proposed ERGL framework proves the feasibility of recognizing diverse acoustic scenes based on the event relational graph. Our code is available on our homepage (https://github.com/Yuanbo2020/ERGL).

Submitted 1 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.12541 [pdf, other]

GCT: Gated Contextual Transformer for Sequential Audio Tagging

Authors: Yuanbo Hou, Yun Wang, Wenwu Wang, Dick Botteldooren

Abstract: Audio tagging aims to assign predefined tags to audio clips to indicate the class information of audio events. Sequential audio tagging (SAT) means detecting both the class information of audio events, and the order in which they occur within the audio clip. Most existing methods for SAT are based on connectionist temporal classification (CTC). However, CTC cannot effectively capture connections between events due to the conditional independence assumption between outputs at different times. The contextual Transformer (cTransformer) addresses this issue by exploiting contextual information in SAT. Nevertheless, cTransformer is also limited in exploiting contextual information as it only uses forward information in inference. This paper proposes a gated contextual Transformer (GCT) with forward-backward inference (FBI). In addition, a gated contextual multi-layer perceptron (GCMLP) block is proposed in GCT to improve the performance of cTransformer structurally. Experiments on two real-life audio datasets show that the proposed GCT with GCMLP and FBI performs better than the CTC-based methods and cTransformer. To promote research on SAT, the manually annotated sequential labels for the two datasets are released.

Submitted 22 October, 2022; originally announced October 2022.

arXiv:2209.10425 [pdf, other]

Consecutive Knowledge Meta-Adaptation Learning for Unsupervised Medical Diagnosis

Authors: Yumin Zhang, Yawen Hou, Xiuyi Chen, Hongyuan Yu, Long Xia

Abstract: Deep learning-based Computer-Aided Diagnosis (CAD) has attracted considerable attention in academic research and clinical applications. Nevertheless, the Convolutional Neural Networks (CNNs) diagnosis system heavily relies on the well-labeled lesion dataset, and the sensitivity to the variation of data distribution also restricts the potential application of CNNs in CAD. Unsupervised Domain Adaptation (UDA) methods are developed to solve the expensive annotation and domain gaps problem and have achieved remarkable success in medical image analysis. Yet existing UDA approaches only adapt knowledge learned from the source lesion domain to a single target lesion domain, which is against the clinical scenario: the new unlabeled target domains to be diagnosed always arrive in an online and continual manner. Moreover, the performance of existing approaches degrades dramatically on previously learned target lesion domains, due to the newly learned knowledge overwriting the previously learned knowledge (i.e., catastrophic forgetting). To deal with the above issues, we develop a meta-adaptation framework named Consecutive Lesion Knowledge Meta-Adaptation (CLKM), which mainly consists of Semantic Adaptation Phase (SAP) and Representation Adaptation Phase (RAP) to learn the diagnosis model in an online and continual manner. In the SAP, the semantic knowledge learned from the source lesion domain is transferred to consecutive target lesion domains. In the RAP, the feature-extractor is optimized to align the transferable representation knowledge across the source and multiple target lesion domains.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2209.09708 [pdf, ps, other]

Two-Stage Submodular Optimization of Dynamic Thermal Rating for Risk Mitigation Considering Placement and Operation Schedule

Authors: Qinfei Long, Junhong Liu, Chenhao Ren, Wenqian Yin, Feng Liu, Yunhe Hou

Abstract: Cascading failures currently pose a major risk to society. To effectively mitigate the risk, the dynamic thermal rating (DTR) technique can be applied as a cost-effective strategy to exploit potential transmission capability. From the perspectives of service life and Braess paradox, it is important and challenging to jointly optimize the DTR placement and operation schedule for changing system state, which is a two-stage combinatorial problem with only discrete variables that, under traditional models, has no approximation guarantee and suffers from the curse of dimensionality. Thus, the present work proposes a novel two-stage submodular optimization (TSSO) of DTR for risk mitigation considering placement and operation schedule. Specifically, it optimizes DTR placement with proper redundancy in the first stage, and then determines the corresponding DTR operation for each system state in the second stage. Under the condition of the Markov and submodular features in sub-function of risk mitigation, the submodularity of total objective function of TSSO can be proven for the first time. Based on this, a state-of-the-art efficient solving algorithm is developed that can provide a better approximation guarantee than previous studies by coordinating the separate curvature and error form. The performance of the proposed algorithms is verified by case results.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2208.02086 [pdf, other]

Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion

Authors: Yuanbo Hou, Bo Kang, Dick Botteldooren

Abstract: Previous works on scene classification are mainly based on audio or visual signals, while humans perceive the environmental scenes through multiple senses. Recent studies on audio-visual scene classification separately fine-tune the large-scale audio and image pre-trained models on the target dataset, then either fuse the intermediate representations of the audio model and the visual model, or fuse the coarse-grained decision of both models at the clip level. Such methods ignore the detailed audio events and visual objects in audio-visual scenes (AVS), while humans often identify different scenes through audio events and visual objects within and the congruence between them. To exploit the fine-grained information of audio events and visual objects in AVS, and coordinate the implicit relationship between audio events and visual objects, this paper proposes a multibranch model equipped with contrastive event-object alignment (CEOA) and semantic-based fusion (SF) for AVSC. CEOA aims to align the learned embeddings of audio events and visual objects by comparing the difference between audio-visual event-object pairs. Then, visual objects associated with certain audio events and vice versa are accentuated by cross-attention and undergo SF for semantic-level fusion. Experiments show that: 1) the proposed AVSC model equipped with CEOA and SF outperforms the results of audio-only and visual-only models, i.e., the audio-visual results are better than the results from a single modality; 2) CEOA aligns the embeddings of audio events and related visual objects on a fine-grained level, and the SF effectively integrates both; 3) compared with other large-scale integrated systems, the proposed model shows competitive performance, even without using additional datasets and data augmentation tricks.

Submitted 3 August, 2022; originally announced August 2022.

Comments: IEEE MMSP 2022

arXiv:2207.00741 [pdf, other]

doi 10.1109/TSG.2023.3310979

A Distributionally Robust Resilience Enhancement Strategy for Distribution Networks Considering Decision-Dependent Contingencies

Authors: Yujia Li, Shunbo Lei, Wei Sun, Chenxi Hu, Yunhe Hou

Abstract: When performing the resilience enhancement for distribution networks, there are two obstacles to reliably model the uncertain contingencies: 1) decision-dependent uncertainty (DDU) due to various line hardening decisions, and 2) distributional ambiguity due to limited outage information during extreme weather events (EWEs). To address these two challenges, this paper develops scenario-wise decision-dependent ambiguity sets (SWDD-ASs), where the DDU and distributional ambiguity inherent in EWE-induced contingencies are simultaneously captured for each possible EWE scenario. Then, a two-stage trilevel decision-dependent distributionally robust resilient enhancement (DD-DRRE) model is formulated, whose outputs include the optimal line hardening, distributed generation (DG) allocation, and proactive network reconfiguration strategy under the worst-case distributions in SWDD-ASs. Subsequently, the DD-DRRE model is equivalently recast to a mixed-integer linear programming (MILP)-based master problem and multiple scenario-wise subproblems, facilitating the adoption of a customized column-and-constraint generation (C&CG) algorithm. Finally, case studies demonstrate a remarkable improvement in the out-of-sample performance of our model, compared to its prevailing stochastic and robust counterparts. Moreover, the potential values of incorporating the ambiguity and distributional information are quantitatively estimated, providing a useful reference for planners with different budgets and risk-aversion levels.

Submitted 23 August, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

arXiv:2206.08233 [pdf, other]

Event-related data conditioning for acoustic event classification

Authors: Yuanbo Hou, Dick Botteldooren

Abstract: Models based on diverse attention mechanisms have recently shined in tasks related to acoustic event classification (AEC). Among them, self-attention is often used in audio-only tasks to help the model recognize different acoustic events. Self-attention relies on the similarity between time frames, and uses global information from the whole segment to highlight specific features within a frame. In real life, information related to acoustic events will attenuate over time, which means the information within some frames around the event deserves more attention than temporally distant global information that may be unrelated to the event. This paper shows that self-attention may over-enhance certain segments of audio representations, and smooth out the boundaries between event representations and background noises. Hence, this paper proposes an event-related data conditioning (EDC) for AEC. EDC directly works on spectrograms. The idea of EDC is to adaptively select the frame-related attention range based on acoustic features, and gather the event-related local information to represent the frame. Experiments show that: 1) compared with spectrogram-based data augmentation methods and trainable feature weighting and self-attention, EDC outperforms them in both the original-size mode and the augmented mode; 2) EDC effectively gathers event-related local information and enhances boundaries between events and backgrounds, improving the performance of AEC.

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted by INTERSPEECH 2022

arXiv:2206.05641 [pdf, ps, other]

An Unsupervised Deep-Learning Method for Bone Age Assessment

Authors: Hao Zhu, Wan-Jing Nie, Yue-Jie Hou, Qi-Meng Du, Si-Jing Li, Chi-Chun Zhou

Abstract: The bone age, reflecting the degree of development of the bones, can be used to predict the adult height and detect endocrine diseases of children. Both examinations of radiologists and variability of operators have a significant impact on bone age assessment. To decrease human intervention, machine learning algorithms are used to assess the bone age automatically. However, conventional supervised deep-learning methods need pre-labeled data. In this paper, based on the convolutional auto-encoder with constraints (CCAE), an unsupervised deep-learning model originally proposed for fingerprint classification, we adapt this model to the classification of bone age and name it BA-CCAE. In the proposed BA-CCAE model, the key regions of the raw X-ray images of the bone age are encoded, yielding the latent vectors. The K-means clustering algorithm is used to obtain the final classifications by grouping the latent vectors of the bone images. A set of experiments on the Radiological Society of North America pediatric bone age dataset (RSNA) show that the accuracy of classifications at 48-month intervals is 76.15%. Although the accuracy now is lower than most of the existing supervised models, the proposed BA-CCAE model can establish the classification of bone age without any pre-labeled data, and to the best of our knowledge, the proposed BA-CCAE is one of the few trials using the unsupervised deep-learning method for the bone age assessment.

Submitted 11 June, 2022; originally announced June 2022.

arXiv:2206.03049 [pdf, other]

Siamese Encoder-based Spatial-Temporal Mixer for Growth Trend Prediction of Lung Nodules on CT Scans

Authors: Jiansheng Fang, Jingwen Wang, Anwei Li, Yuguang Yan, Yonghe Hou, Chao Song, Hongbo Liu, Jiang Liu

Abstract: In the management of lung nodules, it is desirable to predict nodule evolution in terms of its diameter variation on Computed Tomography (CT) scans and then provide a follow-up recommendation according to the predicted result of the growing trend of the nodule. In order to improve the performance of growth trend prediction for lung nodules, it is vital to compare the changes of the same nodule in consecutive CT scans. Motivated by this, we screened out 4,666 subjects with more than two consecutive CT scans from the National Lung Screening Trial (NLST) dataset to organize a temporal dataset called NLSTt. Specifically, we first detect and pair regions of interest (ROIs) covering the same nodule based on registered CT scans. After that, we predict the texture category and diameter size of the nodules through models. Last, we annotate the evolution class of each nodule according to its changes in diameter. Based on the built NLSTt dataset, we propose a siamese encoder to simultaneously exploit the discriminative features of 3D ROIs detected from consecutive CT scans. Then we design a novel spatial-temporal mixer (STM) to leverage the interval changes of the same nodule in sequential 3D ROIs and capture spatial dependencies of nodule regions and the current 3D ROI. According to the clinical diagnosis routine, we employ hierarchical loss to pay more attention to growing nodules. The extensive experiments on our organized dataset demonstrate the advantage of our proposed method. We also conduct experiments on an in-house dataset to evaluate the clinical utility of our method by comparing it against skilled clinicians.

Submitted 7 June, 2022; originally announced June 2022.

Comments: MICCAI 2022

arXiv:2205.06774 [pdf]

Controlled Mobility for C-V2X Road Safety Reception Optimization

Authors: Jingxuan Men, Yun Hou

Abstract: The use case of C-V2X for road safety requires real-time network connection and information exchange between vehicles. In order to improve the reliability and safety of the system, intelligent networked vehicles need to move cooperatively to achieve network optimization. In this paper, we use the C-V2X sidelink mode 4 abstraction and the regression results of C-V2X network level simulation to formulate the optimization of packet reception rate (PRR) with fairness in the road safety scenario. Under the optimization framework, we design a controlled mobility algorithm for the transmission node to adaptively adjust its position to maximize the aggregated PRR using only one-hop information. Simulation results show that the algorithm converges and improves the aggregated PRR and fairness for C-V2X mode broadcast messages.

Submitted 3 May, 2022; originally announced May 2022.

arXiv:2205.00499 [pdf, other]

Relation-guided acoustic scene classification aided with event embeddings

Authors: Yuanbo Hou, Bo Kang, Wout Van Hauwermeiren, Dick Botteldooren

Abstract: In real life, acoustic scenes and audio events are naturally correlated. Humans instinctively rely on fine-grained audio events as well as the overall sound characteristics to distinguish diverse acoustic scenes. Yet, most previous approaches treat acoustic scene classification (ASC) and audio event classification (AEC) as two independent tasks. A few studies on scene and event joint classification either use synthetic audio datasets that hardly match the real world, or simply use the multi-task framework to perform two tasks at the same time. Neither of these two ways makes full use of the implicit and inherent relation between fine-grained events and coarse-grained scenes. To this end, this paper proposes a relation-guided ASC (RGASC) model to further exploit and coordinate the scene-event relation for the mutual benefit of scene and event recognition. The TUT Urban Acoustic Scenes 2018 dataset (TUT2018) is annotated with pseudo labels of events by a simple and efficient audio-related pre-trained model PANN, which is one of the state-of-the-art AEC models. Then, a prior scene-event relation matrix is defined as the average probability of the presence of each event type in each scene class. Finally, the two-tower RGASC model is jointly trained on the real-life dataset TUT2018 for both scene and event classification. The following results are achieved. 1) RGASC effectively coordinates the true information of coarse-grained scenes and the pseudo information of fine-grained events. 2) The event embeddings learned from pseudo labels under the guidance of prior scene-event relations help reduce the confusion between similar acoustic scenes. 3) Compared with other (non-ensemble) methods, RGASC improves the scene classification accuracy on the real-life dataset.

Submitted 1 May, 2022; originally announced May 2022.

Comments: International Joint Conference on Neural Networks (IJCNN) 2022

arXiv:2203.17156 [pdf, other]

doi 10.1109/ICME52920.2022.9859703

Adaptive Mean-Residue Loss for Robust Facial Age Estimation

Authors: Ziyuan Zhao, Peisheng Qian, Yubo Hou, Zeng Zeng

Abstract: Automated facial age estimation has diverse real-world applications in multimedia analysis, e.g., video surveillance, and human-computer interaction. However, due to the randomness and ambiguity of the aging process, age assessment is challenging. Most research work on the topic regards the task as one of age regression, classification, and ranking problems, and cannot fully leverage age distribution in representing labels with age ambiguity. In this work, we propose a simple yet effective loss function for robust facial age estimation via distribution learning, i.e., adaptive mean-residue loss, in which the mean loss penalizes the difference between the estimated age distribution's mean and the ground-truth age, whereas the residue loss penalizes the entropy of age probability out of dynamic top-K in the distribution. Experimental results on the datasets FG-NET and CLAP2016 have validated the effectiveness of the proposed loss. Our code is available at https://github.com/jacobzhaoziyuan/AMR-Loss.

Submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2022)

Journal ref: 2022 IEEE International Conference on Multimedia and Expo (ICME)

arXiv:2203.11573 [pdf, other]

CT-SAT: Contextual Transformer for Sequential Audio Tagging

Authors: Yuanbo Hou, Zhaoyi Liu, Bo Kang, Yun Wang, Dick Botteldooren

Abstract: Sequential audio event tagging can provide not only the type information of audio events, but also the order information between events and the number of events that occur in an audio clip. Most previous works on audio event sequence analysis rely on connectionist temporal classification (CTC). However, CTC's conditional independence assumption prevents it from effectively learning correlations between diverse audio events. This paper first attempts to introduce Transformer into sequential audio tagging, since Transformers perform well in sequence-related tasks. To better utilize contextual information of audio event sequences, we draw on the idea of bidirectional recurrent neural networks, and propose a contextual Transformer (cTransformer) with a bidirectional decoder that could exploit the forward and backward information of event sequences. Experiments on the real-life polyphonic audio dataset show that, compared to CTC-based methods, the cTransformer can effectively combine the fine-grained acoustic representations from the encoder and coarse-grained audio event cues to exploit contextual information to successfully recognize and predict audio event sequences.

Submitted 22 March, 2022; originally announced March 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2202.11880 [pdf, other]

On Nash-Stackelberg-Nash Games under Decision-Dependent Uncertainties: Model and Equilibrium

Authors: Yunfan Zhang, Feng Liu, Zhaojian Wang, Yue Chen, Shuanglei Feng, Qiuwei Wu, Yunhe Hou

Abstract: In this paper, we discuss a class of two-stage hierarchical games with multiple leaders and followers, which is called Nash-Stackelberg-Nash (N-S-N) games. Particularly, we consider N-S-N games under decision-dependent uncertainties (DDUs). DDUs refer to the uncertainties that are affected by the strategies of decision-makers and have been rarely addressed in game equilibrium analysis. In this paper, we first formulate the N-S-N games with DDUs of complete ignorance, where the interactions between the players and DDUs are characterized by uncertainty sets that depend parametrically on the players' strategies. Then, a rigorous definition for the equilibrium of the game is established by consolidating generalized Nash equilibrium and Pareto-Nash equilibrium. Afterward, we prove the existence of the equilibrium of N-S-N games under DDUs by applying Kakutani's fixed-point theorem. Finally, an illustrative example is provided to show the impact of DDUs on the equilibrium of N-S-N games.

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2201.04800 [pdf, other]

Online State Estimation for Supervisor Synthesis in Discrete-Event Systems with Communication Delays and Losses

Authors: Yunfeng Hou, Yunfeng Ji, Gang Wang, Ching-Yen Weng, Qingdu Li

Abstract: In the context of networked discrete-event systems (DESs), communication delays and losses exist between the plant and the supervisor for observation and between the supervisor and the actuator for control. In this paper, we first introduce a new framework for supervisory control of networked DESs. Under the introduced framework, we address the state estimation problem for supervisor synthesis of networked DESs with both communication delays and losses. The estimation algorithm considers the effect of the controls imposed on the system. Additionally, the estimation algorithm is based on the control decisions available up to the moment, and all the future control decisions are assumed to be unknowable. Two notions, called "observation channel configuration" for tracking observation delays and losses and "control channel configuration" for tracking control delays and losses, are defined. Then, we introduce an online approach for state estimation of the controlled system. Compared with the existing approach, the proposed approach under the introduced framework can estimate the state of the controlled system more accurately. As an application of the proposed approach, we finally show that the existing methods can be easily applied to synthesize maximally permissible and safe networked supervisors.

Submitted 6 October, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

arXiv:2112.05612 [pdf, other]

doi 10.1109/MWC.101.2100354

Decentralized Spectrum Access System: Vision, Challenges, and a Blockchain Solution

Authors: Yang Xiao, Shanghao Shi, Wenjing Lou, Chonggang Wang, Xu Li, Ning Zhang, Y. Thomas Hou, Jeffrey H. Reed

Abstract: Spectrum access system (SAS) is widely considered the de facto solution to coordinating dynamic spectrum sharing (DSS) and protecting incumbent users. The current SAS paradigm prescribed by the FCC for the CBRS band and standardized by the WInnForum follows a centralized service model in that a spectrum user subscribes to a SAS server for spectrum allocation service. This model, however, neither tolerates SAS server failures (crash or Byzantine) nor resists dishonest SAS administrators, leading to serious concerns on SAS system reliability and trustworthiness. This is especially concerning for the evolving DSS landscape where an increasing number of SAS service providers and heterogeneous user requirements are coming up. To address these challenges, we propose a novel blockchain-based decentralized SAS architecture called BD-SAS that provides SAS services securely and efficiently, without relying on the trust of each individual SAS server for the overall system trustworthiness. In BD-SAS, a global blockchain (G-Chain) is used for spectrum regulatory compliance while smart contract-enabled local blockchains (L-Chains) are instantiated in individual spectrum zones for automating spectrum access assignment per user request. We hope our vision of a decentralized SAS, the BD-SAS architecture, and discussion on future challenges can open up a new direction towards reliable spectrum management in a decentralized manner.

Submitted 10 December, 2021; originally announced December 2021.

Comments: A version of this work has been accepted by IEEE Wireless Communications for publication

Journal ref: IEEE Wireless Communications (2022)

arXiv:2110.00265 [pdf, other]

A New Approach for Verification of Delay Coobservability of Discrete-Event Systems

Authors: Yunfeng Hou, Qingdu Li, Yunfeng Ji, Gang Wang, Ching-Yen Weng

Abstract: In decentralized networked supervisory control of discrete-event systems (DESs), the local supervisors observe event occurrences subject to observation delays to make correct control decisions. Delay coobservability describes whether these local supervisors can make sufficient observations. In this paper, we provide an efficient way to verify delay coobservability. For each controllable event, we partition the specification language into a finite number of sets such that strings in different sets have different lengths. For each of the sets, we construct a verifier to check if delay coobservability holds for the controllable event. The computational complexity of the proposed approach is polynomial with respect to the number of states, the number of events, and the upper bounds on observation delays and only exponential with respect to the number of local supervisors. It has lower complexity order than the existing approaches. In addition, we investigate the relationship between the decentralized supervisory control of networked DESs and the decentralized fault diagnosis of networked DESs and show that delay $K$-codiagnosability is transformable to delay coobservability. Thus, techniques for the verification of delay coobservability can be leveraged to verify delay $K$-codiagnosability.

Submitted 19 May, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

arXiv:2109.08195 [pdf, other]

A Data-Driven Uncertainty Quantification Method for Stochastic Economic Dispatch

Authors: Xiaoting Wang, Rong-Peng Liu, Xiaozhe Wang, Yunhe Hou, François Bouffard

Abstract: This letter proposes a data-driven sparse polynomial chaos expansion-based surrogate model for the stochastic economic dispatch problem considering uncertainty from wind power. The proposed method can provide accurate estimations for the statistical information (e.g., mean, variance, probability density function, and cumulative distribution function) for the stochastic economic dispatch solution efficiently without requiring the probability distributions of random inputs. Simulation studies on an integrated electricity and gas system (IEEE 118-bus system integrated with a 20-node gas system) are presented, demonstrating the efficiency and accuracy of the proposed method compared to the Monte Carlo simulations.

Submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted by IEEE Transactions on Power Systems for future publication

Showing 1–50 of 71 results for author: Hou, Y