Showing 1–41 of 41 results for author: Dong, S

  1. arXiv:2408.07866 [pdf, other]

    eess.SY

    Certifiable Deep Learning for Reachability Using a New Lipschitz Continuous Value Function

    Authors: Jingqi Li, Donggun Lee, Jaewon Lee, Kris Shengjun Dong, Somayeh Sojoudi, Claire Tomlin

    Abstract: We propose a new reachability learning framework for high-dimensional nonlinear systems, focusing on reach-avoid problems. These problems require computing the reach-avoid set, which ensures that all its elements can safely reach a target set despite any disturbance within pre-specified bounds. Our framework has two main parts: offline learning of a newly designed reach-avoid value function and po…

    Submitted 19 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Submitted, under review

  2. arXiv:2406.09664 [pdf, other]

    cs.SD eess.AS

    Frequency-mix Knowledge Distillation for Fake Speech Detection

    Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

    Abstract: In telephony scenarios, the fake speech detection (FSD) task of combating speech spoofing attacks is challenging. Data augmentation (DA) methods are considered an effective means of addressing the FSD task in telephony scenarios and are typically divided into time-domain and frequency-domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.06086 [pdf, other]

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2405.08423 [pdf, other]

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  5. arXiv:2402.18076 [pdf, other]

    eess.SY

    Online Ecological Gearshift Strategy via Neural Network with Soft-Argmax Operator

    Authors: Xi Luo, Shiying Dong, Jinlong Hong, Bingzhao Gao, Hong Chen

    Abstract: This paper presents a neural network optimizer with a soft-argmax operator to achieve an ecological gearshift strategy in real time. The strategy is reformulated as a mixed-integer model predictive control (MIMPC) problem to minimize energy consumption. Outer convexification is then introduced to transform integer variables into relaxed binary controls. To approximate binary solutions properly…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 6 pages, 5 figures, submitted to 8th IFAC Conference on Nonlinear Model Predictive Control

  6. arXiv:2402.03048 [pdf, other]

    cs.MA cs.LG eess.SY

    Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control under Switching Topologies

    Authors: Zewen Yang, Songbo Dong, Armin Lederer, Xiaobing Dai, Siyu Chen, Stefan Sosnowski, Georges Hattab, Sandra Hirche

    Abstract: This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty pre…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 8 pages

  7. arXiv:2311.06579 [pdf, other]

    cs.RO eess.SY

    Five-Tiered Route Planner for Multi-AUV Accessing Fixed Nodes in Uncertain Ocean Environments

    Authors: Jiaxin Zhang, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong

    Abstract: This article introduces a five-tiered route planner for accessing multiple nodes with multiple autonomous underwater vehicles (AUVs) that enables efficient task completion in stochastic ocean environments. First, the pre-planning tier solves the single-AUV routing problem to find the optimal giant route (GR), estimates the number of required AUVs based on GR segmentation, and allocates nodes for e…

    Submitted 11 November, 2023; originally announced November 2023.

  8. arXiv:2310.01163 [pdf, other]

    cs.RO eess.SY

    Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications

    Authors: Pian Yu, Shuyang Dong, Shili Sheng, Lu Feng, Marta Kwiatkowska

    Abstract: Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic that involve human trust. Since human trust in robots is not observable, we adopt the widely used partially observable Markov decision process (POMDP) framework…

    Submitted 2 October, 2023; originally announced October 2023.

  9. arXiv:2309.04100 [pdf]

    eess.IV cs.LG physics.med-ph

    Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI)

    Authors: Siyuan Dong, Henk M. De Feyter, Monique A. Thomas, Robin A. de Graaf, James S. Duncan

    Abstract: Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The C…

    Submitted 13 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

  10. arXiv:2308.15930 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    LLaSM: Large Language and Speech Model

    Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

    Abstract: Multi-modal large language models have garnered significant interest recently. However, most works focus on vision-language multi-modal models, which provide strong capabilities in following vision-and-language instructions. We argue that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to f…

    Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  11. arXiv:2308.04013 [pdf, other]

    eess.SY cs.IT

    Distributed Target Tracking with Fading Channels over Underwater Wireless Sensor Networks

    Authors: Miaoyi Tang, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong

    Abstract: This paper investigates the problem of distributed target tracking via underwater wireless sensor networks (UWSNs) with fading channels. The degradation of signal quality due to wireless channel fading can significantly impact network reliability and subsequently reduce the tracking accuracy. To address this issue, we propose a modified distributed unscented Kalman filter (DUKF) named DUKF-Fc, whi…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 6 tables

  12. arXiv:2306.15389 [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection

    Authors: Shunbo Dong, Jun Xue, Cunhang Fan, Kang Zhu, Yujie Chen, Zhao Lv

    Abstract: In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The main purpose of this system is to improve the model's ability to learn precise forgery information for the FSD task in low-quality scenarios. The task of random Specmix, a data augmentation method, is to improve the generalization ability of the model and enhance the mode…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by DADA2023

  13. arXiv:2306.02231 [pdf, other]

    cs.CL cs.AI cs.LG eess.SY

    Fine-Tuning Language Models with Advantage-Induced Policy Alignment

    Authors: Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be…

    Submitted 2 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  14. arXiv:2305.18771 [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    SFCNeXt: a simple fully convolutional network for effective brain age estimation with small sample size

    Authors: Yu Fu, Yanyan Huang, Shunjie Dong, Yalin Wang, Tianbai Yu, Meng Niu, Cheng Zhuo

    Abstract: Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted by IEEE ISBI 2023

  15. Picking Up Quantization Steps for Compressed Image Classification

    Authors: Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian

    Abstract: The sensitivity of deep neural networks to compressed images hinders their usage in many real applications, which means classification networks may fail just after taking a screenshot and saving it as a compressed file. In this paper, we argue that neglected disposable coding parameters stored in compressed files could be picked up to reduce the sensitivity of deep neural networks to compressed im…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 4, pp. 1884-1898, April 2023

  16. arXiv:2304.03708 [pdf, other]

    eess.IV cs.CV

    Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge

    Authors: Gongning Luo, Kuanquan Wang, Jun Liu, Shuo Li, Xinjie Liang, Xiangyu Li, Shaowei Gan, Wei Wang, Suyu Dong, Wenyi Wang, Pengxin Yu, Enyou Liu, Hongrong Wei, Na Wang, Jia Guo, Huiqi Li, Zhao Zhang, Ziwei Zhao, Na Gao, Nan An, Ashkan Pakzad, Bojidar Rangelov, Jiaqi Dou, Song Tian, Zeyu Liu, et al. (5 additional authors not shown)

    Abstract: Efficient automatic segmentation of multi-level (i.e., main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate on either main PA or branch PA segmentation separately and ignore segmentation efficiency. In addition, there is no public large-scale dataset focused on PA segmentation, which makes it highly challengi…

    Submitted 9 August, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  17. arXiv:2303.13463 [pdf, other]

    cs.CL eess.AS

    W2KPE: Keyphrase Extraction with Word-Word Relation

    Authors: Wen Cheng, Shichen Dong, Wei Wang

    Abstract: This paper describes our submission to ICASSP 2023 MUG Challenge Track 4, Keyphrase Extraction, which aims to extract keyphrases most relevant to the conference theme from conference materials. We model the challenge as a single-class Named Entity Recognition task and develop techniques for better performance on the challenge: for the data preprocessing, we encode the split keyphrases after word…

    Submitted 22 March, 2023; originally announced March 2023.

  18. arXiv:2302.13893 [pdf, other]

    eess.SY

    Electric Vehicle Sales Forecasting Model Considering Green Premium: A Chinese Market-based Perspective

    Authors: Zhi Li, Hang Fan, Shuyan Dong

    Abstract: The "green premium", meaning the difference in cost between emissions-emitting technology and zero-emissions or emissions-reducing technology, is significant for renewable energy technologies that address the climate change challenge facing the world in this century. China's electric vehicle (EV) industry is the first to cross the green premium into the commercialization stage, prompting its…

    Submitted 27 February, 2023; originally announced February 2023.

  19. RDFNet: Regional Dynamic FISTA-Net for Spectral Snapshot Compressive Imaging

    Authors: Shiyun Zhou, Tingfa Xu, Shaocong Dong, Jianan Li

    Abstract: Deep convolutional neural networks have recently shown promising results in compressive spectral reconstruction. Previous methods, however, usually adopt a single mapping function for sparse representation. Considering that different regions have distinct characteristics, it is desirable to apply various mapping functions to adjust different regions' transformations dynamically. With this in mind,…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: IEEE Transactions on Computational Imaging

  20. arXiv:2301.05781 [pdf, other]

    eess.SY

    Analysis of November 21, 2021, Kaua`i Island Power System 18-20 Hz Oscillations

    Authors: Shuan Dong, Bin Wang, Jin Tan, Cameron J. Kruse, Brad W. Rockwell, Anderson Hoke

    Abstract: This letter discusses the 18-20 Hz oscillation event at 05:30 am on November 21, 2021, in Kaua`i's power system following the trip of an oil power plant. As far as the authors are aware, this is the first report of a transmission system-wide subsynchronous oscillation driven by inverter-based resources (though the system in question is relatively small). In this letter, we leverage two data-based…

    Submitted 10 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

  21. arXiv:2211.08402 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Introducing Semantics into Speech Encoders

    Authors: Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang

    Abstract: Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 11 pages, 3 figures

  22. arXiv:2209.09413 [pdf, other]

    eess.SY

    A Unified Analytical Method to Quantify Three Types of Fast Frequency Response from Inverter-based Resources

    Authors: Shuan Dong, Xin Fang, Jin Tan, Ningchao Gao, Xiaofan Cui, Anderson Hoke

    Abstract: With more inverter-based resources (IBRs), our power systems have lower frequency nadirs following N-1 contingencies, and undesired under-frequency load shedding (UFLS) can occur. To address this challenge, IBRs can be programmed to provide at least three types of fast frequency response (FFR), e.g., step response, proportional response (P/f droop response), and derivative response (synthetic iner…

    Submitted 25 August, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

  23. arXiv:2207.10181 [pdf, other]

    eess.IV cs.CV cs.LG

    Flow-based Visual Quality Enhancer for Super-resolution Magnetic Resonance Spectroscopic Imaging

    Authors: Siyuan Dong, Gilbert Hangel, Eric Z. Chen, Shanhui Sun, Wolfgang Bogner, Georg Widhalm, Chenyu You, John A. Onofrey, Robin de Graaf, James S. Duncan

    Abstract: Magnetic Resonance Spectroscopic Imaging (MRSI) is an essential tool for quantifying metabolites in the body, but the low spatial resolution limits its clinical applications. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but the super-resolved images are often blurry compared to the experimentally-acquired high-resolution imag…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by DGM4MICCAI 2022

  24. arXiv:2206.08984 [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-scale Super-resolution Magnetic Resonance Spectroscopic Imaging with Adjustable Sharpness

    Authors: Siyuan Dong, Gilbert Hangel, Wolfgang Bogner, Georg Widhalm, Karl Rössler, Siegfried Trattnig, Chenyu You, Robin de Graaf, John Onofrey, James Duncan

    Abstract: Magnetic Resonance Spectroscopic Imaging (MRSI) is a valuable tool for studying metabolic activities in the human body, but the current applications are limited to low spatial resolutions. The existing deep learning-based MRSI super-resolution methods require training a separate network for each upscaling factor, which is time-consuming and memory inefficient. We tackle this multi-scale super-reso…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted by MICCAI 2022

  25. arXiv:2206.04682 [pdf, other]

    eess.IV cs.CV cs.LG

    RT-DNAS: Real-time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation

    Authors: Qing Lu, Xiaowei Xu, Shunjie Dong, Cong Hao, Lei Yang, Cheng Zhuo, Yiyu Shi

    Abstract: Accurately segmenting temporal frames of cine magnetic resonance imaging (MRI) is a crucial step in various real-time MRI guided cardiac interventions. To achieve fast and accurate visual assistance, there are strict requirements on the maximum latency and minimum throughput of the segmentation framework. State-of-the-art neural networks on this task are mostly hand-crafted to satisfy these constr…

    Submitted 13 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  26. arXiv:2206.02838 [pdf, other]

    eess.IV cs.CV cs.LG

    Invertible Sharpening Network for MRI Reconstruction Enhancement

    Authors: Siyuan Dong, Eric Z. Chen, Lin Zhao, Xiao Chen, Yikang Liu, Terrence Chen, Shanhui Sun

    Abstract: High-quality MRI reconstruction plays a critical role in clinical applications. Deep learning-based methods have achieved promising results on MRI reconstruction. However, most state-of-the-art methods were designed to optimize the evaluation metrics commonly used for natural images, such as PSNR and SSIM, whereas the visual quality is not primarily pursued. Compared to the fully-sampled images, t…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted by MICCAI 2022

  27. arXiv:2206.01369 [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

    Authors: Chenyu You, Jinlin Xiang, Kun Su, Xiaoran Zhang, Siyuan Dong, John Onofrey, Lawrence Staib, James S. Duncan

    Abstract: Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, whi…

    Submitted 30 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  28. arXiv:2205.02758 [pdf, other]

    physics.soc-ph eess.SY

    Quantitative Measures for Integrating Resilience into Transportation Planning Practice: Study in Texas

    Authors: Cheng-Chun Lee, Akhil Rajput, Chia-Wei Hsu, Chao Fan, Faxi Yuan, Shangjia Dong, Amir Esmalian, Hamed Farahmand, Flavia Ioana Patrascu, Chia-Fu Liu, Bo Li, Junwei Ma, Ali Mostafavi

    Abstract: The objective of this study is to propose a system-level framework with quantitative measures to assess the resilience of road networks. The framework proposed in this paper can help transportation agencies proactively incorporate resilience considerations into project development and effectively understand the resilience performance of current road networks. This study identified and implement…

    Submitted 5 May, 2022; v1 submitted 4 April, 2022; originally announced May 2022.

  29. arXiv:2203.06849 [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology limits a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  30. arXiv:2203.04911 [pdf, other]

    cs.CL cs.SD eess.AS

    DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

    Authors: Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) aims to find the answer from a spoken document given a question, which is crucial for personal assistants when replying to queries from users. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time- and cost-prohibitive to collect for low-resourced languages…

    Submitted 21 June, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  31. arXiv:2202.06548 [pdf, other]

    eess.IV cs.LG

    A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

    Authors: Yu Fu, Shunjie Dong, Yi Liao, Le Xue, Yuanfan Xu, Feng Li, Qianqian Yang, Tianbai Yu, Mei Tian, Cheng Zhuo

    Abstract: 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both…

    Submitted 14 February, 2022; originally announced February 2022.

  32. arXiv:2201.10737 [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Class-Aware Adversarial Transformers for Medical Image Segmentation

    Authors: Chenyu You, Ruihan Zhao, Fenglin Liu, Siyuan Dong, Sandeep Chinchali, Ufuk Topcu, Lawrence Staib, James S. Duncan

    Abstract: Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale f…

    Submitted 15 December, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

  33. arXiv:2112.05144 [pdf]

    cs.CV eess.IV

    Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing

    Authors: Wujie Zhou, Shaohua Dong, Caie Xu, Yaguan Qian

    Abstract: RGB thermal scene parsing has recently attracted increasing research interest in the field of computer vision. However, most existing methods fail to perform good boundary extraction for prediction maps and cannot fully use high-level features. In addition, these methods simply fuse the features from RGB and thermal modalities but are unable to obtain comprehensive fused features. To address these…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI 2022

  34. arXiv:2105.01051 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge…

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  35. arXiv:2102.11099 [pdf, other]

    eess.IV cs.CV

    RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection

    Authors: Shunjie Dong, Qianqian Yang, Yu Fu, Mei Tian, Cheng Zhuo

    Abstract: The novel 2019 Coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnosis. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is pref…

    Submitted 22 February, 2021; originally announced February 2021.

  36. arXiv:2007.06341 [pdf, other]

    eess.IV cs.CV

    DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation

    Authors: Shunjie Dong, Jinlong Zhao, Maojun Zhang, Zhengxue Shi, Jianing Deng, Yiyu Shi, Mei Tian, Cheng Zhuo

    Abstract: Automatic segmentation of cardiac magnetic resonance imaging (MRI) facilitates efficient and accurate volume measurement in clinical applications. However, due to anisotropic resolution and ambiguous borders (e.g., right ventricular endocardium), existing methods suffer from degraded accuracy and robustness in 3D cardiac MRI video segmentation. In this paper, we propose a novel Deformable…

    Submitted 13 July, 2020; originally announced July 2020.

  37. arXiv:2006.09201 [pdf, other]

    eess.SP cs.LG stat.ML

    A Hybrid Deep Learning Model for Predictive Flood Warning and Situation Awareness using Channel Network Sensors Data

    Authors: Shangjia Dong, Tianbo Yu, Hamed Farahmand, Ali Mostafavi

    Abstract: The objective of this study is to create and test a hybrid deep learning model, FastGRNN-FCN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network-Fully Convolutional Network), for urban flood prediction and situation awareness using channel network sensors data. The study used Harris County, Texas as the testbed, and obtained channel sensor data from three historical flood events (e.g.,…

    Submitted 8 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  38. arXiv:2004.11848 [pdf]

    cs.CV cs.LG eess.IV stat.ML

    Deep learning for smart fish farming: applications, opportunities and challenges

    Authors: Xinting Yang, Song Zhang, Jintao Liu, Qinfeng Gao, Shuanglin Dong, Chao Zhou

    Abstract: With its rapid emergence, deep learning (DL) technology has been successfully used in various fields, including aquaculture. This change can create new opportunities and a series of challenges for information and data processing in smart fish farming. This paper focuses on the applications of DL in aquaculture, including live fish identification, species classification, behavioral analysis, f…

    Submitted 30 June, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: 43 pages, 7 figures

    Journal ref: Reviews in Aquaculture, 2020

  39. arXiv:2002.03236 [pdf, other]

    cs.RO eess.SY

    Tactile Dexterity: Manipulation Primitives with Tactile Feedback

    Authors: Francois R. Hogan, Jose Ballester, Siyuan Dong, Alberto Rodriguez

    Abstract: This paper develops closed-loop tactile controllers for dexterous robotic manipulation with a dual-palm robotic system. Tactile dexterity is an approach to dexterous manipulation that plans for robot/object interactions that render interpretable tactile information for control. We divide the role of tactile control into two goals: 1) control the contact state between the end-effector and the objec…

    Submitted 30 April, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

  40. arXiv:1910.02860 [pdf, other]

    cs.RO eess.IV eess.SY

    Cable Manipulation with a Tactile-Reactive Gripper

    Authors: Yu She, Shaoxiong Wang, Siyuan Dong, Neha Sunil, Alberto Rodriguez, Edward Adelson

    Abstract: Cables are complex, high-dimensional, and dynamic objects. Standard approaches to manipulate them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical co…

    Submitted 23 June, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Accepted to RSS 2020

  41. arXiv:1907.08769 [pdf, other]

    eess.IV cs.CV cs.MM

    A Retina-inspired Sampling Method for Visual Texture Reconstruction

    Authors: Lin Zhu, Siwei Dong, Tiejun Huang, Yonghong Tian

    Abstract: Conventional frame-based cameras cannot meet the demand for rapid reaction in real-time applications, while the emerging dynamic vision sensor (DVS) can realize high-speed capture of moving objects. However, to achieve visual texture reconstruction, DVS needs extra information apart from the output spikes. This paper introduces a fovea-like sampling method inspired by the neuron signal pr…

    Submitted 20 July, 2019; originally announced July 2019.

    Comments: Published in ICME 2019