Showing 1–24 of 24 results for author: Xiang, X

Searching in archive eess.
  1. arXiv:2404.01654  [pdf, other]

    cs.CV cs.AI eess.IV eess.SP

    AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

    Authors: Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng

    Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

  2. arXiv:2404.00257

    cs.CV cs.AI cs.LG eess.IV

    YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery

    Authors: Qian Wan, Xiang Xiang, Qinhao Zhou

    Abstract: Because of its use in practice, open-world object detection (OWOD) has gotten a lot of attention recently. The challenge is how can a model detect novel classes and then incrementally learn them without forgetting previously known classes. Previous approaches hinge on strongly-supervised or weakly-supervised novel-class data for novel-class detection, which may not apply to real applications. We c…

    Submitted 22 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Withdrawn because it was submitted without consent of the first author. In addition, this submission has some errors

  3. arXiv:2401.00135  [pdf]

    eess.IV cs.CV

    Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction

    Authors: Shuo Xu, Yucheng Zhang, Gang Chen, Xincheng Xiang, Peng Cong, Yuewen Sun

    Abstract: Although sparse-view computed tomography (CT) has significantly reduced radiation dose, it also introduces severe artifacts which degrade the image quality. In recent years, deep learning-based methods for inverse problems have made remarkable progress and have become increasingly popular in CT reconstruction. However, most of these methods suffer several limitations: dependence on high-quality tr…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 11 pages, 12 figures, Journal paper

  4. arXiv:2312.14239  [pdf, other]

    cs.CV eess.IV

    PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

    Authors: Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan

    Abstract: 3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded reg…

    Submitted 5 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project Page: https://platonerf.github.io/

  5. arXiv:2312.03640  [pdf, other]

    eess.IV cs.CV

    Training Neural Networks on RAW and HDR Images for Restoration Tasks

    Authors: Lei Luo, Alexandre Chapiro, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Rafal Mantiuk

    Abstract: The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearl…

    Submitted 6 December, 2023; originally announced December 2023.

  6. arXiv:2306.15161  [pdf, other]

    eess.AS cs.SD

    Wespeaker baselines for VoxSRC2023

    Authors: Shuai Wang, Chengdong Liang, Xu Xiang, Bing Han, Zhengyang Chen, Hongji Wang, Wen Ding

    Abstract: This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial systems. Via well-structured recipes and strong results, we hope to offer an accessible and good enough start point for all interested individuals. In thi…

    Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  7. arXiv:2211.00815  [pdf, other]

    cs.SD eess.AS

    Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: Many speaker recognition challenges have been held to assess the speaker verification system in the wild and probe the performance limit. Voxceleb Speaker Recognition Challenge (VoxSRC), based on the voxceleb, is the most popular. Besides, another challenge called CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-C…

    Submitted 1 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by InterSpeech 2023

  8. arXiv:2210.17016  [pdf, other]

    cs.SD eess.AS

    Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

    Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

    Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speak…

    Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

  9. arXiv:2209.09076  [pdf, other]

    cs.SD eess.AS

    SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. For track1, we implemented two kinds of systems, the online system and the offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track1. For track3, we implemented statistic adaptation and jointly training based domain adaptat…

    Submitted 20 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: System description of VoxSRC 2022

  10. arXiv:2206.02146  [pdf, other]

    cs.CV eess.IV

    Recurrent Video Restoration Transformer with Guided Deformable Attention

    Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

    Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusi…

    Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted by NeurIPS 2022. Code: https://github.com/JingyunLiang/RVRT

  11. arXiv:2205.10619  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A Pilot Study of Relating MYCN-Gene Amplification with Neuroblastoma-Patient CT Scans

    Authors: Zihan Zhang, Xiang Xiang, Xuehua Peng, Jianbo Shao

    Abstract: Neuroblastoma is one of the most common cancers in infants, and the initial diagnosis of this disease is difficult. At present, the MYCN gene amplification (MNA) status is detected by invasive pathological examination of tumor samples. This is time-consuming and may have a hidden impact on children. To handle this problem, we adopt multiple machine learning (ML) algorithms to predict the presence…

    Submitted 21 May, 2022; originally announced May 2022.

  12. Low-Interception Waveform: To Prevent the Recognition of Spectrum Waveform Modulation via Adversarial Examples

    Authors: Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu

    Abstract: Deep learning is applied to many complex tasks in the field of wireless communication, such as modulation recognition of spectrum waveforms, because of its convenience and efficiency. This leads to the problem of a malicious third party using a deep learning model to easily recognize the modulation format of the transmitted waveform. Some existing works address this problem directly using the conc…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 4 pages, 4 figures, published in 2021 34th General Assembly and Scientific Symposium of the International Union of Radio Science, URSI GASS 2021

    Journal ref: URSI GASS, 2021, pp. 1-4

  13. Adversarial Jamming for a More Effective Constellation Attack

    Authors: Haidong Xie, Yizhou Xu, Yuanqing Chen, Nan Ji, Shuai Yuan, Naijin Liu, Xueshuang Xiang

    Abstract: The common jamming mode in wireless communication is band barrage jamming, which is controllable and difficult to resist. Although this method is simple to implement, it is obviously not the best jamming waveform. Therefore, based on the idea of adversarial examples, we propose the adversarial jamming waveform, which can independently optimize and find the best jamming waveform. We attack QAM with…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 3 pages, 2 figures, published in The 13th International Symposium on Antennas, Propagation and EM Theory (ISAPE 2021)

  14. arXiv:2104.07473  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution

    Authors: Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

    Abstract: In this paper, we address the space-time video super-resolution, which aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence. A naïve method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). Nevertheless, temporal interpolation and spatial upscaling are intra-related in t…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Journal version of "Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution"(CVPR-2020). 14 pages, 14 figures

  15. arXiv:2103.08259  [pdf, other]

    eess.IV cs.CV cs.LG

    The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion

    Authors: Meiyu Huang, Yao Xu, Lixin Qian, Weili Shi, Yaqin Zhang, Wei Bao, Nan Wang, Xuejiao Liu, Xueshuang Xiang

    Abstract: Deep learning techniques have made an increasing impact on the field of remote sensing. However, deep neural networks based fusion of multimodal data from different remote sensors with heterogenous characteristics has not been fully explored, due to the lack of availability of big amounts of perfectly aligned multi-sensor image data with diverse scenes of high resolutions, especially for synthetic…

    Submitted 25 April, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

  16. arXiv:2103.01524  [pdf, other]

    eess.IV cs.CV cs.LG

    Feature-Align Network with Knowledge Distillation for Efficient Denoising

    Authors: Lucas D. Young, Fitsum A. Reda, Rakesh Ranjan, Jon Morton, Jun Hu, Yazhu Ling, Xiaoyu Xiang, David Liu, Vikas Chandra

    Abstract: We propose an efficient neural network for RAW image denoising. Although neural network-based denoising has been extensively studied for image restoration, little attention has been given to efficient denoising for compute limited and power sensitive devices, such as smartphones and smartwatches. In this paper, we present a novel architecture and a suite of training techniques for high quality den…

    Submitted 17 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    MSC Class: 94A08 (Primary) 68T07; 65D19 (Secondary) ACM Class: I.4.5; I.2.6

  17. arXiv:2102.09828  [pdf, other]

    cs.SD eess.AS

    AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

    Authors: Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

    Abstract: This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160-hour accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipe…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  18. arXiv:2102.09817  [pdf, ps, other]

    cs.SD eess.AS

    Unit selection synthesis based data augmentation for fixed phrase speaker verification

    Authors: Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian

    Abstract: Data augmentation is commonly used to help build a robust speaker verification system, especially in limited-resource case. However, conventional data augmentation methods usually focus on the diversity of acoustic environment, leaving the lexicon variation neglected. For text dependent speaker verification tasks, it's well-known that preparing training data with the target transcript is the most…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  19. arXiv:2101.08074  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA

    Flocking and Collision Avoidance for a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement Learning

    Authors: Chao Yan, Xiaojia Xiang, Chang Wang, Zhen Lan

    Abstract: Developing the flocking behavior for a dynamic squad of fixed-wing UAVs is still a challenge due to kinematic complexity and environmental uncertainty. In this paper, we deal with the decentralized flocking and collision avoidance problem through deep reinforcement learning (DRL). Specifically, we formulate a decentralized DRL-based decision making framework from the perspective of every follower,…

    Submitted 22 July, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: Accepted for publication in the proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  20. arXiv:2011.00200  [pdf, other]

    cs.SD cs.CL eess.AS

    The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020

    Authors: Xu Xiang

    Abstract: This report describes the systems submitted to the first and second tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020, which ranked second in both tracks. Three key points of the system pipeline are explored: (1) investigating multiple CNN architectures including ResNet, Res2Net and dual path network (DPN) to extract the x-vectors, (2) using a composite angular margin softmax loss…

    Submitted 31 October, 2020; originally announced November 2020.

  21. arXiv:2010.08919  [pdf, other]

    cs.CV cs.MM eess.IV

    Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

    Authors: Xiaoyu Xiang, Qian Lin, Jan P. Allebach

    Abstract: Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by explorin…

    Submitted 17 December, 2020; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 8 pages, 6 figures, 5 tables. Accepted by the 25th ICPR (2020)

  22. arXiv:2002.11616  [pdf, other]

    cs.CV cs.MM eess.IV

    Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

    Authors: Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

    Abstract: In this paper, we explore the space-time video super-resolution task, which aims to generate a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video. A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: This work is accepted in CVPR 2020. The source code and pre-trained model are available on https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020. 12 pages, 10 figures

    ACM Class: I.2; I.4.3; I.4.4

  23. arXiv:1906.07317  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

    Authors: Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

    Abstract: Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the emb…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: not accepted by INTERSPEECH 2019

  24. arXiv:1805.00362  [pdf]

    eess.SP

    A code-free optical undersampling technique for broadband microwave spectrum measurement

    Authors: Guangyu Gao, Xueshuang Xiang, Qijun Liang, Naijin Liu

    Abstract: A novel broadband microwave (MW) spectrum measurement (BMSM) scheme based on code-free optical undersampling and homodyne detection is proposed. The fully analog generation of optical pulses with a far-less-than-Nyquist rate is only through modulating cascaded electrooptical modulators by a single RF tone instead of any high-speed coding sequence modulation. Homodyne detection will reduce the anal…

    Submitted 31 July, 2019; v1 submitted 29 April, 2018; originally announced May 2018.

    Comments: 3 pages and 7 figures