
Showing 151–200 of 652 results for author: Qian, Y

  1. arXiv:2305.10788  [pdf, other]

    cs.SD cs.CL eess.AS

    Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

    Authors: Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian

    Abstract: Due to the rapid development of computing hardware resources and the dramatic growth of data, pre-trained models in speech recognition, such as Whisper, have significantly improved the performance of speech recognition tasks. However, these models usually have a high computational overhead, making them difficult to execute effectively on resource-constrained devices. To speed up inference and reduce…

    Submitted 18 May, 2023; originally announced May 2023.

  2. arXiv:2305.10704  [pdf, other]

    cs.SD eess.AS

    Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

    Authors: Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

    Abstract: This paper proposes a novel Attention-based Encoder-Decoder network for End-to-End Neural speaker Diarization (AED-EEND). In the AED-EEND system, we incorporate the target speaker enrollment information used in target speaker voice activity detection (TS-VAD) to calculate the attractor, which can mitigate the speaker permutation problem and facilitate easier model convergence. In the training process,…

    Submitted 15 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by InterSpeech 2023

  3. arXiv:2305.00437  [pdf, ps, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Temperature-Dependent and Magnetism-Controlled Fermi Surface Changes in Magnetic Weyl Semimetals

    Authors: Nan Zhang, Xianyong Ding, Fangyang Zhan, Houpu Li, Hongyu Li, Kaixin Tang, Yingcai Qian, Senyang Pan, Xiaoliang Xiao, Jinglei Zhang, Rui Wang, Ziji Xiang, Xianhui Chen

    Abstract: The coupling between band structure and magnetism can lead to intricate Fermi surface modifications. Here we report on the comprehensive study of the Shubnikov-de Haas (SdH) effect in two rare-earth-based magnetic Weyl semimetals, NdAlSi and CeAlSi$_{0.8}$Ge$_{0.2}$. The results show that the temperature evolution of topologically nontrivial Fermi surfaces strongly depends on magnetic configuratio…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 8 pages, 5 figures

    Journal ref: Phys. Rev. Research 5, L022013 (2023)

  4. arXiv:2304.14508  [pdf]

    eess.IV cs.CV cs.LG

    3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation

    Authors: Rui Nian, Guoyao Zhang, Yao Sui, Yuqi Qian, Qiuying Li, Mingzhang Zhao, Jianhui Li, Ali Gholipour, Simon K. Warfield

    Abstract: Magnetic resonance imaging (MRI) is critically important for brain mapping in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neu…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: 10 pages, 4 figures

    MSC Class: 68T07 ACM Class: I.4.6; I.5.1

  5. arXiv:2304.12259  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci physics.data-an

    Imaging 3D Chemistry at 1 nm Resolution with Fused Multi-Modal Electron Tomography

    Authors: Jonathan Schwartz, Zichao Wendy Di, Yi Jiang, Jason Manassa, Jacob Pietryga, Yiwen Qian, Min Gee Cho, Jonathan L. Rowell, Huihuo Zheng, Richard D. Robinson, Junsi Gu, Alexey Kirilin, Steve Rozeveld, Peter Ercius, Jeffrey A. Fessler, Ting Xu, Mary Scott, Robert Hovden

    Abstract: Measuring the three-dimensional (3D) distribution of chemistry in nanoscale matter is a longstanding challenge for metrological science. The inelastic scattering events required for 3D chemical imaging are too rare, requiring high beam exposure that destroys the specimen before an experiment completes. Even larger doses are required to achieve high resolution. Thus, chemical mapping in 3D has been…

    Submitted 18 June, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Nat Commun 15, 3555 (2024)

  6. arXiv:2304.11550  [pdf, other]

    eess.SY

    Provable Reach-avoid Controllers Synthesis Based on Inner-approximating Controlled Reach-avoid Sets

    Authors: Jianqiang Ding, Taoran Wu, Yuping Qian, Lijun Zhang, Bai Xue

    Abstract: In this paper, we propose an approach for synthesizing provable reach-avoid controllers, which drive a deterministic system operating in an unknown environment to safely reach a desired target set. The approach falls within the reachability analysis framework and is based on the computation of inner-approximations of controlled reach-avoid sets (CRSs). Given a target set and a safe set, the control…

    Submitted 23 April, 2023; originally announced April 2023.

  7. arXiv:2304.06342  [pdf, other]

    cs.CV cs.GR

    RoSI: Recovering 3D Shape Interiors from Few Articulation Images

    Authors: Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

    Abstract: The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set o…

    Submitted 13 April, 2023; originally announced April 2023.

  8. arXiv:2304.05754  [pdf, other]

    cs.SD eess.AS

    Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

    Authors: Bing Han, Zhengyang Chen, Yanmin Qian

    Abstract: Automatic speaker verification task has made great achievements using deep learning approaches with the large-scale manually annotated dataset. However, it's very difficult and expensive to collect a large amount of well-labeled data for system building. In this paper, we propose a novel and advanced self-supervised learning framework which can construct a high performance speaker verification sys…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Submitted to TASLP on July 19, 2022

  9. arXiv:2304.04162  [pdf, other]

    cs.GT cs.DC cs.LG

    Design of Two-Level Incentive Mechanisms for Hierarchical Federated Learning

    Authors: Shunfeng Chu, Jun Li, Kang Wei, Yuwen Qian, Kunlun Wang, Feng Shu, Wen Chen

    Abstract: Hierarchical Federated Learning (HFL) is a distributed machine learning paradigm tailored for multi-tiered computation architectures, which supports massive access of devices' models simultaneously. To enable efficient HFL, it is crucial to design suitable incentive mechanisms to ensure that devices actively participate in local training. However, there are few studies on incentive mechanism desig…

    Submitted 16 January, 2024; v1 submitted 9 April, 2023; originally announced April 2023.

  10. arXiv:2304.03981  [pdf, other]

    cs.LG cs.CV

    Uncertainty-inspired Open Set Learning for Retinal Anomaly Identification

    Authors: Meng Wang, Tian Lin, Lianyu Wang, Aidi Lin, Ke Zou, Xinxing Xu, Yi Zhou, Yuanyuan Peng, Qingquan Meng, Yiming Qian, Guoyao Deng, Zhiqun Wu, Junhong Chen, Jianhong Lin, Mingzhi Zhang, Weifang Zhu, Changqing Zhang, Daoqiang Zhang, Rick Siow Mong Goh, Yong Liu, Chi Pui Pang, Xinjian Chen, Haoyu Chen, Huazhu Fu

    Abstract: Failure to recognize samples from the classes unseen during training is a major limitation of artificial intelligence in the real-world implementation for recognition and classification of retinal anomalies. We established an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 retinal conditions. Besides assessing the probability of each category, UIOS also calcul…

    Submitted 29 August, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

  11. arXiv:2304.03359  [pdf, other]

    cs.DC eess.SY

    Approximate Wireless Communication for Federated Learning

    Authors: Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian

    Abstract: This paper presents an approximate wireless communication scheme for federated learning (FL) model aggregation in the uplink transmission. We consider a realistic channel that reveals bit errors during FL model exchange in wireless networks. Our study demonstrates that random bit errors during model transmission can significantly affect FL performance. To overcome this challenge, we propose an app…

    Submitted 6 April, 2023; originally announced April 2023.

  12. GIF: A General Graph Unlearning Strategy via Influence Function

    Authors: Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He

    Abstract: With the greater emphasis on privacy and security in our society, the problem of graph unlearning -- revoking the influence of specific data on the trained GNN model, is drawing increasing attention. However, ranging from machine unlearning to recently emerged graph unlearning methods, existing efforts either resort to retraining paradigm, or perform approximate erasure that fails to consider the…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted by WWW 2023

  13. Learning to Recover Spectral Reflectance from RGB Images

    Authors: Dong Huo, Jian Wang, Yiming Qian, Yee-Hong Yang

    Abstract: This paper tackles spectral reflectance recovery (SRR) from RGB images. Since capturing ground-truth spectral reflectance and camera spectral sensitivity are challenging and costly, most existing approaches are trained on synthetic images and utilize the same parameters for all unseen testing images, which are suboptimal especially when the trained models are tested on real images because they nev…

    Submitted 22 April, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: IEEE Transactions on Image Processing (TIP), 2024

  14. arXiv:2304.01849  [pdf, other]

    stat.ME math.ST

    Semiparametric efficient estimation of genetic relatedness with machine learning methods

    Authors: Xu Guo, Yiyuan Qian, Hongwei Shi, Weichao Yang, Niwen Zhou

    Abstract: In this paper, we propose semiparametric efficient estimators of genetic relatedness between two traits in a model-free framework. Most existing methods require specifying certain parametric models involving the traits and genetic variants. However, the bias due to model misspecification may yield misleading statistical results. Moreover, the semiparametric efficient bounds for estimators of genet…

    Submitted 2 June, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: 46 pages, 9 tables, 1 figure

  15. arXiv:2303.15790  [pdf, other]

    hep-ex hep-ph physics.ins-det

    STCF Conceptual Design Report: Volume 1 -- Physics & Detector

    Authors: M. Achasov, X. C. Ai, R. Aliberti, L. P. An, Q. An, X. Z. Bai, Y. Bai, O. Bakina, A. Barnyakov, V. Blinov, V. Bobrovnikov, D. Bodrov, A. Bogomyagkov, A. Bondar, I. Boyko, Z. H. Bu, F. M. Cai, H. Cai, J. J. Cao, Q. H. Cao, Z. Cao, Q. Chang, K. T. Chao, D. Y. Chen, H. Chen , et al. (413 additional authors not shown)

    Abstract: The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII,…

    Submitted 5 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Journal ref: Front. Phys. 19(1), 14701 (2024)

  16. Federated Uncertainty-Aware Aggregation for Fundus Diabetic Retinopathy Staging

    Authors: Meng Wang, Lianyu Wang, Xinxing Xu, Ke Zou, Yiming Qian, Rick Siow Mong Goh, Yong Liu, Huazhu Fu

    Abstract: Deep learning models have shown promising performance in the field of diabetic retinopathy (DR) staging. However, collaboratively training a DR staging model across multiple institutions remains a challenge due to non-iid data, client reliability, and confidence evaluation of the prediction. To address these issues, we propose a novel federated uncertainty-aware aggregation paradigm (FedUAA), whic…

    Submitted 22 July, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Report number: 978-3-031-43894-3

    Journal ref: Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

  17. arXiv:2303.12370  [pdf, other]

    cs.CV

    Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

    Authors: Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

    Abstract: Sequential video understanding, as an emerging video understanding task, has drawn the attention of many researchers because of its goal-oriented nature. This paper studies weakly supervised sequential video understanding where the accurate time-stamp level text-video alignment is not provided. We solve this task by borrowing ideas from CLIP. Specifically, we use a transformer to aggregate frame-lev…

    Submitted 28 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Code: https://github.com/svip-lab/WeakSVR

  18. arXiv:2303.12245  [pdf, other]

    math.NA cs.LG physics.comp-ph physics.flu-dyn

    Error Analysis of Physics-Informed Neural Networks for Approximating Dynamic PDEs of Second Order in Time

    Authors: Yanxia Qian, Yongchao Zhang, Yunqing Huang, Suchuan Dong

    Abstract: We consider the approximation of a class of dynamic partial differential equations (PDE) of second order in time by the physics-informed neural network (PINN) approach, and provide an error analysis of PINN for the wave equation, the Sine-Gordon equation and the linear elastodynamic equation. Our analyses show that, with feed-forward neural networks having two hidden layers and the $\tanh$ activat…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 46 pages, 14 figures, 3 tables

  19. arXiv:2303.10949  [pdf, other]

    eess.AS cs.CL cs.SD

    Code-Switching Text Generation and Injection in Mandarin-English ASR

    Authors: Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

    Abstract: Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance. Automatic Speech Recognition (ASR) with End-to-End (E2E) modeling for such speech can be a challenging task due to the lack of data. In this study, we investigate text generation and injection for improving the performance of an industry commonly-used streaming model, Transformer-Transd…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  20. arXiv:2303.10741  [pdf]

    cs.CV cs.LG

    Computer Vision Estimation of Emotion Reaction Intensity in the Wild

    Authors: Yang Qian, Ali Kargarandehkordi, Onur Cezmi Mutlu, Saimourya Surabhi, Mohammadmahdi Honarmand, Dennis Paul Wall, Peter Washington

    Abstract: Emotions play an essential role in human communication. Developing computer vision models for automatic recognition of emotion expression can aid in a variety of domains, including robotics, digital behavioral healthcare, and media analytics. There are three types of emotional representations which are traditionally modeled in affective computing research: Action Units, Valence Arousal (VA), and C…

    Submitted 2 August, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  21. arXiv:2303.08372  [pdf, other]

    eess.AS cs.SD

    Target Sound Extraction with Variable Cross-modality Clues

    Authors: Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources. It often uses a model conditioned on a fixed form of target sound clues, such as a sound class label, which limits the ways in which users can interact with the model to specify the target sounds. To leverage…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  22. arXiv:2303.07623  [pdf, other]

    physics.med-ph eess.IV

    Uncertainty-weighted Multi-tasking for $T_{1ρ}$ and T$_2$ Mapping in the Liver with Self-supervised Learning

    Authors: Chaoxing Huang, Yurui Qian, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent WS Wong, Winnie CW Chu, Weitian Chen

    Abstract: Multi-parametric mapping of MRI relaxations in liver has the potential of revealing pathological information of the liver. A self-supervised learning based multi-parametric mapping method is proposed to map $T_{1ρ}$ and T$_2$ simultaneously, by utilising the relaxation constraint in the learning process. Data noise of different mapping tasks is utilised to make the model uncertainty-aware, which…

    Submitted 14 March, 2023; originally announced March 2023.

  23. Symmetry and bipolar motion in collective neutrino flavor oscillations

    Authors: Zewei Xiong, Meng-Ru Wu, Yong-Zhong Qian

    Abstract: We identify a geometric symmetry on the two-flavor Bloch sphere for collective flavor oscillations of a homogeneous dense neutrino gas. Based on this symmetry, analytical solutions to the periodic bipolar flavor evolution are derived. Using numerical calculations, we show that for configurations without this symmetry, the flavor evolution displays deviations from the bipolar flavor motion or even…

    Submitted 11 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 11 pages, 4 figures

  24. arXiv:2303.02693  [pdf, other]

    cs.CV cs.LG

    Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition

    Authors: Junyan Wang, Zhenhong Sun, Yichen Qian, Dong Gong, Xiuyu Sun, Ming Lin, Maurice Pagnucco, Yang Song

    Abstract: 3D convolution neural networks (CNNs) have been the prevailing option for video recognition. To capture the temporal information, 3D convolutions are computed along the sequences, leading to cubically growing and expensive computations. To reduce the computational cost, previous methods resort to manually designed 3D/2D CNN structures with approximations or automatic search, which sacrifice the mo…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: This manuscript has been accepted at ICLR 2023

  25. arXiv:2302.14498  [pdf, other]

    cs.SI cs.DB

    Effective Community Search on Large Attributed Bipartite Graphs

    Authors: Zongyu Xu, Yihao Zhang, Long Yuan, Yuwen Qian, Zi Chen, Mingliang Zhou, Qin Mao, Weibin Pan

    Abstract: Community search over bipartite graphs has attracted significant interest recently. In many applications, such as user-item bipartite graphs in e-commerce and customer-movie bipartite graphs on movie rating websites, nodes tend to have attributes, while previous community search algorithms on bipartite graphs ignore attributes, which gives the returned results poor cohesion with respect to their node…

    Submitted 28 February, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

  26. arXiv:2302.13763  [pdf, other]

    cs.CR cs.LG

    Efficient and Low Overhead Website Fingerprinting Attacks and Defenses based on TCP/IP Traffic

    Authors: Guodong Huang, Chuan Ma, Ming Ding, Yuwen Qian, Chunpeng Ge, Liming Fang, Zhe Liu

    Abstract: Website fingerprinting attack is an extensively studied technique used in a web browser to analyze traffic patterns and thus infer confidential information about users. Several website fingerprinting attacks based on machine learning and deep learning tend to use the most typical features to achieve a satisfactory performance of attacking rate. However, these attacks suffer from several practical…

    Submitted 27 February, 2023; originally announced February 2023.

  27. arXiv:2302.11787  [pdf, other]

    cs.CL

    Empathetic Response Generation via Emotion Cause Transition Graph

    Authors: Yushan Qian, Bo Wang, Ting-En Lin, Yinhe Zheng, Ying Zhu, Dongming Zhao, Yuexian Hou, Yuchuan Wu, Yongbin Li

    Abstract: Empathetic dialogue is a human-like behavior that requires the perception of both affective factors (e.g., emotion status) and cognitive factors (e.g., cause of the emotion). Besides concerning emotion status in early work, the latest approaches study emotion causes in empathetic dialogue. These approaches focus on understanding and duplicating emotion causes in the context to show empathy for the…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  28. arXiv:2302.08629  [pdf, other]

    cs.LG

    Physics-based parameterized neural ordinary differential equations: prediction of laser ignition in a rocket combustor

    Authors: Yizhou Qian, Jonathan Wang, Quentin Douasbin, Eric Darve

    Abstract: In this work, we present a novel physics-based data-driven framework for reduced-order modeling of laser ignition in a model rocket combustor based on parameterized neural ordinary differential equations (PNODE). Deep neural networks are embedded as functions of high-dimensional parameters of laser ignition to predict various terms in a 0D flow model including the heat source function, pre-exponen…

    Submitted 3 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  29. MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

    Authors: Jiaying Lu, Yongchen Qian, Shifan Zhao, Yuanzhe Xi, Carl Yang

    Abstract: Previous research has demonstrated the advantages of integrating data from multiple sources over traditional unimodal data, leading to the emergence of numerous novel multimodal applications. We propose a multimodal classification benchmark MuG with eight datasets that allows researchers to evaluate and improve their models. These datasets are collected from four various genres of games that cover…

    Submitted 17 October, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Journal ref: In Findings of the Association for Computational Linguistics: EMNLP 2023

  30. arXiv:2302.00613  [pdf]

    physics.app-ph physics.optics

    Impact of Surface Roughness in Measuring Optoelectronic Characteristics of Thin-Film Solar Cells

    Authors: David Magginetti, Seokmin Jeon, Yohan Yoon, Ashif Choudhury, Ashraful Mamun, Yang Qian, Jordan Gerton, Heayoung Yoon

    Abstract: Microstructural properties of thin-film absorber layers play a vital role in developing high-performance solar cells. Scanning probe microscopy is frequently used for measuring spatially inhomogeneous properties of thin-film solar cells. While powerful, the nanoscale probe can be sensitive to the roughness of samples, introducing convoluted signals and unintended artifacts into the measurement. He…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: 4 pages, 4 figures

    Journal ref: 2023 IEEE 50th Photovoltaic Specialists Conference

  31. arXiv:2301.13356  [pdf, other]

    cs.CV cs.LG

    Inference Time Evidences of Adversarial Attacks for Forensic on Transformers

    Authors: Hugo Lemarchant, Liangzi Li, Yiming Qian, Yuta Nakashima, Hajime Nagahara

    Abstract: Vision Transformers (ViTs) are becoming a very popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification. However, although early works implied that this network structure had increased robustness against adversarial attacks, some works argue ViTs are still vulnerable. This paper presents our first attempt toward detecting adversarial attacks during inf…

    Submitted 30 January, 2023; originally announced January 2023.

  32. arXiv:2301.12798  [pdf, other]

    cs.CV

    Reliable Federated Disentangling Network for Non-IID Domain Feature

    Authors: Meng Wang, Kai Yu, Chun-Mei Feng, Yiming Qian, Ke Zou, Lianyu Wang, Rick Siow Mong Goh, Yong Liu, Huazhu Fu

    Abstract: Federated learning (FL), as an effective decentralized distributed learning approach, enables multiple institutions to jointly train a model without sharing their local data. However, the domain feature shift caused by different acquisition devices/clients substantially degrades the performance of the FL model. Furthermore, most existing FL approaches aim to improve accuracy without considering re…

    Submitted 19 September, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  33. arXiv:2301.10460  [pdf, other]

    cs.CV

    HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling

    Authors: Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

    Abstract: We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts. For the same reason, the necessary data annotation effort is tremendous, motivating approaches to minimize human involvement. Our labeling tool iteratively verifies…

    Submitted 1 April, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Accepted to ICCV 2023

  34. arXiv:2301.10181  [pdf, other]

    eess.SP cs.LG

    Interpretable Tsetlin Machine-based Premature Ventricular Contraction Identification

    Authors: Jinbao Zhang, Xuan Zhang, Lei Jiao, Ole-Christoffer Granmo, Yongjun Qian, Fan Pan

    Abstract: Neural network-based models have found wide use in automatic long-term electrocardiogram (ECG) analysis. However, such black box models are inadequate for analysing physiological signals where credibility and interpretability are crucial. Indeed, how to make ECG analysis transparent is still an open problem. In this study, we develop a Tsetlin machine (TM) based architecture for premature ventricu…

    Submitted 20 January, 2023; originally announced January 2023.

  35. arXiv:2301.06059  [pdf, other]

    cs.GR cs.CV

    Learning Audio-Driven Viseme Dynamics for 3D Face Animation

    Authors: Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, Di Kang

    Abstract: We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from the input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs. The core of our approach is a novel parametric viseme fitting algorithm that utilizes phoneme priors to extract vis…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: Project page: https://linchaobao.github.io/viseme2023/

  36. arXiv:2301.04907  [pdf, other]

    cs.CL cs.HC

    Think Twice: A Human-like Two-stage Conversational Agent for Emotional Response Generation

    Authors: Yushan Qian, Bo Wang, Shangzhao Ma, Wu Bin, Shuo Zhang, Dongming Zhao, Kun Huang, Yuexian Hou

    Abstract: Towards human-like dialogue systems, current emotional dialogue approaches jointly model emotion and semantics with a unified neural network. This strategy tends to generate safe responses due to the mutual restriction between emotion and semantics, and requires rare emotion-annotated large-scale dialogue corpus. Inspired by the "think twice" behavior in human dialogue, we propose a two-stage conv…

    Submitted 8 June, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS2023

  37. arXiv:2212.13860  [pdf]

    cs.CL cs.DL

    Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

    Authors: Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang

    Abstract: Future work sentences (FWS) are the particular sentences in academic papers that contain the author's description of their proposed follow-up research direction. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to lo…

    Submitted 28 December, 2022; originally announced December 2022.

  38. arXiv:2212.12984  [pdf, other]

    math.NA cs.LG

    MC-Nonlocal-PINNs: handling nonlocal operators in PINNs via Monte Carlo sampling

    Authors: Xiaodong Feng, Yue Qian, Wanfang Shen

    Abstract: We propose Monte Carlo Nonlocal physics-informed neural networks (MC-Nonlocal-PINNs), a generalization of MC-fPINNs in \cite{guo2022monte}, for solving general nonlocal models such as integral equations and nonlocal PDEs. As in MC-fPINNs, our MC-Nonlocal-PINNs handle the nonlocal operators in a Monte Carlo way, resulting in a very stable approach for high dimensional problems. We…

    Submitted 25 December, 2022; originally announced December 2022.

    Comments: 23 pages, 13 figures

  39. arXiv:2212.08892  [pdf, other]

    cs.CV

    Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis

    Authors: Qijian Zhang, Junhui Hou, Yue Qian, Yiming Zeng, Juyong Zhang, Ying He

    Abstract: Point clouds are characterized by irregularity and unstructuredness, which pose challenges in efficient data exploitation and discriminative feature extraction. In this paper, we present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology as a completely regular 2D point geometry image (PGI) structure, in which co…

    Submitted 7 February, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: Accepted to TPAMI

  40. One-Stage Cascade Refinement Networks for Infrared Small Target Detection

    Authors: Yimian Dai, Xiang Li, Fei Zhou, Yulei Qian, Yaohong Chen, Jian Yang

    Abstract: Single-frame InfraRed Small Target (SIRST) detection has been a challenging task due to a lack of inherent characteristics, imprecise bounding box regression, a scarcity of real-world datasets, and sensitive localization evaluation. In this paper, we propose a comprehensive solution to these challenges. First, we find that the existing anchor-free label assignment method is prone to mislabeling sm…

    Submitted 31 December, 2022; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Submitted to TGRS

  41. Signature of Collapsars as Sources for High-energy Neutrinos and $r$-process Nuclei

    Authors: Gang Guo, Yong-Zhong Qian, Meng-Ru Wu

    Abstract: If collapsars are sources for both high-energy (HE) neutrinos and $r$-process nuclei, the profuse low-energy antineutrinos from $β$-decay of the newly-synthesized nuclei can annihilate the HE neutrinos. Considering HE neutrinos produced at internal shocks induced by intermittent mildly-magnetized jets, we show that such annihilation suppresses the overall HE neutrino spectrum at $\gtrsim 300$~TeV…

    Submitted 23 July, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 10 pages, 7 figures

    Journal ref: Phys. Rev. D 108, L021303 (2023)

  42. arXiv:2212.04019  [pdf, other]

    quant-ph

    Silicon-based decoder for polarization-encoding quantum key distribution

    Authors: Yongqiang Du, Xun Zhu, Xin Hua, Zhengeng Zhao, Xiao Hu, Yi Qian, Xi Xiao, Kejin Wei

    Abstract: Silicon-based polarization-encoding quantum key distribution (QKD) has been widely studied, owing to its low cost and robustness. However, prior studies have utilized off-chip devices to demodulate the quantum states or perform polarization compensation, given the difficulty of fabricating polarized independent components on the chip. In this paper, we propose a fully chip-based decoder for polari…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 11 pages

  43. arXiv:2212.02163  [pdf, other]

    cs.CV

    2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints

    Authors: Zhangjian Ji, Zilong Wang, Ming Zhang, Yapeng Chen, Yuhua Qian

    Abstract: Recently, human pose estimation mainly focuses on how to design a more effective and better deep network structure as human features extractor, and most designed feature extraction networks only introduce the position of each anatomical keypoint to guide their training process. However, we found that some human anatomical keypoints kept their topology invariance, which can help to localize them mo…

    Submitted 5 December, 2022; originally announced December 2022.

  44. arXiv:2211.12277  [pdf, other]

    cs.CV

    Semantic Guided Level-Category Hybrid Prediction Network for Hierarchical Image Classification

    Authors: Peng Wang, Jingzhou Chen, Yuntao Qian

    Abstract: Hierarchical classification (HC) assigns each object with multiple labels organized into a hierarchical structure. The existing deep learning based HC methods usually predict an instance starting from the root node until a leaf node is reached. However, in the real world, images interfered by noise, occlusion, blur, or low resolution may not provide sufficient information for the classification at…

    Submitted 31 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  45. arXiv:2211.09412  [pdf, other]

    cs.SD cs.CL eess.AS

    LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. Simply attending longer transcription history for a vanilla neural transducer model shows no much gain in our preliminary experiments, since the prediction network is not a pure language mo…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  46. arXiv:2211.04219  [pdf, other]

    cs.CR

    Nimbus: Toward Speed Up Function Signature Recovery via Input Resizing and Multi-Task Learning

    Authors: Yi Qian, Ligeng Chen, Yuyang Wang, Bing Mao

    Abstract: Function signature recovery is important for many binary analysis tasks such as control-flow integrity enforcement, clone detection, and bug finding. Existing works try to substitute learning-based methods with rule-based methods to reduce human effort. They made considerable efforts to enhance the system's performance, which also bring the side effect of higher resource consumption. However, recov…

    Submitted 8 November, 2022; originally announced November 2022.

  47. arXiv:2211.01267  [pdf, other]

    cs.CL cs.IR

    Multi-Vector Retrieval as Sparse Alignment

    Authors: Yujie Qian, Jinhyuk Lee, Sai Meher Karthik Duddu, Zhuyun Dai, Siddhartha Brahma, Iftekhar Naim, Tao Lei, Vincent Y. Zhao

    Abstract: Multi-vector retrieval models improve over single-vector dual encoders on many information retrieval tasks. In this paper, we cast the multi-vector retrieval problem as sparse alignment between query and document tokens. We propose AligneR, a novel multi-vector retrieval model that learns sparsified pairwise alignments between query and document tokens (e.g. `dog' vs. `puppy') and per-token unary…

    Submitted 2 November, 2022; originally announced November 2022.

  48. arXiv:2211.00815  [pdf, other]

    cs.SD eess.AS

    Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: Many speaker recognition challenges have been held to assess the speaker verification system in the wild and probe the performance limit. Voxceleb Speaker Recognition Challenge (VoxSRC), based on the voxceleb, is the most popular. Besides, another challenge called CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-C…

    Submitted 1 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by InterSpeech 2023

  49. arXiv:2210.17016  [pdf, other]

    cs.SD eess.AS

    Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

    Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

    Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speak…

    Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

  50. arXiv:2210.15936  [pdf, other]

    cs.SD eess.AS

    A comprehensive study on self-supervised distillation for speaker representation learning

    Authors: Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng

    Abstract: In real application scenarios, it is often challenging to obtain a large amount of labeled data for speaker representation learning due to speaker privacy concerns. Self-supervised learning with no labels has become a more and more promising way to solve it. Compared with contrastive learning, self-distilled approaches use only positive samples in the loss function and thus are more attractive. In…

    Submitted 25 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT2022