
Showing 1–17 of 17 results for author: Bai, Q

Searching in archive eess.
  1. arXiv:2406.05551

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

    Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

    Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i…

    Submitted 8 June, 2024; originally announced June 2024.

  2. arXiv:2406.04679

    eess.IV cs.CV

    XctDiff: Reconstruction of CT Images with Consistent Anatomical Structures from a Single Radiographic Projection Image

    Authors: Qingze Bai, Tiange Liu, Zhi Liu, Yubing Tong, Drew Torigian, Jayaram Udupa

    Abstract: In this paper, we present XctDiff, an algorithm framework for reconstructing CT from a single radiograph, which decomposes the reconstruction process into two easily controllable tasks: feature extraction and CT reconstruction. Specifically, we first design a progressive feature extraction strategy that is able to extract robust 3D priors from radiographs. Then, we use the extracted prior informat…

    Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2309.11730

    eess.AS cs.SD

    Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

    Authors: Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

    Abstract: Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level features to the downstream speaker recognition task. However, this approach introduces extra parameters as the pretrained model remains in the inference s…

    Submitted 26 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: submitted to ICASSP 2024

  4. arXiv:2212.03752

    cs.CV eess.IV

    GLeaD: Improving GANs with A Generator-Leading Task

    Authors: Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen

    Abstract: Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition. Towards a fairer game in GANs, we propose a new paradigm for adversarial training, which…

    Submitted 6 June, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: CVPR2023. Project page: https://ezioby.github.io/glead/ Code: https://github.com/EzioBy/glead/

  5. arXiv:2208.02189

    eess.AS cs.CL cs.LG cs.SD

    A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

    Authors: Qibing Bai, Tom Ko, Yu Zhang

    Abstract: In human speech, the attitude of a speaker cannot be fully expressed only by the textual content. It has to come along with the intonation. Declarative questions are commonly used in daily Cantonese conversations, and they are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences due to the loss of…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted by INTERSPEECH 2022

  6. arXiv:2206.05850

    cs.LG cs.AI eess.SY

    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

    Authors: Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: We consider the problem of constrained Markov decision process (CMDP) in continuous state-action spaces where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state of the art convergence results for the objective value fu…

    Submitted 16 May, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: The latest version fixed the error in the proof of Lemma 4 in AAAI2023

  7. arXiv:2205.09048

    eess.IV cs.CV

    Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners

    Authors: Hao Quan, Xingyu Li, Weixing Chen, Qun Bai, Mingchen Zou, Ruijie Yang, Tingting Zheng, Ruiqun Qi, Xinghua Gao, Xiaoyu Cui

    Abstract: Based on digital pathology slice scanning technology, artificial intelligence algorithms represented by deep learning have achieved remarkable results in the field of computational pathology. Compared to other medical images, pathology images are more difficult to annotate, and thus, there is an extreme lack of available datasets for conducting supervised learning to train robust deep learning mod…

    Submitted 15 November, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

  8. arXiv:2205.08993

    cs.CL eess.AS

    Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

    Authors: Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang

    Abstract: Direct speech-to-speech translation (S2ST) has drawn more and more attention recently. The task is very challenging due to data scarcity and complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. Firstly, we build an S2ST Transformer baseline which outperforms the original Translatotron. Secondly, we utilize the external data by pseudo-labeling and obtain a new…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  9. arXiv:2205.02850

    eess.IV cs.AI cs.CV

    A Deep Reinforcement Learning Framework for Rapid Diagnosis of Whole Slide Pathological Images

    Authors: Tingting Zheng, Weixing Chen, Shuqin Li, Hao Quan, Qun Bai, Tianhang Nan, Song Zheng, Xinghua Gao, Yue Zhao, Xiaoyu Cui

    Abstract: The deep neural network is a research hotspot for histopathological image analysis, which can improve the efficiency and accuracy of diagnosis for pathologists or be used for disease screening. The whole slide pathological image can reach one gigapixel and contains abundant tissue feature information, which needs to be divided into a lot of patches in the training and inference stages. This will l…

    Submitted 5 May, 2022; originally announced May 2022.

  10. arXiv:2203.15610

    eess.AS cs.CL cs.LG cs.SD

    LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

    Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

    Abstract: Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pr…

    Submitted 18 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, accepted to Interspeech 2022

  11. arXiv:2106.06680

    cs.LG cs.AI eess.SY

    Markov Decision Processes with Long-Term Average Constraints

    Authors: Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

    Abstract: We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose…

    Submitted 20 June, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

  12. arXiv:2105.14125

    cs.LG cs.AI eess.SY

    Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

    Authors: Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

    Abstract: Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, a biased estimator is proposed. The pr…

    Submitted 28 May, 2021; originally announced May 2021.

  13. arXiv:2105.03072

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  14. arXiv:2104.11599

    cs.CV eess.IV

    Region-Adaptive Deformable Network for Image Quality Assessment

    Authors: Shuwei Shi, Qingyan Bai, Mingdeng Cao, Weihao Xia, Jiahao Wang, Yifan Chen, Yujiu Yang

    Abstract: Image quality assessment (IQA) aims to assess the perceptual quality of images. The outputs of the IQA algorithms are expected to be consistent with human subjective perception. In image restoration and enhancement tasks, images generated by generative adversarial networks (GAN) can achieve better visual performance than traditional CNN-generated images, although they have spatial shift and textur…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: CVPR NTIRE Workshop 2021. The first two authors contribute equally to this work. Code is available at https://github.com/IIGROUP/RADN

  15. arXiv:2006.05961   

    cs.LG cs.NI eess.SY math.OC stat.ML

    Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

    Authors: Qinbo Bai, Vaneet Aggarwal, Ather Gattami

    Abstract: In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the transition probabilities are not known. In the presence of long-term (or average) constraints, the agent has to choose a policy that maximizes the long-term average rewa…

    Submitted 30 January, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: The result has an error

  16. arXiv:2003.05555

    math.OC cs.LG eess.SY stat.ML

    Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints

    Authors: Qinbo Bai, Vaneet Aggarwal, Ather Gattami

    Abstract: In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1. We propose a model…

    Submitted 13 June, 2022; v1 submitted 11 March, 2020; originally announced March 2020.

  17. arXiv:1908.11013

    eess.SP

    Deep Learning based Channel Estimation Algorithm over Time Selective Fading Channels

    Authors: Qinbo Bai, Jintao Wang, Yue Zhang, Jian Song

    Abstract: Research on deep learning applications for the physical layer has received much attention in recent years. In this paper, we propose a Deep Learning (DL) based channel estimator for time-varying Rayleigh fading channels. We build, train, and test the channel estimator using a Neural Network (NN). The proposed DL-based estimator can dynamically track the channel status without any prior kno…

    Submitted 28 August, 2019; originally announced August 2019.