Showing 1–50 of 80 results for author: Ou, Z

Searching in archive cs.
  1. arXiv:2408.11407  [pdf, other]

    cs.CV

    Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection

    Authors: Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu

    Abstract: Knowledge distillation (KD) is an effective method for compressing models in object detection tasks. Due to limited computational capability, UAV-based object detection (UAV-OD) widely adopts the KD technique to obtain lightweight detectors. Existing methods often overlook the significant differences in feature space caused by the large gap in scale between the teacher and student models. This limi…

    Submitted 21 August, 2024; originally announced August 2024.
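    The abstract is cut off before the proposed method is described, so the sketch below shows only the generic temperature-scaled soft-target distillation loss that KD methods commonly build on, not the domain-invariant progressive scheme itself. It assumes classification-style logits; the temperature of 4.0 and the function names are illustrative choices.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student) on softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence; terms with zero teacher probability contribute nothing.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl

teacher = [5.0, 1.0, -2.0]
print(kd_loss(teacher, teacher))          # identical logits give zero loss
print(kd_loss([0.0, 0.0, 0.0], teacher))  # an uninformed student gives a positive loss
```

    A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores on non-target classes.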

  2. arXiv:2408.09377  [pdf, other]

    cs.LG cs.IT stat.ML

    Mutual Information Multinomial Estimation

    Authors: Yanzhi Chen, Zijing Ou, Adrian Weller, Yingzhen Li

    Abstract: Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning. This work proposes a new estimator for mutual information. Our main discovery is that a preliminary estimate of the data distribution can dramatically help estimate MI. This preliminary estimate serves as a bridge between the joint and the marginal distribution, and by comparing with this br…

    Submitted 18 August, 2024; originally announced August 2024.
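    The estimator proposed here is not spelled out in the truncated abstract. For orientation, the sketch below computes the quantity being estimated with the simplest plug-in approach for discrete data (empirical joint and marginal frequencies); it is a baseline for illustration, not the paper's bridge-distribution estimator.

```python
import math
from collections import Counter

def plugin_mutual_information(pairs):
    """Plug-in MI estimate (in nats) from paired discrete samples.

    Uses empirical frequencies: I(X;Y) = sum p(x,y) * log(p(x,y) / (p(x) p(y))).
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

identical = [(0, 0), (1, 1)] * 50                                # Y copies X
independent = [(x, y) for x in (0, 1) for y in (0, 1)] * 25      # all pairs equally often
print(plugin_mutual_information(identical))    # log 2, about 0.693
print(plugin_mutual_information(independent))  # 0.0
```

    Plug-in estimates like this degrade quickly for continuous or high-dimensional data, which is the regime where dedicated MI estimators matter.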

  3. arXiv:2408.00938  [pdf, other]

    eess.IV cs.AI cs.CV

    CIResDiff: A Clinically-Informed Residual Diffusion Model for Predicting Idiopathic Pulmonary Fibrosis Progression

    Authors: Caiwen Jiang, Xiaodan Xing, Zaixin Ou, Mianxin Liu, Walsh Simon, Guang Yang, Dinggang Shen

    Abstract: The progression of Idiopathic Pulmonary Fibrosis (IPF) significantly correlates with higher patient mortality rates. Early detection of IPF progression is critical for initiating timely treatment, which can effectively slow down the advancement of the disease. However, the current clinical criteria define disease progression requiring two CT scans with a one-year interval, presenting a dilemma: a…

    Submitted 5 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  4. arXiv:2407.20684  [pdf, other]

    cs.IR cs.AI

    RevGNN: Negative Sampling Enhanced Contrastive Graph Learning for Academic Reviewer Recommendation

    Authors: Weibin Liao, Yifan Zhu, Yanyan Li, Qi Zhang, Zhonghong Ou, Xuesong Li

    Abstract: Acquiring reviewers for academic submissions is a challenging recommendation scenario. Recent graph learning-driven models have made remarkable progress in the field of recommendation, but their performance in the academic reviewer recommendation task may suffer from a significant false negative issue. This arises from the assumption that unobserved edges represent negative samples. In fact, the m…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Transactions on Information Systems (TOIS)

  5. arXiv:2407.15569  [pdf, other]

    cs.CL

    An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought

    Authors: Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou

    Abstract: Since the launch of ChatGPT at the end of 2022, generative dialogue models represented by ChatGPT have quickly become essential tools in daily life. As user expectations increase, enhancing the capability of generative dialogue models to solve complex problems has become a focal point of current research. This paper delves into the effectiveness of the RAFT (Retrieval Augmented Fine-Tuning) method…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  6. arXiv:2407.13292  [pdf, other]

    cs.SD cs.CL eess.AS

    Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

    Authors: Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou

    Abstract: The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense…

    Submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.10255  [pdf, other]

    cs.SD eess.AS

    CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

    Authors: Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou

    Abstract: Streaming automatic speech recognition (ASR) is very important for many real-world ASR applications. However, a notable challenge for streaming ASR systems lies in balancing operational performance against latency constraints. Recently, a method of chunking, simulating future context and decoding, called CUSIDE, has been proposed for connectionist temporal classification (CTC) based streaming ASR,…

    Submitted 14 July, 2024; originally announced July 2024.

  8. arXiv:2406.17519  [pdf, other]

    cs.CL

    Entropy-Based Decoding for Retrieval-Augmented Large Language Models

    Authors: Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

    Abstract: Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, trainin…

    Submitted 25 June, 2024; originally announced June 2024.
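    The truncated abstract does not describe the decoding rule. As an illustration of the general idea that entropy can signal how confident a next-token distribution is, the sketch below mixes per-document distributions with weights that favour low-entropy ones. The softmax-over-negative-entropy weighting and its temperature are assumptions, not the paper's formulation.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_weighted_ensemble(dists, temperature=1.0):
    """Mix per-document next-token distributions, favouring low-entropy (confident) ones.

    Weights are a softmax over negative entropies; the temperature is an assumed knob.
    """
    scores = [-entropy(p) / temperature for p in dists]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [wi / z for wi in w]
    vocab = len(dists[0])
    return [sum(w[i] * dists[i][v] for i in range(len(dists))) for v in range(vocab)]

confident = [0.9, 0.05, 0.05]  # hypothetical distribution given a relevant document
noisy = [0.34, 0.33, 0.33]     # hypothetical near-uniform distribution given a distracting one
mixed = entropy_weighted_ensemble([confident, noisy])
print([round(x, 3) for x in mixed])
```

    Compared with a plain average, the mixture keeps more of the confident distribution's mass on the first token.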

  9. arXiv:2406.10808  [pdf, other]

    cs.LG

    Diffusion Model With Optimal Covariance Matching

    Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yingzhen Li, David Barber

    Abstract: The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covarianc…

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.09816  [pdf, other]

    math.OC cs.MA

    A Zeroth-Order Proximal Algorithm for Consensus Optimization

    Authors: Chengan Wang, Zichong Ou, Jie Lu

    Abstract: This paper considers a consensus optimization problem, where all the nodes in a network, each with access only to zeroth-order information of its own local objective function, attempt to cooperatively achieve a common minimizer of the sum of their local objectives. To address this problem, we develop ZoPro, a zeroth-order proximal algorithm, which incorporates a zeroth-order oracle for approximating He…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures
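    A zeroth-order method only queries objective values, never gradients. The sketch below shows one standard way such values can approximate a gradient, using coordinate-wise central differences on a hypothetical local objective; the abstract is truncated before ZoPro's own oracle is fully described, so this is not that construction.

```python
def zeroth_order_gradient(f, x, h=1e-4):
    """Estimate grad f(x) using only function values (central differences per coordinate).

    Costs 2 * len(x) evaluations of f; no derivative information is used.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def local_objective(x):
    # Hypothetical smooth local objective: f(x) = (x0 - 1)^2 + 3 * x1^2
    return (x[0] - 1.0) ** 2 + 3.0 * x[1] ** 2

print(zeroth_order_gradient(local_objective, [0.0, 2.0]))  # close to the true gradient [-2.0, 12.0]
```

    For a quadratic, central differences are exact up to floating-point rounding; for general functions the error shrinks as h decreases until rounding dominates.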

  11. arXiv:2406.02166  [pdf, other]

    cs.SD cs.CL eess.AS

    Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

    Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou

    Abstract: There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. Th…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.13084  [pdf, other]

    cs.CL cs.AI

    The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG)

    Authors: Yucheng Cai, Si Chen, Yi Huang, Junlan Feng, Zhijian Ou

    Abstract: The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), co-located with SLT 2024.

    Submitted 21 May, 2024; originally announced May 2024.

  13. arXiv:2404.11699  [pdf, other]

    cs.RO

    Retrieval-Augmented Embodied Agents

    Authors: Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

    Abstract: Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  14. arXiv:2403.10961  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Energy-Based Models with Applications to Speech and Language Processing

    Authors: Zhijian Ou

    Abstract: Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are un-normalized and thus radically different from other popular self-normalized probabilistic models such as hidden Markov models (HMMs), autoregressive models, generative adversarial nets (GANs) and variational auto-encoders (VAEs). Over the past years, EB…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: The version before publisher editing

    Journal ref: Foundations and Trends in Signal Processing: Vol. 18: No. 1-2, pp 1-199
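    Being un-normalized means an EBM defines p(x) = exp(-E(x)) / Z, where the partition function Z sums exp(-E(x)) over every possible x. The sketch below makes this concrete for a hypothetical toy energy over short binary vectors, where Z can still be computed by brute force; for realistic sentence or image spaces this enumeration is infeasible.

```python
import itertools
import math

def energy(x, w=1.0, b=0.5):
    """Hypothetical toy energy over binary vectors: lower energy means more probable.

    The pairwise term rewards adjacent pairs of ones; the bias term penalises each one.
    """
    pair = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    return -w * pair + b * sum(x)

def normalized_probs(n_bits):
    """Compute p(x) = exp(-E(x)) / Z by brute-force enumeration of all 2**n_bits states."""
    states = list(itertools.product([0, 1], repeat=n_bits))
    weights = [math.exp(-energy(s)) for s in states]
    Z = sum(weights)  # partition function: intractable for large n_bits
    return {s: wt / Z for s, wt in zip(states, weights)}

probs = normalized_probs(3)
print(sum(probs.values()))  # sums to 1 up to rounding
```

    The unnormalized score exp(-E(x)) alone is enough to compare two states; Z is only needed for absolute probabilities, which is why EBM training and sampling methods work hard to avoid computing it.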

  15. arXiv:2403.06199  [pdf, other]

    cs.CV cs.CL

    Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

    Authors: Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning. Yet, their widespread application faces obstacles due to the high computational demands during both the training and inference phases, restricting their use to a limited audience within the research and user communities. In this paper, we investigate the design aspects…

    Submitted 25 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  16. arXiv:2401.02330  [pdf, other]

    cs.CV cs.CL

    LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

    Authors: Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang

    Abstract: In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues. LLaVA-Phi marks a notable advancement in the realm of compact multi-modal models. It demonstrates that even smaller language models, with as few as 2.7B parameters, can effectively engage in intrica…

    Submitted 22 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: The datasets were incomplete as they did not include all the necessary copyrights

  17. arXiv:2311.10271  [pdf, other]

    cs.CL

    Prompt Pool based Class-Incremental Continual Learning for Dialog State Tracking

    Authors: Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Continual learning is crucial for dialog state tracking (DST) in dialog systems, since requirements from users for new functionalities are often encountered. However, most existing continual learning methods for DST require task identities during testing, which is a severe limitation in real-world applications. In this paper, we aim to address continual learning of DST in the class-incremental scena…

    Submitted 16 November, 2023; originally announced November 2023.
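    The abstract stops before describing the method, but the title points to a prompt pool. A common prompt-pool design (as in L2P-style continual learning, assumed here rather than taken from the paper) attaches a key vector to each prompt and picks prompts by matching a query derived from the input, which is what removes the need for task identities at test time. A minimal sketch of that selection step, with hypothetical toy vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_prompts(query, prompt_keys, top_k=2):
    """Return indices of the top_k prompts whose keys best match the query.

    No task identity is needed at test time: selection depends only on the input's query vector.
    """
    ranked = sorted(range(len(prompt_keys)),
                    key=lambda i: cosine(query, prompt_keys[i]),
                    reverse=True)
    return ranked[:top_k]

keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # one hypothetical key per prompt in the pool
print(select_prompts([0.9, 0.1], keys))       # indices of the two closest prompts
```

    In a real system the query would come from a frozen encoder applied to the dialog context, and the keys would be learned alongside the prompts.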

  18. arXiv:2310.03262  [pdf, other]

    cs.CL

    Predicting Emergent Abilities with Infinite Resolution Evaluation

    Authors: Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

    Abstract: The scientific scale-up of large language models (LLMs) necessitates a comprehensive understanding of their scaling properties. However, the existing literature on the scaling properties only yields an incomplete answer: optimization loss decreases predictably as the model size increases, in line with established scaling laws; yet no scaling law for tasks has been established and the task performanc…

    Submitted 17 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: After revision

  19. arXiv:2309.11065  [pdf, other]

    cs.CL

    UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt

    Authors: Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li

    Abstract: Recent research has shown that multi-task pre-training greatly improves the model's robustness and transfer ability, which is crucial for building a high-quality dialog system. However, most previous works on multi-task pre-training rely heavily on human-defined input format or prompt, which is not optimal in quality and quantity. In this work, we propose to use Task-based Automatic Prompt generat…

    Submitted 20 September, 2023; originally announced September 2023.

  20. arXiv:2307.07595  [pdf, other]

    stat.ML cs.LG

    Training Discrete Energy-Based Models with Energy Discrepancy

    Authors: Tobias Schröder, Zijing Ou, Yingzhen Li, Andrew B. Duncan

    Abstract: Training energy-based models (EBMs) on discrete spaces is challenging because sampling over such spaces can be difficult. We propose to train discrete EBMs with energy discrepancy (ED), a novel type of contrastive loss functional which only requires the evaluation of the energy function at data points and their perturbed counterparts, thus not relying on sampling strategies like Markov chain Mont…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Presented at ICML 2023 Workshop: Sampling and Optimization in Discrete Space (SODS 2023)

  21. arXiv:2307.06431  [pdf, other]

    stat.ML cs.LG

    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

    Authors: Tobias Schröder, Zijing Ou, Jen Ning Lim, Yingzhen Li, Sebastian J. Vollmer, Andrew B. Duncan

    Abstract: Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likeli…

    Submitted 27 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Camera Ready version for the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Changes in this revision: Appendix A1: Corrected proof of Theorem 1. Appendix D3: Added definition and numerical experiments for energy discrepancy on binary discrete spaces. Minor changes in the main text and correction of typos. Added new references

  22. arXiv:2305.13199  [pdf, other]

    cs.CL

    Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision

    Authors: Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database to get relevant knowledge to generate responses. In real-life applications, user utterances are noisier, and thus it is more difficult to accurately track dialog states and correctly secure relevant knowledge. Recently, progress in question answering and document-gro…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 5 pages, accepted by INTERSPEECH2023

  23. arXiv:2305.12676  [pdf, other]

    cs.CL

    Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

    Authors: Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao

    Abstract: Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used as a means for calculating sentence scores in speech recognition, but they all use less-modern CNN or LSTM networks. The recent progress in Transformer networ…

    Submitted 29 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted into INTERSPEECH 2023

  24. arXiv:2305.02139  [pdf, other]

    cs.LG cs.CL

    A Curriculum View of Robust Loss Functions

    Authors: Zebin Ou, Yue Zhang

    Abstract: Robust loss functions are designed to combat the adverse impacts of label noise, whose robustness is typically supported by theoretical bounds agnostic to the training dynamics. However, these bounds may fail to characterize the empirical performance as it remains unclear why robust loss functions can underfit. We show that most loss functions can be rewritten into a form with the same class-score…

    Submitted 3 May, 2023; originally announced May 2023.
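    The truncated abstract does not show the rewritten form. One well-known way to see why a robust loss such as MAE can underfit is to compare gradients with respect to the true-class logit: for cross-entropy it is p_y - 1, while for MAE it is -2 p_y (1 - p_y), which vanishes when p_y is small. The sketch below checks this on a hypothetical misclassified sample; it illustrates the underfitting question, not the paper's derivation.

```python
import math

def softmax(logits):
    """Softmax shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ce_and_mae(logits, y):
    """Return CE and MAE losses and their gradients w.r.t. the true-class logit z_y."""
    p = softmax(logits)
    py = p[y]
    ce = -math.log(py)
    mae = 2.0 * (1.0 - py)              # sum_k |onehot_k - p_k| simplifies to 2(1 - p_y)
    grad_ce = py - 1.0                  # d CE / d z_y
    grad_mae = -2.0 * py * (1.0 - py)   # d MAE / d z_y, using d p_y / d z_y = p_y (1 - p_y)
    return ce, mae, grad_ce, grad_mae

# Hypothetical badly misclassified sample: the true class (index 0) has a very low score.
ce, mae, g_ce, g_mae = ce_and_mae([-4.0, 4.0, 0.0], y=0)
print(f"CE={ce:.3f} MAE={mae:.3f} dCE/dz_y={g_ce:.4f} dMAE/dz_y={g_mae:.4f}")
```

    MAE stays bounded (at most 2), which limits the influence of mislabeled samples, but the same boundedness leaves almost no gradient for hard, correctly labeled samples.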

  25. arXiv:2304.10707  [pdf, other]

    stat.ML cs.LG

    Persistently Trained, Diffusion-assisted Energy-based Models

    Authors: Xinwei Zhang, Zhiqiang Tan, Zhijian Ou

    Abstract: Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo. Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation. We propose to introduce diffusion data and learn a joint EBM, called diffusion-assisted EBMs, throug…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: main text 8 pages

  26. arXiv:2304.03548  [pdf, other]

    cs.CL

    GEMINI: Controlling the Sentence-level Writing Style for Abstractive Text Summarization

    Authors: Guangsheng Bao, Zebin Ou, Yue Zhang

    Abstract: Human experts write summaries using different techniques, including extracting a sentence from the document and rewriting it, or fusing various information from the document to abstract it. These techniques are flexible and thus difficult to be imitated by any single method. To address this issue, we propose an adaptive model, GEMINI, that integrates a rewriter and a generator to mimic the sentenc…

    Submitted 9 December, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: EMNLP2023 camera-ready version. 8 pages, 5 figures, 6 tables

  27. arXiv:2211.02639  [pdf, other]

    eess.IV cs.CV cs.LG

    PIPPI2021: An Approach to Automated Diagnosis and Texture Analysis of the Fetal Liver & Placenta in Fetal Growth Restriction

    Authors: Aya Mutaz Zeidan, Paula Ramirez Gilliland, Ashay Patel, Zhanchong Ou, Dimitra Flouri, Nada Mufti, Kasia Maksym, Rosalind Aughwane, Sebastien Ourselin, Anna David, Andrew Melbourne

    Abstract: Fetal growth restriction (FGR) is a prevalent pregnancy condition characterised by failure of the fetus to reach its genetically predetermined growth potential. We explore the application of model fitting techniques, linear regression machine learning models, deep learning regression, and Haralick texture features from multi-contrast MRI for multi-fetal organ analysis of FGR. We employed T2 relax…

    Submitted 1 November, 2022; originally announced November 2022.

  28. arXiv:2210.11720  [pdf, other]

    cs.CL

    MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction

    Authors: Wangjie Jiang, Zhihao Ye, Zijing Ou, Ruihui Zhao, Jianguang Zheng, Yi Liu, Siheng Li, Bang Liu, Yujiu Yang, Yefeng Zheng

    Abstract: Chinese Spelling Correction (CSC) is gaining increasing attention due to its promise of automatically detecting and correcting spelling errors in Chinese texts. Despite its extensive use in many applications, like search engines and optical character recognition systems, little has been explored in medical scenarios in which complex and uncommon medical entities are easily misspelled. Correcting t…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: The full version of CIKM 2022 accepted resource paper "MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction". (https://dl.acm.org/doi/10.1145/3511808.3557636)

  29. arXiv:2210.08692  [pdf, other]

    cs.CL cs.AI

    A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has gained more and more attention, which, however, still faces several fundamental challenges. First, it is unclear whether we can leverage pretrained language models to design, for example, GPT-2 based USs, to catch up and interact with the recently advanced GPT-2 based DSs. Second, an important…

    Submitted 18 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 SereTOD Workshop

  30. arXiv:2210.06706  [pdf, other]

    cs.CL cs.AI

    Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

    Authors: Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS), together with an end-to-end user simulator (US), has never been explored. Moreover, a drawback with existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as in…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: An early version of Markovian Generative Architectures (MGA) and Generative User Simulator (GUS)

  31. arXiv:2209.13464  [pdf, other]

    cs.CL cs.AI

    Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset

    Authors: Hong Liu, Hao Peng, Zhijian Ou, Juanzi Li, Yi Huang, Junlan Feng

    Abstract: Recently, a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games has emerged. However, the Wizard-of-Oz data are in fact simulated data and thus are fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog t…

    Submitted 18 October, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by EMNLP 2022 SereTOD Workshop

  32. arXiv:2207.12235  [pdf, other]

    cs.CL

    Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models

    Authors: Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Developing semi-supervised task-oriented dialog (TOD) systems by leveraging unlabeled dialog data has attracted increasing interest. For semi-supervised learning of latent state TOD models, variational learning is often used, but suffers from the annoying high-variance of the gradients propagated through discrete latent variables and the drawback of indirectly optimizing the target log-likelihood…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted into SIGDIAL 2022

  33. arXiv:2207.02657  [pdf, other]

    cs.CL

    A Challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems

    Authors: Zhijian Ou, Junlan Feng, Juanzi Li, Yakun Li, Hong Liu, Hao Peng, Yi Huang, Jiangjiang Zhao

    Abstract: A challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems, co-located with the EMNLP 2022 SereTOD Workshop.

    Submitted 25 September, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Version 2.1

  34. arXiv:2205.06530  [pdf, other]

    cs.CV

    Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering

    Authors: Zenan Xu, Wanjun Zhong, Qinliang Su, Zijing Ou, Fuwei Zhang

    Abstract: A key challenge in video question answering is how to realize the cross-modal semantic alignment between textual concepts and corresponding visual objects. Existing methods mostly seek to align the word representations with the video regions. However, word representations are often not able to convey a complete description of textual concepts, which are in general described by the compositions of…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 11 pages, 7 figures

  35. arXiv:2204.07367  [pdf, other]

    cs.CL

    On the Role of Pre-trained Language Models in Word Ordering: A Case Study with BART

    Authors: Zebin Ou, Meishan Zhang, Yue Zhang

    Abstract: Word ordering is a constrained language generation task taking unordered words as input. Existing work uses linear models and neural networks for the task, yet pre-trained language models have not been studied in word ordering, let alone why they help. We use BART as an instance and show its effectiveness in the task. To explain why BART helps word ordering, we extend analysis with probing and emp…

    Submitted 28 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: COLING 2022

  36. arXiv:2204.06452  [pdf, other]

    cs.CL cs.HC

    Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, Transformer based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architectures across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using…

    Submitted 13 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by SLT 2022

  37. arXiv:2203.16776  [pdf, ps, other]

    eess.AS cs.CL cs.LG

    An Empirical Study of Language Model Integration for Transducer based Speech Recognition

    Authors: Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

    Abstract: Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that RNN-T posterior should first subtract th…

    Submitted 3 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022
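    The methods named in the abstract can all be written as log-linear rescoring of a hypothesis y: log P_RNN-T(y|x) + λ log P_ELM(y) - μ log P_ILM(y), where shallow fusion sets μ = 0 and ILME or density ratio subtract an internal-LM or source-domain LM estimate. The sketch below uses made-up scores for two hypotheses and assumed weights λ = μ = 0.3 to show how the subtraction can change the winner; it is not the paper's exact configuration.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    log_p_asr: float  # log P_RNN-T(y | x) from the transducer
    log_p_elm: float  # log P_ELM(y) from the external (target-domain) LM
    log_p_ilm: float  # log P_ILM(y): internal-LM (or source-domain LM) estimate

def fused_score(h, elm_weight, ilm_weight):
    """Log-linear fusion: shallow fusion when ilm_weight == 0,
    ILME / density-ratio style when the internal or source LM is subtracted."""
    return h.log_p_asr + elm_weight * h.log_p_elm - ilm_weight * h.log_p_ilm

def rescore(hyps, elm_weight=0.3, ilm_weight=0.0):
    """Pick the hypothesis with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(h, elm_weight, ilm_weight))

# Made-up scores for illustration only.
hyps = [
    Hypothesis("hyp_a", log_p_asr=-2.0, log_p_elm=-10.0, log_p_ilm=-4.0),
    Hypothesis("hyp_b", log_p_asr=-2.5, log_p_elm=-9.0, log_p_ilm=-9.0),
]
print(rescore(hyps).text)                  # shallow fusion
print(rescore(hyps, ilm_weight=0.3).text)  # with internal-LM subtraction
```

    Here hyp_a wins under shallow fusion, but once the internal-LM score is subtracted, hyp_b wins because its transducer score was being held down by a low internal-LM prior rather than by the acoustics.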

  38. arXiv:2203.16758  [pdf, other]

    eess.AS cs.CL

    CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

    Authors: Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework - Chunking, Simulating Future Context and Decoding (CUSIDE) for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, with…

    Submitted 2 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022
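    The chunking step can be sketched as splitting the frame sequence into fixed-size chunks, each paired with some history frames and some future frames. Taking real future frames adds latency; CUSIDE instead simulates them with a learned module that is not reproduced here, so the simulate_future argument below is a hypothetical hook. Chunk and context sizes are illustrative.

```python
def make_chunks(frames, chunk_size, left_context, right_context, simulate_future=None):
    """Split a frame sequence into streaming chunks with history and future context.

    If simulate_future is given, the right context is produced from the chunk itself
    (no look-ahead latency); otherwise real future frames are taken, which adds latency.
    """
    chunks = []
    for start in range(0, len(frames), chunk_size):
        end = min(start + chunk_size, len(frames))
        history = frames[max(0, start - left_context):start]
        current = frames[start:end]
        if simulate_future is not None:
            future = simulate_future(current, right_context)
        else:
            future = frames[end:end + right_context]
        chunks.append((history, current, future))
    return chunks

frames = list(range(10))  # stand-in for 10 acoustic feature frames
for history, current, future in make_chunks(frames, chunk_size=4, left_context=2, right_context=2):
    print(history, current, future)
```

    Passing, for example, simulate_future=lambda cur, n: [cur[-1]] * n yields right context without look-ahead; that repeat-the-last-frame rule is only a placeholder for the learned simulator.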

  39. arXiv:2203.16757  [pdf, ps, other]

    eess.AS cs.CL

    Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

    Authors: Keyu An, Ji Xiao, Zhijian Ou

    Abstract: Recently, the end-to-end training approach for multi-channel ASR has shown its effectiveness, which usually consists of a beamforming front-end and a recognition back-end. However, the end-to-end training becomes more difficult due to the integration of multiple modules, particularly considering that multi-channel speech data recorded in real environments are limited in size. This raises the deman…

    Submitted 8 October, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by ISCSLP 2022. arXiv admin note: substantial text overlap with arXiv:2107.02670

  40. arXiv:2203.01693  [pdf, other]

    cs.LG

    Learning Neural Set Functions Under the Optimal Subset Oracle

    Authors: Zijing Ou, Tingyang Xu, Qinliang Su, Yingzhen Li, Peilin Zhao, Yatao Bian

    Abstract: Learning neural set functions is becoming increasingly important in many applications like product recommendation and compound selection in AI-aided drug discovery. The majority of existing works study methodologies of set function learning under the function value oracle, which, however, requires expensive supervision signals. This renders it impractical for applications with only weak supervisi…

    Submitted 23 May, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

  41. arXiv:2112.01686  [pdf, other]

    cs.CV

    Make A Long Image Short: Adaptive Token Length for Vision Transformers

    Authors: Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: The vision transformer splits each image into a sequence of tokens with fixed length and processes the tokens in the same way as words in natural language processing. More tokens normally lead to better performance but considerably increased computational cost. Motivated by the proverb "A picture is worth a thousand words" we aim to accelerate the ViT model by making a long image short. To this en…

    Submitted 5 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: 10 pages, Technical report
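    The cost side of this trade-off is simple arithmetic: a ViT with patch size P turns an H × W image into (H/P) · (W/P) tokens. The sketch below counts tokens for a 224 × 224 input at a few patch sizes (assumed typical values); it does not implement the paper's adaptive token-length mechanism.

```python
def num_tokens(height, width, patch_size):
    """Number of patch tokens a ViT produces for an image (class token excluded)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)

for p in (8, 16, 32):
    print(p, num_tokens(224, 224, p))
```

    Going from 196 tokens (P = 16) to 49 tokens (P = 32) cuts the number of pairwise self-attention interactions by a factor of (196/49)^2 = 16.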

  42. arXiv:2112.00265  [pdf, other]

    cs.LG cs.CV

    Training BatchNorm Only in Neural Architecture Search and Beyond

    Authors: Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: This work investigates the usage of batch normalization in neural architecture search (NAS). Specifically, Frankle et al. find that training BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al. claim that training BatchNorm only can speed up the training of the one-shot NAS supernet over ten times. Critically, there is no effort to understand 1) why training BatchNorm only c…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: 11 pages, Technical report

  43. arXiv:2111.01415  [pdf, other]

    cs.SE cs.AI cs.CR

    Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning

    Authors: Wenyu Zhu, Zhiyao Feng, Zihan Zhang, Jianjun Chen, Zhijian Ou, Min Yang, Chao Zhang

    Abstract: Recovering binary programs' call graphs is crucial for inter-procedural analysis tasks and applications based on them. One of the core challenges is recognizing targets of indirect calls (i.e., indirect callees). Existing solutions all have high false positives and negatives, making call graphs inaccurate. In this paper, we propose a new solution, Callee, combining transfer learning and cont…

    Submitted 23 December, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

  44. arXiv:2110.06354  [pdf, other]

    cs.CL cs.IR cs.LG

    Tell Me How to Survey: Literature Review Made Simple with Automatic Reading Path Generation

    Authors: Jiayuan Ding, Tong Xiang, Zijing Ou, Wangyang Zuo, Ruihui Zhao, Chenghua Lin, Yefeng Zheng, Bang Liu

    Abstract: Recent years have witnessed dramatic growth in the volume of papers, with many new research papers published every day, especially in computer science. How to glean papers worth reading from the massive literature to do a quick survey or keep up with the latest advances on a specific research topic has become a challenging task. Existing academic search engines such as Google Sc…

    Submitted 25 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 16 pages, 12 figures

    Journal ref: ICDE 2022

  45. arXiv:2109.04314  [pdf, other]

    cs.CL

    Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many model options, we propose the g…

    Submitted 26 January, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted into IEEE/ACM Transactions on Audio, Speech and Language Processing

  46. arXiv:2109.02867  [pdf, other]

    cs.IR

    Refining BERT Embeddings for Document Hashing via Mutual Information Maximization

    Authors: Zijing Ou, Qinliang Su, Jianxing Yu, Ruihui Zhao, Yefeng Zheng, Bang Liu

    Abstract: Existing unsupervised document hashing methods are mostly established on generative models. Due to the difficulty of capturing long dependency structures, these methods rarely model the raw documents directly, but instead model the features extracted from them (e.g., bag-of-words (BOW), TF-IDF). In this paper, we propose to learn hash codes from BERT embeddings after observing their tremendous…

    Submitted 7 September, 2021; originally announced September 2021.

  47. arXiv:2107.05038  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

    Authors: Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou

    Abstract: The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages. A drawback suffered by previous methods in using phonological features is that the acoustic-to-PF extraction in a bottom-up way is itself difficult…

    Submitted 30 October, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

    Comments: ASRU2021

  48. arXiv:2107.03007  [pdf, other]

    eess.AS cs.CL cs.SD

    Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

    Authors: Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang

    Abstract: Automatic speech recognition systems have improved greatly over the past few decades, and current systems are mainly hybrid or end-to-end. The recently proposed CTC-CRF framework inherits the data efficiency of the hybrid approach and the simplicity of the end-to-end approach. In this paper, we further advance the CTC-CRF based ASR technique with explorations on modeling units and neura…

    Submitted 8 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Submitted to ASRU 2021

  49. arXiv:2105.13066  [pdf, other]

    cs.IR cs.AI

    Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval

    Authors: Zijing Ou, Qinliang Su, Jianxing Yu, Bang Liu, Jingwen Wang, Ruihui Zhao, Changyou Chen, Yefeng Zheng

    Abstract: Given the need for fast retrieval and a small memory footprint, document hashing has been playing a crucial role in large-scale information retrieval. To generate high-quality hash codes, both semantics and neighborhood information are crucial. However, most existing methods leverage only one of them or simply combine them via some intuitive criteria, lacking a theoretical principle to guide t…

    Submitted 27 May, 2021; originally announced May 2021.

    Journal ref: ACL2021

  50. arXiv:2105.06138  [pdf, other]

    cs.CV

    Unsupervised Hashing with Contrastive Information Bottleneck

    Authors: Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, Changyou Chen

    Abstract: Many unsupervised hashing methods are implicitly established on the idea of reconstructing the input data, which basically encourages the hash codes to retain as much information of the original data as possible. However, this requirement may force the models to spend much of their effort reconstructing useless background information while neglecting to preserve the discriminative semantic i…

    Submitted 18 May, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: IJCAI 2021