Showing 1–31 of 31 results for author: Hua, H

  1. arXiv:2407.05361  [pdf, other]

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  2. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2405.16785  [pdf, other]

    cs.CV

    PromptFix: You Prompt and We Fix the Photo

    Authors: Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

    Abstract: Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover, the stochastic nature of th…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  5. arXiv:2404.15532  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures. The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  6. arXiv:2404.14715  [pdf, other]

    cs.CV cs.CL

    FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

    Authors: Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

    Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To add…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.12353  [pdf, other]

    cs.CV cs.AI

    V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

    Authors: Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

    Abstract: Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooki…

    Submitted 18 April, 2024; originally announced April 2024.

  8. arXiv:2402.00827   

    cs.CV

    Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

    Authors: Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

    Abstract: Artistic video portrait generation is a significant and sought-after task in the fields of computer graphics and vision. While various methods have been developed that integrate NeRFs or StyleGANs with instructional editing models for creating and editing drivable portraits, these approaches face several challenges. They often rely heavily on large datasets, require extensive customization process…

    Submitted 14 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: The paper needs a big modification, including the title. This work is no longer its original version

  9. arXiv:2310.17661  [pdf, other]

    eess.SP cs.NI

    An Overview on IEEE 802.11bf: WLAN Sensing

    Authors: Rui Du, Haocheng Hua, Hailiang Xie, Xianxin Song, Zhonghao Lyu, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

    Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, WLAN standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications.…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 31 pages, 25 figures, this is a significant updated version of arXiv:2207.04859

  10. arXiv:2309.11827  [pdf, other]

    eess.AS cs.SD

    The Impact of Silence on Speech Anti-Spoofing

    Authors: Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang

    Abstract: The current speech anti-spoofing countermeasures (CMs) show excellent performance on specific datasets. However, removing the silence of test speech through Voice Activity Detection (VAD) can severely degrade performance. In this paper, the impact of silence on speech anti-spoofing is analyzed. First, the reasons for the impact are explored, including the proportion of silence duration and the con…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 16 pages, 9 figures, 13 tables

  11. arXiv:2308.16130  [pdf, other]

    cs.IT eess.SP

    Near-Field 3D Localization via MIMO Radar: Cramér-Rao Bound Analysis and Estimator Design

    Authors: Haocheng Hua, Jie Xu, Yonina C. Eldar

    Abstract: This paper studies a near-field multiple-input multiple-output (MIMO) radar sensing system, in which the transceivers with massive antennas aim to localize multiple near-field targets in the three-dimensional (3D) space over unknown cluttered environments. We consider a spherical wavefront propagation with both channel phase and amplitude variations over different antennas. Under this setup, the u…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 13 pages (14 pages in arXiv version), 16 figures, submitted for journal publication. arXiv admin note: substantial text overlap with arXiv:2305.10986

  12. arXiv:2308.13365  [pdf, ps, other]

    cs.SD eess.AS

    Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

    Authors: Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

    Abstract: Neural networks have been able to generate high-quality single-sentence speech. However, it remains a challenge concerning audio-book speech synthesis due to the intra-paragraph correlation of semantic and acoustic features as well as variable styles. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational autoencoder, called EP-MSTTS. EP-MSTTS…

    Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: accepted at Interspeech 2024

  13. arXiv:2305.10986  [pdf, other]

    cs.IT eess.SP

    Near-Field 3D Localization via MIMO Radar: Cramér-Rao Bound and Estimator Design

    Authors: Haocheng Hua, Jie Xu

    Abstract: Future sixth-generation (6G) networks are envisioned to provide both sensing and communications functionalities by using densely deployed base stations (BSs) with massive antennas operating in millimeter wave (mmWave) and terahertz (THz). Due to the large number of antennas and the high frequency band, the sensing and communications will operate within the near-field region, thus making the conven…

    Submitted 15 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 8 pages, 4 figures as an extended version. Its 6-page version has been accepted for presentation in IEEE Globecom 2023 Symposia

  14. arXiv:2303.12060  [pdf, other]

    cs.CV cs.CL

    VideoXum: Cross-modal Visual and Textural Summarization of Videos

    Authors: Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo

    Abstract: Video summarization aims to distill the most important information from a source video to produce either an abridged clip or a textual narrative. Traditionally, different methods have been proposed depending on whether the output is a video or text, thus ignoring the correlation between the two semantically related tasks of visual summarization and textual summarization. We propose a new joint vid…

    Submitted 23 April, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: 13 pages, 7 figures

    Journal ref: IEEE Transactions on Multimedia, VOL. 26 (2024) 5548-5560

  15. arXiv:2211.10605  [pdf, other]

    cs.IT

    ISAC Meets SWIPT: Multi-functional Wireless Systems Integrating Sensing, Communication, and Powering

    Authors: Yilong Chen, Haocheng Hua, Jie Xu, Derrick Wing Kwan Ng

    Abstract: This paper unifies integrated sensing and communication (ISAC) and simultaneous wireless information and power transfer (SWIPT), by investigating a new multi-functional multiple-input multiple-output (MIMO) system integrating wireless sensing, communication, and powering. In this system, one multi-antenna hybrid access point (H-AP) transmits wireless signals to communicate with one multi-antenna i…

    Submitted 16 August, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.16716

  16. arXiv:2211.09699  [pdf, other]

    cs.CV cs.CL

    PromptCap: Prompt-Guided Task-Aware Image Captioning

    Authors: Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo

    Abstract: Knowledge-based visual question answering (VQA) involves questions that require world knowledge beyond the image to yield the correct answer. Large language models (LMs) like GPT-3 are particularly helpful for this task because of their strong knowledge retrieval and reasoning capabilities. To enable LM to understand images, prior work uses a captioning model to convert images into text. However,…

    Submitted 17 August, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted to ICCV 2023

  17. arXiv:2210.16716  [pdf, other]

    cs.IT

    Transmit Optimization for Multi-functional MIMO Systems Integrating Sensing, Communication, and Powering

    Authors: Yilong Chen, Haocheng Hua, Jie Xu

    Abstract: This paper unifies integrated sensing and communication (ISAC) and simultaneous wireless information and power transfer (SWIPT), by investigating a new multi-functional multiple-input multiple-output (MIMO) system integrating wireless sensing, communication, and powering. In this system, one multi-antenna hybrid access point (H-AP) transmits wireless signals to communicate with one multi-antenna i…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: 7 pages, 4 figures, ICC-WC 2023

  18. arXiv:2210.14229  [pdf, other]

    cs.LG cs.AI cs.CR

    Causal Information Bottleneck Boosts Adversarial Robustness of Deep Neural Network

    Authors: Huan Hua, Jun Yan, Xi Fang, Weiquan Huang, Huilin Yin, Wancheng Ge

    Abstract: The information bottleneck (IB) method is a feasible defense solution against adversarial attacks in deep learning. However, this method suffers from spurious correlation, which limits further improvement of its adversarial robustness. In this paper, we incorporate causal inference into the IB framework to alleviate this problem. Specifically, we divide the features o…

    Submitted 25 October, 2022; originally announced October 2022.

  19. arXiv:2209.12721  [pdf, other]

    cs.IT

    MIMO Integrated Sensing and Communication: CRB-Rate Tradeoff

    Authors: Haocheng Hua, Tony Xiao Han, Jie Xu

    Abstract: This paper studies a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system, in which a multi-antenna base station (BS) sends unified wireless signals to estimate one sensing target and communicate with a multi-antenna communication user (CU) simultaneously. We consider both the point and extended target models. For the point target case, the BS estimates the targ…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 30 pages, 17 figures, submitted for journal publication

  20. arXiv:2208.14447  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space

    Authors: Hongzhi Hua, Guixuan Wen, Kaigui Wu

    Abstract: The research of extending deep reinforcement learning (DRL) to the multi-agent field has solved many complicated problems and made great achievements. However, almost all these studies focus only on discrete or continuous action spaces, and few works have applied multi-agent deep reinforcement learning to real-world environment problems, which mostly have a hybrid action space. Therefore, i…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.05108

  21. Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

    Authors: Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

    Abstract: The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generali…

    Submitted 8 November, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: Accepted by TNNLS

  22. arXiv:2206.05108  [pdf, ps, other]

    cs.LG cs.AI

    Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy

    Authors: Hongzhi Hua, Kaigui Wu, Guixuan Wen

    Abstract: Multi-agent deep reinforcement learning has been applied to address a variety of complex problems with either discrete or continuous action spaces and achieved great success. However, most real-world environments cannot be described by only discrete action spaces or only continuous action spaces. Moreover, few works have applied deep reinforcement learning (DRL) to multi-agent problems…

    Submitted 10 June, 2022; originally announced June 2022.

  23. arXiv:2205.14050  [pdf, other]

    cs.IT

    MIMO Integrated Sensing and Communication with Extended Target: CRB-Rate Tradeoff

    Authors: Haocheng Hua, Xianxin Song, Yuan Fang, Tony Xiao Han, Jie Xu

    Abstract: This paper studies a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system, in which a multi-antenna base station (BS) sends unified wireless signals to estimate an extended target and communicate with a multi-antenna communication user (CU) at the same time. We investigate the fundamental tradeoff between the estimation Cramér-Rao bound (CRB) for sensing and the…

    Submitted 17 August, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  24. arXiv:2201.12567  [pdf, other]

    cs.SD eess.AS

    The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

    Authors: Ziyi Chen, Hua Hua, Yuxiang Zhang, Ming Li, Pengyuan Zhang

    Abstract: The voice conversion task is to modify the speaker identity of continuous speech while preserving the linguistic content. Generally, the naturalness and similarity are two main metrics for evaluating the conversion quality, which has been improved significantly in recent years. This paper presents the HCCL-DKU entry for the fake audio generation task of the 2022 ICASSP ADD challenge. We propose a…

    Submitted 29 January, 2022; originally announced January 2022.

  25. arXiv:2107.04835  [pdf, other]

    cs.CL

    Noise Stability Regularization for Improving BERT Fine-tuning

    Authors: Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

    Abstract: Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available. The brittleness of this process is often reflected by the sensitivity to random seeds. In this paper, we propose to tackle this pro…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

  26. arXiv:2104.11871  [pdf, other]

    cs.IT

    Optimal Transmit Beamforming for Integrated Sensing and Communication

    Authors: Haocheng Hua, Jie Xu, Tony Xiao Han

    Abstract: This paper studies the transmit beamforming in a downlink integrated sensing and communication (ISAC) system, where a base station (BS) equipped with a uniform linear array (ULA) sends combined information-bearing and dedicated radar signals to simultaneously perform downlink multiuser communication and radar target sensing. Under this setup, we maximize the radar sensing performance (in terms of…

    Submitted 24 March, 2023; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE Transactions on Vehicular Technology

  27. arXiv:2006.00163  [pdf, ps, other]

    cs.SI

    Tracking Public Opinion in China through Various Stages of the COVID-19 Pandemic

    Authors: Yuqi Gao, Hang Hua, Jiebo Luo

    Abstract: In recent months, COVID-19 has become a global pandemic and had a huge impact on the world. People under different conditions have very different attitudes toward the epidemic. Due to the real-time and large-scale nature of social media, we can continuously obtain a massive amount of public opinion information related to the epidemic from social media. In particular, researchers may ask questions…

    Submitted 1 June, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

  28. Security modeling and efficient computation offloading for service workflow in mobile edge computing

    Authors: Binbin Huang, Zhongjin Lia, Peng Tang, Shangguang Wang, Jun Zhao, Haiyang Hua, Wanqing Lia, Victor Chang

    Abstract: It is a big challenge for resource-limited mobile devices (MDs) to execute various complex and energy-consuming mobile applications. Fortunately, as a novel computing paradigm, mobile edge computing (MEC) can provide abundant computing resources to execute all or parts of the tasks of MDs and thereby can greatly reduce the energy consumption of MDs and improve the QoS of applications. However, offloading workflow task…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: published in journal "Future Generation Computer Systems": https://doi.org/10.1016/j.future.2019.03.011

    MSC Class: mobile edge computing; workflow scheduling; security modeling; energy efficient; genetic algorithm (GA)

  29. arXiv:1905.12926  [pdf, other]

    cs.CL

    Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation

    Authors: Ke Wang, Hang Hua, Xiaojun Wan

    Abstract: Unsupervised text attribute transfer automatically transforms a text to alter a specific attribute (e.g. sentiment) without using any parallel data, while simultaneously preserving its attribute-independent content. The dominant approaches are trying to model the content-independent attribute separately, e.g., learning different attributes' representations or using multiple attribute-specific deco…

    Submitted 12 December, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019 camera ready

  30. arXiv:1807.10935  [pdf, other]

    cs.AI

    Towards Explainable Inference about Object Motion using Qualitative Reasoning

    Authors: Xiaoyu Ge, Jochen Renz, Hua Hua

    Abstract: The capability of making explainable inferences regarding physical processes has long been desired. One fundamental physical process is object motion. Inferring what causes the motion of a group of objects can even be a challenging task for experts, e.g., in forensic science. Most of the work in the literature relies on physics simulation to draw such inferences. The simulation requires a preci…

    Submitted 28 July, 2018; originally announced July 2018.

  31. arXiv:1610.03706  [pdf]

    cs.DL

    Bibliometric Index for Academic Leadership

    Authors: Yang Liu, Fengrong Ou, Yan Deng, Bo Wu, Ruxi Liu, Hui Hua, Yuyuan Guan, Rentong Chen, Lars Gjesteby, Jiansheng Yang, Michael Vannier, Ge Wang

    Abstract: Academic leadership is essential for research innovation and impact. Until now, there has been no dedicated measure of leadership by bibliometrics. Popular bibliometric indices are mainly based on academic output, such as the journal impact factor and the number of citations. Here we develop an academic leadership index based on readily available bibliometric data that is sensitive to not only aca…

    Submitted 12 October, 2016; originally announced October 2016.

    Comments: 25 pages, 4 figures, 4 tables, 33 references