
Showing 1–50 of 78 results for author: Fang, Q

Searching in archive cs.
  1. arXiv:2408.01960  [pdf, other]

    cs.CV cs.AI

    AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model

    Authors: Zhenyu Yan, Qingqing Fang, Wenxi Lv, Qinliang Su

    Abstract: Anomaly detection is a critical task in industrial manufacturing, aiming to identify defective parts of products. Most industrial anomaly detection methods assume the availability of sufficient normal data for training. This assumption may not hold true due to the cost of labeling or data privacy policies. Additionally, mainstream methods require training bespoke models for different objects, whic…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  2. arXiv:2407.06459  [pdf, other]

    cs.RO cs.AI cs.LG

    How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots

    Authors: Hang Yu, Qidi Fang, Shijie Fang, Reuben M. Aronson, Elaine Schaertl Short

    Abstract: Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: \textit{progress}, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-e…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 8 pages. RO-MAN 2024

  3. arXiv:2407.02394  [pdf, other]

    cs.CV

    Similarity Distance-Based Label Assignment for Tiny Object Detection

    Authors: Shuohao Shi, Qiang Fang, Tong Zhao, Xin Xu

    Abstract: Tiny object detection is becoming one of the most challenging tasks in computer vision because of the limited object size and lack of information. The label assignment strategy is a key factor affecting the accuracy of object detection. Although there are some effective label assignment strategies for tiny objects, most of them focus on reducing the sensitivity to the bounding boxes to increase th…

    Submitted 26 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, this paper has been accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  4. arXiv:2406.07330  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    Authors: Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investig…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

    ACM Class: I.2.7

  5. arXiv:2406.07289  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    Authors: Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main conference. Project Page: https://ictnlp.github.io/ComSpeech-Site/

    ACM Class: I.2.7

  6. arXiv:2406.06937  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

    Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization betwee…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

  7. arXiv:2406.03049  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

  8. arXiv:2405.13056  [pdf, other]

    cs.CL cs.SI

    Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian

    Authors: Rohitash Chandra, Baicheng Zhu, Qingying Fang, Eka Shinjikashvili

    Abstract: During the COVID-19 pandemic, the news media coverage encompassed a wide range of topics that includes viral transmission, allocation of medical resources, and government response measures. There have been studies on sentiment analysis of social media platforms during COVID-19 to understand the public response given the rise of cases and government strategies implemented to control the spread of t…

    Submitted 20 May, 2024; originally announced May 2024.

  9. arXiv:2404.16748  [pdf, other]

    cs.CV

    TELA: Text to Layer-wise 3D Clothed Human Generation

    Authors: Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

    Abstract: This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle for clothing editing and meanwhile lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed huma…

    Submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2404.01799  [pdf, other]

    cs.CL cs.CY

    PATCH! Psychometrics-AssisTed benCHmarking of Large Language Models: A Case Study of Proficiency in 8th Grade Mathematics

    Authors: Qixiang Fang, Daniel L. Oberski, Dong Nguyen

    Abstract: Many existing benchmarks of large (multimodal) language models (LLMs) focus on measuring LLMs' academic proficiency, often with also an interest in comparing model performance with human test takers. While these benchmarks have proven key to the development of LLMs, they suffer from several limitations, including questionable measurement quality (e.g., Do they measure what they are supposed to in…

    Submitted 25 July, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  11. arXiv:2403.13344  [pdf, other]

    cs.SI cs.AI cs.CL cs.HC cs.IR cs.LG

    USE: Dynamic User Modeling with Stateful Sequence Models

    Authors: Zhihan Zhou, Qixiang Fang, Leonardo Neves, Francesco Barbieri, Yozen Liu, Han Liu, Maarten W. Bos, Ron Dotsch

    Abstract: User embeddings play a crucial role in user engagement forecasting and personalized services. Recent advances in sequence modeling have sparked interest in learning user embeddings from behavioral data. Yet behavior-based user embedding learning faces the unique challenge of dynamic user modeling. As users continuously interact with the apps, user embeddings should be periodically updated to accou…

    Submitted 20 March, 2024; originally announced March 2024.

  12. arXiv:2403.07354  [pdf, other]

    cs.CV

    BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training

    Authors: Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang

    Abstract: Skeleton-based motion representations are robust for action localization and understanding for their invariance to perspective, lighting, and occlusion, compared with images. Yet, they are often ambiguous and incomplete when taken out of context, even for human annotators. As infants discern gestures before associating them with words, actions can be conceptualized before being grounded with label…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 18 pages, 8 figures

    MSC Class: 68T45 ACM Class: I.4.8

  13. arXiv:2401.13535  [pdf, ps, other]

    cs.GT

    On the Approximate Core and Nucleon of Flow Games

    Authors: Pengfei Liu, Han Xiao, Qizhi Fang

    Abstract: The flow game with public arcs is a cooperative revenue game derived from a flow network. In this game, each player possesses an arc, while certain arcs, known as public arcs, are not owned by any specific player and are accessible to any coalition. The aim of this game is to maximize the flow that can be routed in the network through strategic coalition formation. By exploring its connection to t…

    Submitted 24 January, 2024; originally announced January 2024.

    MSC Class: 05C57; 91A12; 91A43; 91A46

  14. arXiv:2312.12726  [pdf, other]

    cs.CV

    Reducing Shape-Radiance Ambiguity in Radiance Fields with a Closed-Form Color Estimation Method

    Authors: Qihang Fang, Yafei Song, Keqiang Li, Liefeng Bo

    Abstract: Neural radiance field (NeRF) enables the synthesis of cutting-edge realistic novel view images of a 3D scene. It includes density and color fields to model the shape and radiance of a scene, respectively. Supervised by the photometric loss in an end-to-end training manner, NeRF inherently suffers from the shape-radiance ambiguity problem, i.e., it can perfectly fit training views but does not guar…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: This work has been published in NeurIPS 2023

  15. General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study

    Authors: Qixiang Fang, Zhihan Zhou, Francesco Barbieri, Yozen Liu, Leonardo Neves, Dong Nguyen, Daniel L. Oberski, Maarten W. Bos, Ron Dotsch

    Abstract: Learning general-purpose user representations based on user behavioral logs is an increasingly popular user modeling approach. It benefits from easily available, privacy-friendly yet expressive data, and does not require extensive re-tuning of the upstream user model for different downstream tasks. While this approach has shown promise in search engines and e-commerce applications, its fit for ins…

    Submitted 25 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: SIGIR 2024

  16. arXiv:2312.07082  [pdf, other]

    cs.CV

    Continual Learning through Networks Splitting and Merging with Dreaming-Meta-Weighted Model Fusion

    Authors: Yi Sun, Xin Xu, Jian Li, Guanglei Xie, Yifei Shi, Qiang Fang

    Abstract: It's challenging to balance the network's stability and plasticity in continual learning scenarios, considering stability suffers from the update of model and plasticity benefits from it. Existing works usually focus more on the stability and restrict the learning plasticity of later tasks to avoid catastrophic forgetting of learned knowledge. Differently, we propose a continual learning method nam…

    Submitted 12 December, 2023; originally announced December 2023.

  17. arXiv:2311.12608  [pdf, other]

    cs.CV

    Density-Guided Dense Pseudo Label Selection For Semi-supervised Oriented Object Detection

    Authors: Tong Zhao, Qiang Fang, Shuohao Shi, Xin Xu

    Abstract: Recently, dense pseudo-label, which directly selects pseudo labels from the original output of the teacher model without any complicated post-processing steps, has received considerable attention in semi-supervised object detection (SSOD). However, for the multi-oriented and dense objects that are common in aerial scenes, existing dense pseudo-label selection methods are inefficient because they i…

    Submitted 14 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 9 pages, 6 figures

  18. arXiv:2311.10776  [pdf, other]

    cs.IR cs.AI

    Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis

    Authors: Kexin Chen, Junyou Li, Kunyi Wang, Yuyang Du, Jiahui Yu, Jiamin Lu, Lanqing Li, Jiezhong Qiu, Jianzhang Pan, Yi Huang, Qun Fang, Pheng Ann Heng, Guangyong Chen

    Abstract: Recent AI research plots a promising future of automatic chemical reactions within the chemistry society. This study proposes Chemist-X, a transformative AI agent that automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology. To emulate expert chemists' strategies when solving RCR tasks, Chemist-X utilizes advanced RAG sc…

    Submitted 4 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  19. arXiv:2310.13361  [pdf, other]

    cs.CV cs.AI cs.CL

    Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

    Authors: Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng

    Abstract: Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation. Since there is no paired image available for the input sentence in most cases, recent studies suggest utilizing powerful text-to-image generation models to provide image inputs. Nevertheless, synthetic images generated by these models often follow different distributions com…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 main conference

  20. arXiv:2310.11239  [pdf, other]

    cs.CV cs.RO

    LiDAR-based 4D Occupancy Completion and Forecasting

    Authors: Xinhao Liu, Moonjun Gong, Qi Fang, Haoyu Xie, Yiming Li, Hang Zhao, Chen Feng

    Abstract: Scene completion and forecasting are two popular perception problems in research for mobile agents like autonomous vehicles. Existing approaches treat the two problems in isolation, resulting in a separate perception of the two aspects. In this paper, we introduce a novel LiDAR perception task of Occupancy Completion and Forecasting (OCF) in the context of autonomous driving to unify these aspects…

    Submitted 17 October, 2023; originally announced October 2023.

  21. arXiv:2310.07403  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

    Authors: Qingkai Fang, Yan Zhou, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) translates speech from one language into another using a single model. However, due to the presence of linguistic and acoustic diversity, the target speech follows a complex multimodal distribution, posing challenges to achieving both high-quality translations and fast decoding speeds for S2ST models. In this paper, we propose DASpeech, a non-autoregressi…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023. Audio samples are available at https://ictnlp.github.io/daspeech-demo/

    ACM Class: I.2.7

  22. arXiv:2309.09528  [pdf]

    cs.HC

    Gesture Recognition in Millimeter-Wave Radar Based on Spatio-Temporal Feature Sequences

    Authors: Qun Fang, YiHui Yan, GuoQing Ma

    Abstract: Gesture recognition is a pivotal technology in the realm of intelligent education, and millimeter-wave (mmWave) signals possess advantages such as high resolution and strong penetration capability. This paper introduces a highly accurate and robust gesture recognition method using mmWave radar. The method involves capturing the raw signals of hand movements with the mmWave radar module and preproc…

    Submitted 18 September, 2023; originally announced September 2023.

  23. arXiv:2306.10968  [pdf, other]

    cs.CL cs.AI

    BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

    Authors: Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng

    Abstract: Large language models (LLMs) have demonstrated remarkable prowess in language understanding and generation. Advancing from foundation LLMs to instruction-following LLMs, instruction tuning plays a vital role in aligning LLMs to human preferences. However, the existing LLMs are usually focused on English, leading to inferior performance in non-English languages. In order to improve the performance f…

    Submitted 21 June, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: Try BayLing's online demo at http://nlp.ict.ac.cn/bayling/demo

  24. arXiv:2306.05281   

    eess.SP cs.AI cs.RO

    A Graph Reconstruction by Dynamic Signal Coefficient for Fault Classification

    Authors: Wenbin He, Jianxu Mao, Yaonan Wang, Zhe Li, Qiu Fang, Haotian Wu

    Abstract: To improve the performance in identifying the faults under strong noise for rotating machinery, this paper presents a dynamic feature reconstruction signal graph method, which plays the key role of the proposed end-to-end fault diagnosis model. Specifically, the original mechanical signal is first decomposed by wavelet packet decomposition (WPD) to obtain multiple subbands including coefficient ma…

    Submitted 29 September, 2023; v1 submitted 30 May, 2023; originally announced June 2023.

    Comments: The feature extraction algorithm DFSL has errors in derivation and experimental deficiencies

  25. arXiv:2305.14635  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation

    Authors: Yan Zhou, Qingkai Fang, Yang Feng

    Abstract: End-to-end speech translation (ST) is the task of translating speech signals in the source language into text in the target language. As a cross-modal task, end-to-end ST is difficult to train with limited data. Existing methods often try to transfer knowledge from machine translation (MT), but their performances are restricted by the modality gap between speech and text. In this paper, we propose…

    Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main conference

  26. arXiv:2305.11878  [pdf, other]

    physics.optics cs.ET

    Reservoir computing and task performing through using high-$β$ lasers with delayed optical feedback

    Authors: T. Wang, C. Jiang, Q. Fang, X. Guo, Y. Zhang, C. Jin, S. Xiang

    Abstract: Nonlinear photonic sources including semiconductor lasers have recently been utilized as ideal computation elements for information processing. They supply energy-efficient way and rich dynamics for classification and recognition tasks. In this work, we propose and numerically study the dynamics of complex photonic systems including high-$β$ laser element with delayed feedback and functional curre…

    Submitted 23 June, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  27. arXiv:2305.08709  [pdf, other]

    cs.CL cs.SD eess.AS

    Back Translation for Speech-to-text Translation Without Transcripts

    Authors: Qingkai Fang, Yang Feng

    Abstract: The success of end-to-end speech-to-text translation (ST) is often achieved by utilizing source transcripts, e.g., by pre-training with automatic speech recognition (ASR) and machine translation (MT) tasks, or by introducing additional ASR and MT data. Unfortunately, transcripts are only sometimes available since numerous unwritten languages exist worldwide. In this paper, we aim to utilize large…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main conference

    ACM Class: I.2.7

  28. arXiv:2305.08706  [pdf, other]

    cs.CL cs.SD eess.AS

    Understanding and Bridging the Modality Gap for Speech Translation

    Authors: Qingkai Fang, Yang Feng

    Abstract: How to achieve better end-to-end speech translation (ST) by leveraging (text) machine translation (MT) data? Among various existing techniques, multi-task learning is one of the effective ways to share knowledge between ST and MT in which additional MT data can help to learn source-to-target mapping. However, due to the differences between speech and text, there is always a gap between ST and MT.…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main conference

    ACM Class: I.2.7

  29. arXiv:2305.01633  [pdf, other]

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68 ACM Class: I.2.7

  30. arXiv:2304.04351  [pdf, other]

    cs.CV

    Evaluate Geometry of Radiance Fields with Low-frequency Color Prior

    Authors: Qihang Fang, Yafei Song, Keqiang Li, Li Shen, Huaiyu Wu, Gang Xiong, Liefeng Bo

    Abstract: A radiance field is an effective representation of 3D scenes, which has been widely adopted in novel-view synthesis and 3D reconstruction. It is still an open and challenging problem to evaluate the geometry, i.e., the density field, as the ground-truth is almost impossible to obtain. One alternative indirect solution is to transform the density field into a point-cloud and compute its Chamfer Dis…

    Submitted 17 January, 2024; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: This paper has been accepted by AAAI 2024

  31. arXiv:2303.09495  [pdf, other]

    cs.RO cs.AI cs.CV cs.MA

    Among Us: Adversarially Robust Collaborative Perception by Consensus

    Authors: Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng

    Abstract: Multiple robots could perceive a scene (e.g., detect objects) collaboratively better than individuals, although easily suffer from adversarial attacks when using deep learning. This could be addressed by the adversarial defense, but its training requires the often-unknown attacking mechanism. Differently, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers…

    Submitted 17 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV 2023

  32. arXiv:2302.13925  [pdf, other]

    cs.CL cs.CY

    Epicurus at SemEval-2023 Task 4: Improving Prediction of Human Values behind Arguments by Leveraging Their Definitions

    Authors: Christian Fang, Qixiang Fang, Dong Nguyen

    Abstract: We describe our experiments for SemEval-2023 Task 4 on the identification of human values behind arguments (ValueEval). Because human values are subjective concepts which require precise definitions, we hypothesize that incorporating the definitions of human values (in the form of annotation instructions and validated survey items) during model training can yield better prediction performance. We…

    Submitted 18 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted to appear at the SemEval-2023 workshop, co-located with ACL 2023

  33. arXiv:2302.13273  [pdf, other]

    cs.SD cs.MM eess.AS

    Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion

    Authors: Jianrong Wang, Jinyu Liu, Li Liu, Xuewei Li, Mei Yu, Jie Gao, Qiang Fang

    Abstract: Acoustic-to-articulatory inversion (AAI) aims to estimate the parameters of articulators from speech audio. There are two common challenges in AAI, which are the limited data and the unsatisfactory performance in speaker independent scenario. Most current works focus on extracting features directly from speech and ignoring the importance of phoneme information which may limit the performance of AA…

    Submitted 26 February, 2023; originally announced February 2023.

  34. arXiv:2302.12571  [pdf]

    eess.IV cs.CV physics.med-ph

    3D PETCT Tumor Lesion Segmentation via GCN Refinement

    Authors: Hengzhi Xue, Qingqing Fang, Yudong Yao, Yueyang Teng

    Abstract: Whole-body PET/CT scan is an important tool for diagnosing various malignancies (e.g., malignant melanoma, lymphoma, or lung cancer), and accurate segmentation of tumors is a key part for subsequent treatment. In recent years, CNN-based segmentation methods have been extensively investigated. However, these methods often give inaccurate segmentation results, such as over-segmentation and under-seg…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 10 pages, 5 figures, 38 references

  35. arXiv:2212.06711  [pdf, other]

    cs.CL cs.CY

    On Text-based Personality Computing: Challenges and Future Directions

    Authors: Qixiang Fang, Anastasia Giachanou, Ayoub Bagheri, Laura Boeschoten, Erik-Jan van Kesteren, Mahdi Shafiee Kamalabad, Daniel L Oberski

    Abstract: Text-based personality computing (TPC) has gained many research interests in NLP. In this paper, we describe 15 challenges that we consider deserving the attention of the research community. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. When addressing each ch…

    Submitted 22 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Findings of ACL 2023. Long paper

  36. arXiv:2212.06543  [pdf, ps, other]

    cs.CL cs.CY cs.IR

    Improving Stance Detection by Leveraging Measurement Knowledge from Social Sciences: A Case Study of Dutch Political Tweets and Traditional Gender Role Division

    Authors: Qixiang Fang, Anastasia Giachanou, Ayoub Bagheri

    Abstract: Stance detection (SD) concerns automatically determining the viewpoint (i.e., in favour of, against, or neutral) of a text's author towards a target. SD has been applied to many research topics, among which the detection of stances behind political tweets is an important one. In this paper, we apply SD to a dataset of tweets from official party accounts in the Netherlands between 2017 and 2021, wi…

    Submitted 25 July, 2024; v1 submitted 13 December, 2022; originally announced December 2022.

  37. arXiv:2212.01215  [pdf, other]

    cs.NI cs.AI cs.DC cs.SI

    Olive Branch Learning: A Topology-Aware Federated Learning Framework for Space-Air-Ground Integrated Network

    Authors: Qingze Fang, Zhiwei Zhai, Shuai Yu, Qiong Wu, Xiaowen Gong, Xu Chen

    Abstract: The space-air-ground integrated network (SAGIN), one of the key technologies for next-generation mobile communication systems, can facilitate data transmission for users all over the world, especially in some remote areas where vast amounts of informative data are collected by Internet of remote things (IoRT) devices to support various data-driven artificial intelligence (AI) services. However, tr…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: accepted by IEEE Transactions on Wireless Communications, Dec. 2022

  38. arXiv:2210.06716  [pdf, other]

    cs.CL

    Low-resource Neural Machine Translation with Cross-modal Alignment

    Authors: Zhe Yang, Qingkai Fang, Yang Feng

    Abstract: How to achieve neural machine translation with limited parallel data? Existing techniques often rely on large-scale monolingual corpora, which is impractical for some low-resource languages. In this paper, we turn to connect several low-resource languages to a particular high-resource one by additional visual modality. Specifically, we propose a cross-modal contrastive learning method to learn a s…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  39. arXiv:2209.07302  [pdf, other]

    cs.SD eess.AS

    MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement

    Authors: Jianrong Wang, Xiaomin Li, Xuewei Li, Mei Yu, Qiang Fang, Li Liu

    Abstract: Speech enhancement improves speech quality and promotes the performance of various downstream tasks. However, most current speech enhancement work was mainly devoted to improving the performance of downstream automatic speech recognition (ASR), only a relatively small amount of work focused on the automatic speaker verification (ASV) task. In this work, we propose a MVNet consisted of a memory ass…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ICONIP 2022

  40. arXiv:2207.04878  [pdf, other]

    q-bio.GN cs.LG stat.AP

    Stacked Autoencoder Based Multi-Omics Data Integration for Cancer Survival Prediction

    Authors: Xing Wu, Qiulian Fang

    Abstract: Cancer survival prediction is important for developing personalized treatments and inducing disease-causing mechanisms. Multi-omics data integration is attracting widespread interest in cancer research for providing information for understanding cancer progression at multiple genetic levels. Many works, however, are limited because of the high dimensionality and heterogeneity of multi-omics data.…

    Submitted 8 July, 2022; originally announced July 2022.

  41. arXiv:2207.03026  [pdf, ps, other]

    cs.GT

    Constrained Heterogeneous Two-facility Location Games with Max-variant Cost

    Authors: Qi Zhao, Wenjing Liu, Qizhi Fang, Qingqin Nong

    Abstract: In this paper, we propose a constrained heterogeneous facility location model where a set of alternative locations are feasible for building facilities and the number of facilities built at each location is limited. Supposing that a set of agents on the real line can strategically report their locations and each agent's cost is her distance to the further facility that she is interested in, we stu…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 19 pages

  42. MGDCF: Distance Learning via Markov Graph Diffusion for Neural Collaborative Filtering

    Authors: Jun Hu, Bryan Hooi, Shengsheng Qian, Quan Fang, Changsheng Xu

    Abstract: Graph Neural Networks (GNNs) have recently been utilized to build Collaborative Filtering (CF) models to predict user preferences based on historical user-item interactions. However, there is relatively little understanding of how GNN-based CF models relate to some traditional Network Representation Learning (NRL) approaches. In this paper, we show the equivalence between some state-of-the-art GNN…

    Submitted 6 January, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

  43. arXiv:2204.01672  [pdf, other]

    cs.SD cs.CV eess.AS

    Residual-guided Personalized Speech Synthesis based on Face Image

    Authors: Jianrong Wang, Zixuan Wang, Xiaosheng Hu, Xuewei Li, Qiang Fang, Li Liu

    Abstract: Previous works derive personalized speech features by training the model on a large dataset composed of his/her audio sounds. It was reported that face information has a strong link with the speech sound. Thus in this work, we innovatively extract personalized speech features from human faces to synthesize personalized speech using neural vocoder. A Face-based Residual Personalized Speech Synthesi…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: ICASSP 2022

  44. arXiv:2203.10426  [pdf, other]

    cs.CL cs.AI

    STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation

    Authors: Qingkai Fang, Rong Ye, Lei Li, Yang Feng, Mingxuan Wang

    Abstract: How to learn a better speech representation for end-to-end speech-to-text translation (ST) with limited labeled data? Existing techniques often attempt to transfer powerful machine translation (MT) capabilities to ST, but neglect the representation discrepancy across modalities. In this paper, we propose the Speech-TExt Manifold Mixup (STEMM) method to calibrate such discrepancy. Specifically, we…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

    ACM Class: I.2.7

  45. arXiv:2203.10299  [pdf, other]

    cs.CL cs.AI

    Neural Machine Translation with Phrase-Level Universal Visual Representations

    Authors: Qingkai Fang, Yang Feng

    Abstract: Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require paired input of source sentence and image, which makes them suffer from shortage of sentence-image pairs. In this paper, we propose a phrase-level retrieval-based method for MMT to get visual information for the source input from existing s…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

    ACM Class: I.2.7

  46. arXiv:2202.09166  [pdf, other]

    cs.CY cs.CL stat.AP stat.ME

    Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

    Authors: Qixiang Fang, Dong Nguyen, Daniel L Oberski

    Abstract: Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social sci…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: Under review

    Journal ref: EPJ Data Sci. 11, 39 (2022)

  47. arXiv:2112.02991  [pdf, other]

    cs.CV cs.AI eess.IV

    Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

    Authors: Qingyun Fang, Zhaokui Wang

    Abstract: Cross-modality fusing complementary information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms, making them more robust and reliable for a wider range of applications, such as nighttime detection. Compared with prior methods, we think different features should be processed specifically, the modality-specific features should be retained and en…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 23 pages, 11 figures, under consideration at Pattern Recognition

  48. arXiv:2112.01110  [pdf, other]

    cs.LG cs.SI

    Contrastive Adaptive Propagation Graph Neural Networks for Efficient Graph Learning

    Authors: Jun Hu, Shengsheng Qian, Quan Fang, Changsheng Xu

    Abstract: Graph Neural Networks (GNNs) have achieved great success in processing graph data by extracting and propagating structure-aware features. Existing GNN research designs various propagation schemes to guide the aggregation of neighbor information. Recently the field has advanced from local propagation schemes that focus on local neighbors towards extended propagation schemes that can directly deal w…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: 12 pages, 7 figures

  49. arXiv:2111.10342  [pdf, other]

    cs.IR cs.LG

    GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation

    Authors: Desheng Cai, Jun Hu, Quan Zhao, Shengsheng Qian, Quan Fang, Changsheng Xu

    Abstract: In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way. GRecX consists of core libraries for building GNN-based recommendation benchmarks, as well as the implementations of popular GNN-based recommendation models. The core libraries provide essential components for building efficient and unified benchmar…

    Submitted 22 February, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

  50. arXiv:2109.04732  [pdf, other]

    cs.CL

    Assessing the Reliability of Word Embedding Gender Bias Measures

    Authors: Yupei Du, Qixiang Fang, Dong Nguyen

    Abstract: Various measures have been proposed to quantify human-like social biases in word embeddings. However, bias scores based on these measures can suffer from measurement error. One indication of measurement quality is reliability, concerning the extent to which a measure produces consistent results. In this paper, we assess three types of reliability of word embedding gender bias measures, namely test…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 23 pages, 24 figures, 3 tables. Accepted to EMNLP 2021