Showing 1–50 of 101 results for author: Si, L

Searching in archive cs.
  1. arXiv:2407.14058  [pdf, other]

    cs.LG

    On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning

    Authors: Jingyao Wang, Wenwen Qiang, Jiangmeng Li, Lingyu Si, Changwen Zheng, Bing Su

    Abstract: An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invaria…

    Submitted 19 July, 2024; originally announced July 2024.

  2. arXiv:2407.04230  [pdf, other]

    cs.CV

    A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Due to the selective absorption and scattering of light by diverse aquatic media, underwater images usually suffer from various visual degradations. Existing underwater image enhancement (UIE) approaches that combine underwater physical imaging models with neural networks often fail to accurately estimate imaging model parameters such as depth and veiling light, resulting in poor performance in ce…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2405.11971  [pdf, other]

    cs.CV

    Data Augmentation for Text-based Person Retrieval Using Large Language Models

    Authors: Zheng Li, Lijia Si, Caili Guo, Yang Yang, Qiushi Cao

    Abstract: Text-based Person Retrieval (TPR) aims to retrieve person images that match the description given a text query. The performance improvement of the TPR model relies on high-quality data for supervised training. However, it is difficult to construct a large-scale, high-quality TPR dataset due to expensive annotation and privacy protection. Recently, Large Language Models (LLMs) have approached or ev…

    Submitted 20 May, 2024; originally announced May 2024.

  4. arXiv:2405.05769  [pdf, other]

    cs.CV

    Exploring Text-Guided Single Image Editing for Remote Sensing Images

    Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Lamei Zhang, Hao Chen, Bo Du

    Abstract: Artificial Intelligence Generative Content (AIGC) technologies have significantly influenced the remote sensing domain, particularly in the realm of image generation. However, remote sensing image editing, an equally vital research area, has not garnered sufficient attention. Different from text-guided editing in natural images, which relies on extensive text-image paired data for semantic correla…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.01053  [pdf, other]

    cs.LG cs.AI

    Explicitly Modeling Universality into Self-Supervised Learning

    Authors: Jingyao Wang, Wenwen Qiang, Zeen Song, Lingyu Si, Jiangmeng Li, Changwen Zheng, Bing Su

    Abstract: The goal of universality in self-supervised learning (SSL) is to learn universal representations from unlabeled data and achieve excellent performance on all samples and tasks. However, these methods lack explicit modeling of the universality in the learning objective, and the related theoretical understanding remains limited. This may cause models to overfit in data-scarce situations and generali…

    Submitted 23 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 28 pages, submitted to ICML24 with 7766

  6. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  7. arXiv:2403.11506  [pdf, other]

    cs.CV cs.AI

    End-To-End Underwater Video Enhancement: Dataset and Model

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu

    Abstract: Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this…

    Submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2403.01549  [pdf, other]

    cs.CV

    Self-Supervised Representation Learning with Meta Comprehensive Regularization

    Authors: Huijie Guo, Ying Ba, Jie Hu, Lingyu Si, Wenwen Qiang, Lei Shi

    Abstract: Self-Supervised Learning (SSL) methods harness the concept of semantic invariance by utilizing data augmentation strategies to produce similar representations for different deformations of the same input. Essentially, the model captures the shared information among multiple augmented views of samples, while disregarding the non-shared information that may be beneficial for downstream tasks. To add…

    Submitted 3 March, 2024; originally announced March 2024.

  9. Hybrid Base Complex: Extract and Visualize Structure of Hex-dominant Meshes

    Authors: Lei Si, Haowei Cao, Guoning Chen

    Abstract: Hex-dominant mesh generation has received significant attention in recent research due to its superior robustness compared to pure hex-mesh generation techniques. In this work, we introduce the first structure for analyzing hex-dominant meshes. This structure builds on the base complex of pure hex-meshes but incorporates the non-hex elements for a more comprehensive and complete representation. We…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE Transactions on Visualization and Computer Graphics

  10. arXiv:2401.15636  [pdf, other]

    cs.CV eess.IV

    FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

    Authors: Feihong He, Gang Li, Mengyuan Zhang, Leilei Yan, Lingyu Si, Fanzhang Li, Li Shen

    Abstract: The rapid development of generative diffusion models has significantly advanced the field of style transfer. However, most current style transfer methods based on diffusion models typically involve a slow iterative optimization process, e.g., model fine-tuning and textual inversion of style concept. In this paper, we introduce FreeStyle, an innovative style transfer method built upon a pre-trained…

    Submitted 18 July, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  11. arXiv:2401.11447  [pdf, other]

    cs.LG q-bio.QM

    Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

    Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom s…

    Submitted 19 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence

  12. arXiv:2312.09613  [pdf, other]

    cs.LG cs.AI stat.ML

    Rethinking Causal Relationships Learning in Graph Neural Networks

    Authors: Hang Gao, Chengyu Yao, Jiangmeng Li, Lingyu Si, Yifan Jin, Fengge Wu, Changwen Zheng, Huaping Liu

    Abstract: Graph Neural Networks (GNNs) demonstrate their significance by effectively modeling complex interrelationships within graph-structured data. To enhance the credibility and robustness of GNNs, it becomes exceptionally crucial to bolster their ability to capture causal relationships. However, despite recent advancements that have indeed strengthened GNNs from a causal learning perspective, conductin…

    Submitted 15 December, 2023; originally announced December 2023.

  13. arXiv:2312.06240  [pdf, other]

    cs.CV

    UIEDP:Underwater Image Enhancement with Diffusion Prior

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Underwater image enhancement (UIE) aims to generate clear images from low-quality underwater images. Due to the unavailability of clear reference images, researchers often synthesize them to construct paired datasets for training deep models. However, these synthesized images may sometimes lack quality, adversely affecting training outcomes. To address this issue, we propose UIE with Diffusion Pri…

    Submitted 11 December, 2023; originally announced December 2023.

  14. arXiv:2310.03517  [pdf, other]

    cs.CV

    PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification

    Authors: Feihong He, Gang Li, Lingyu Si, Leilei Yan, Fanzhang Li, Fuchun Sun

    Abstract: Few-shot image classification has received considerable attention for addressing the challenge of poor classification performance with limited samples in novel classes. However, numerous studies have employed sophisticated learning strategies and diversified feature extraction methods to address this issue. In this paper, we propose our method called PrototypeFormer, which aims to significantly ad…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Submitted to AAAI2024

  15. arXiv:2309.08251  [pdf, other]

    cs.CV

    Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models

    Authors: Feihong He, Gang Li, Lingyu Si, Leilei Yan, Shimeng Hou, Hongwei Dong, Fanzhang Li

    Abstract: Image cartoonization has attracted significant interest in the field of image generation. However, most of the existing image cartoonization techniques require re-training models using images of cartoon style. In this paper, we present CartoonDiff, a novel training-free sampling approach which generates image cartoonization using diffusion transformer models. Specifically, we decompose the reverse…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages,5 figures

  16. arXiv:2308.15724  [pdf, other]

    cs.CV

    Background Debiased SAR Target Recognition via Causal Interventional Regularizer

    Authors: Hongwei Dong, Fangzhou Han, Lingyu Si, Wenwen Qiang, Lamei Zhang

    Abstract: Recent studies have utilized deep learning (DL) techniques to automatically extract features from synthetic aperture radar (SAR) images, which shows great promise for enhancing the performance of SAR automatic target recognition (ATR). However, our research reveals a previously overlooked issue: SAR images to be recognized include not only the foreground (i.e., the target), but also a certain size…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 38 pages, 8 figures

  17. A Visualization System for Hexahedral Mesh Quality Study

    Authors: Lei Si, Guoning Chen

    Abstract: In this paper, we introduce a new 3D hex mesh visual analysis system that emphasizes poor-quality areas with an aggregated glyph, highlights overlapping elements, and provides detailed boundary error inspection in three forms. By supporting multi-level analysis through multiple views, our system effectively evaluates various mesh models and compares the performance of mesh generation and optimizat…

    Submitted 24 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE VIS 2023 Short Papers and will be published on IEEE Xplore. Paper contains 4 pages, and 1 reference page. Supplemental includes 4 pages

    ACM Class: I.3.0

  18. arXiv:2306.15977  [pdf, other]

    cs.CV cs.AI

    A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

    Authors: Lingyu Si, Hongwei Dong, Wenwen Qiang, Junzhi Yu, Wenlong Zhai, Changwen Zheng, Fanjiang Xu, Fuchun Sun

    Abstract: Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between…

    Submitted 28 June, 2023; originally announced June 2023.

  19. arXiv:2305.08135  [pdf, other]

    cs.CL cs.AI

    Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering

    Authors: Qianglong Chen, Guohai Xu, Ming Yan, Ji Zhang, Fei Huang, Luo Si, Yin Zhang

    Abstract: Existing knowledge-enhanced methods have achieved remarkable results in certain QA tasks via obtaining diverse knowledge from different knowledge bases. However, limited by the properties of retrieved knowledge, they still have trouble benefiting from both the knowledge relevance and distinguishment simultaneously. To address the challenge, we propose CPACE, a Concept-centric Prompt-bAsed Contrast…

    Submitted 21 May, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL2023(Findings). The Camera-ready Version

  20. arXiv:2303.14357  [pdf, other]

    eess.IV cs.CV cs.LG

    Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation

    Authors: Xiaoxiao He, Chaowei Tan, Bo Liu, Liping Si, Weiwu Yao, Liang Zhao, Di Liu, Qilong Zhangli, Qi Chang, Kang Li, Dimitris N. Metaxas

    Abstract: Federated Learning has gained popularity among medical institutions since it enables collaborative training between clients (e.g., hospitals) without aggregating data. However, due to the high cost associated with creating annotations, especially for large 3D image datasets, clinical institutions do not have enough supervised data for training locally. Thus, the performance of the collaborative mo…

    Submitted 17 April, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

  21. arXiv:2301.08496  [pdf, other]

    cs.LG

    Introducing Expertise Logic into Graph Representation Learning from A Causal Perspective

    Authors: Hang Gao, Jiangmeng Li, Wenwen Qiang, Lingyu Si, Xingzhe Su, Fengge Wu, Changwen Zheng, Fuchun Sun

    Abstract: Benefiting from the injection of human prior knowledge, graphs, as derived discrete data, are semantically dense so that models can efficiently learn the semantic information from such data. Accordingly, graph neural networks (GNNs) indeed achieve impressive success in various fields. Revisiting the GNN learning paradigms, we discover that the relationship between human expertise and the knowledge…

    Submitted 23 May, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  22. arXiv:2301.07507  [pdf, other]

    cs.CL cs.DB

    Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

    Authors: Jinyang Li, Binyuan Hui, Reynold Cheng, Bowen Qin, Chenhao Ma, Nan Huo, Fei Huang, Wenyu Du, Luo Si, Yongbin Li

    Abstract: The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023 main conference (oral)

  23. arXiv:2212.11694  [pdf, other]

    cs.CV

    Timestamp-Supervised Action Segmentation from the Perspective of Clustering

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Fuchun Sun

    Abstract: Video action segmentation under timestamp supervision has recently received much attention due to lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which…

    Submitted 22 April, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Accepted as a conference paper to the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23)

  24. arXiv:2212.04755  [pdf, other]

    cs.CL

    From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader

    Authors: Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong Bing

    Abstract: We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. PMR can resolve the discrepancy between model pre-training and downstream fine-tuning of existing MLMs. To build the proposed PMR, we constructed a large volume of general-purpose and high-qu…

    Submitted 16 October, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted to NeurIPS 2023

  25. arXiv:2211.13865  [pdf, other]

    cs.CL cs.AI

    Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality?

    Authors: Pei Zhang, Baosong Yang, Haoran Wei, Dayiheng Liu, Kai Fan, Luo Si, Jun Xie

    Abstract: Neural machine translation (NMT) is often criticized for failures that happen without awareness. The lack of competency awareness makes NMT untrustworthy. This is in sharp contrast to human translators who give feedback or conduct further investigations whenever they are in doubt about predictions. To fill this gap, we propose a novel competency-aware NMT by extending conventional NMT with a self-…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: accepted to EMNLP 2022

  26. arXiv:2211.10018  [pdf, other]

    cs.CL

    A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach

    Authors: Yew Ken Chia, Lidong Bing, Sharifah Mahani Aljunied, Luo Si, Soujanya Poria

    Abstract: Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard Universi…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 19 pages, 6 figures, accepted by EMNLP 2022

  27. arXiv:2211.09394  [pdf, other]

    cs.CL

    ConNER: Consistency Training for Cross-lingual Named Entity Recognition

    Authors: Ran Zhou, Xin Li, Lidong Bing, Erik Cambria, Luo Si, Chunyan Miao

    Abstract: Cross-lingual named entity recognition (NER) suffers from data scarcity in the target languages, especially under zero-shot settings. Existing translate-train or knowledge distillation methods attempt to bridge the language gap, but often introduce a high level of noise. To solve this problem, consistency training methods regularize the model to be robust towards perturbations on data or hidden st…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by EMNLP 2022

  28. arXiv:2211.08794  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations

    Authors: Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq Joty, Luo Si, Lidong Bing

    Abstract: Due to the huge amount of parameters, fine-tuning of pretrained language models (PLMs) is prone to overfitting in the low resource scenarios. In this work, we present a novel method that operates on the hidden representations of a PLM to reduce overfitting. During fine-tuning, our method inserts random autoencoders between the hidden layers of a PLM, which transform activations from the previous l…

    Submitted 26 May, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted by ACL 2023

  29. arXiv:2211.05561  [pdf, other]

    cs.CL cs.AI cs.LG

    Estimating Soft Labels for Out-of-Domain Intent Detection

    Authors: Hao Lang, Yinhe Zheng, Jian Sun, Fei Huang, Luo Si, Yongbin Li

    Abstract: Out-of-Domain (OOD) intent detection is important for practical dialog systems. To alleviate the issue of lacking OOD training samples, some works propose synthesizing pseudo OOD samples and directly assigning one-hot OOD labels to these pseudo samples. However, these one-hot labels introduce noises to the training process because some hard pseudo OOD samples may coincide with In-Domain (IND) inte…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: EMNLP2022 Main Track Long Paper (Oral presentation)

  30. arXiv:2210.14502  [pdf, other]

    cs.CL

    SentBS: Sentence-level Beam Search for Controllable Summarization

    Authors: Chenhui Shen, Liying Cheng, Lidong Bing, Yang You, Luo Si

    Abstract: A wide range of control perspectives have been explored in controllable text generation. Structure-controlled summarization is recently proposed as a useful and interesting research direction. However, current structure-controlling methods have limited effectiveness in enforcing the desired structure. To address this limitation, we propose a sentence-level beam search generation method (SentBS), w…

    Submitted 23 February, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 10 pages, 1 figure, accepted by EMNLP 2022

  31. arXiv:2210.12674  [pdf, other]

    cs.CL

    Towards Generalizable and Robust Text-to-SQL Parsing

    Authors: Chang Gao, Bowen Li, Wenxuan Zhang, Wai Lam, Binhua Li, Fei Huang, Luo Si, Yongbin Li

    Abstract: Text-to-SQL parsing tackles the problem of mapping natural language questions to executable SQL queries. In practice, text-to-SQL parsers often encounter various challenging scenarios, requiring them to be generalizable and robust. While most existing work addresses a particular generalization or robustness challenge, we aim to study it in a more comprehensive manner. In specific, we believe that…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  32. arXiv:2210.11888  [pdf, other]

    cs.CL

    STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing

    Authors: Zefeng Cai, Xiangyu Li, Binyuan Hui, Min Yang, Bowen Li, Binhua Li, Zheng Cao, Weijie Li, Fei Huang, Luo Si, Yongbin Li

    Abstract: In this paper, we propose a novel SQL guided pre-training framework STAR for context-dependent text-to-SQL parsing, which leverages contextual information to enrich natural language (NL) utterance and table schema representations for text-to-SQL conversations. Concretely, we propose two novel pre-training objectives which respectively explore the context-dependent interactions of NL utterances and…

    Submitted 27 October, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  33. arXiv:2210.11060  [pdf, other]

    cs.CL

    Doc2Bot: Accessing Heterogeneous Documents via Conversational Bots

    Authors: Haomin Fu, Yeqin Zhang, Haiyang Yu, Jian Sun, Fei Huang, Luo Si, Yongbin Li, Cam-Tu Nguyen

    Abstract: This paper introduces Doc2Bot, a novel dataset for building machines that help users seek information via conversations. This is of particular interest for companies and organizations that own a large number of manuals or instruction books. Despite its potential, the nature of our task poses several challenges: (1) documents contain various structures that hinder the ability of machines to compreh…

    Submitted 19 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 17 pages, 14 figures. Accepted by Findings of EMNLP 2022

  34. arXiv:2209.06664  [pdf, other]

    cs.CL

    SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation

    Authors: Wanwei He, Yinpei Dai, Min Yang, Jian Sun, Fei Huang, Luo Si, Yongbin Li

    Abstract: Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: 14 pages, 5 figures. Accepted by SIGIR 2022

  35. arXiv:2209.06638  [pdf, other]

    cs.CL

    SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding

    Authors: Wanwei He, Yinpei Dai, Binyuan Hui, Min Yang, Zheng Cao, Jianbo Dong, Fei Huang, Luo Si, Yongbin Li

    Abstract: Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers the self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SP…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: 17 pages, 6 figures. Accepted by COLING 2022

  36. arXiv:2209.06442  [pdf, other]

    cs.CL

    SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers

    Authors: Bowen Qin, Lihan Wang, Binyuan Hui, Bowen Li, Xiangpeng Wei, Binhua Li, Fei Huang, Luo Si, Min Yang, Yongbin Li

    Abstract: This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). From the data uncertainty perspective, it is indisputable that a single SQL can be learned from multiple semantically-equivalent questions. Different from previous methods that are limited to one-to-one mapping, we propose a data uncertainty…

    Submitted 28 October, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted at COLING 2022

  37. arXiv:2208.13629  [pdf, other]

    cs.CL

    A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

    Authors: Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, Fei Huang, Yongbin Li

    Abstract: Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidences provided by relational databases. Early text-to-SQL parsing systems from the database community achieved a noticeable progress with the cost of heavy human engineering and user interactio…

    Submitted 29 August, 2022; originally announced August 2022.

  38. arXiv:2208.12681  [pdf, other]

    cs.CV

    Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective

    Authors: Jiangmeng Li, Yanan Zhang, Wenwen Qiang, Lingyu Si, Chengbo Jiao, Xiaohui Hu, Changwen Zheng, Fuchun Sun

    Abstract: Few-shot learning models learn representations with limited human annotations, and such a learning paradigm demonstrates practicability in various tasks, e.g., image classification, object detection, etc. However, few-shot object detection methods suffer from an intrinsic defect that the limited training data makes the model cannot sufficiently explore semantic information. To tackle this, we intr…

    Submitted 9 December, 2022; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: Accepted by AAAI 2023

  39. arXiv:2208.08584  [pdf, other]

    cs.LG stat.ME

    Robust Causal Graph Representation Learning against Confounding Effects

    Authors: Hang Gao, Jiangmeng Li, Wenwen Qiang, Lingyu Si, Bing Xu, Changwen Zheng, Fuchun Sun

    Abstract: The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover an ever-overlooked phenomenon: the pre-trained graph representation learning model tested with full graphs underperforms the model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with…

    Submitted 10 February, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Accepted by AAAI 2023 as Oral Presentation

  40. arXiv:2206.14017  [pdf, other]

    cs.CL

    Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

    Authors: Lihan Wang, Bowen Qin, Binyuan Hui, Bowen Li, Min Yang, Bailin Wang, Binhua Li, Fei Huang, Luo Si, Yongbin Li

    Abstract: The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing…

    Submitted 6 August, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted at KDD 2022

  41. arXiv:2206.13155  [pdf, other]

    cs.CV cs.CL cs.MM

    Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

    Authors: Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

    Abstract: Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks. Though existing document pre-trained models have achieved excellent performance on standard benchmarks for VrDU, the way they model and exploit the interactions between vision and language on documents has hindered them from better generalization ability and h…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Under review

  42. Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

    Authors: Ting-En Lin, Yuchuan Wu, Fei Huang, Luo Si, Jian Sun, Yongbin Li

    Abstract: In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex in telecommunication to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection,…

    Submitted 14 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: Accepted by KDD 2022, ADS track

  43. arXiv:2205.14704  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

    Authors: Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen

    Abstract: Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while they still follow a parametric-based learning paradigm; the oblivion and rote memorization problems in learning may encounter unstable generalization issues. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training…

    Submitted 19 September, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022 (Spotlight)

  44. arXiv:2205.13425  [pdf, other]

    cs.CV

    Do we really need temporal convolutions in action segmentation?

    Authors: Dazhao Du, Bing Su, Yu Li, Zhongang Qi, Lingyu Si, Ying Shan

    Abstract: Action classification has made great progress, but segmenting and recognizing actions from long untrimmed videos remains a challenging problem. Most state-of-the-art methods focus on designing temporal convolution-based models, but the inflexibility of temporal convolutions and the difficulties in modeling long-term temporal dependencies restrict the potential of these models. Transformer-based mo…

    Submitted 22 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  45. arXiv:2205.12005  [pdf, other]

    cs.CL cs.CV

    mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

    Authors: Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si

    Abstract: Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks. This paper presents mPLUG, a new vision-language foundation model for both cross-modal understanding and generation. Most existing pre-trained models suffer from the problems of low computational efficiency and…

    Submitted 25 May, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Journal ref: EMNLP2022

  46. arXiv:2205.03521  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

    Authors: Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen

    Abstract: Multimodal named entity recognition and relation extraction (MNER and MRE) is a fundamental and crucial branch in information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images incorporated in texts. To deal with these issues, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity a…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted by NAACL 2022

  47. arXiv:2205.02357  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion

    Authors: Xiang Chen, Ningyu Zhang, Lei Li, Shumin Deng, Chuanqi Tan, Changliang Xu, Fei Huang, Luo Si, Huajun Chen

    Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation system. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed focusing on the multimodal entity, relation extraction and link prediction. However, dif…

    Submitted 18 September, 2023; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted by SIGIR 2022. Fix a severe bug

  48. arXiv:2205.02355  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt Tuning

    Authors: Xiang Chen, Lei Li, Ningyu Zhang, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen

    Abstract: Pre-trained language models have contributed significantly to relation extraction by demonstrating remarkable few-shot learning abilities. However, prompt tuning methods for relation extraction may still fail to generalize to those rare or hard patterns. Note that the previous parametric learning paradigm can be viewed as memorization regarding training data as a book and inference as the close-bo…

    Submitted 19 September, 2023; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted by SIGIR 2022, short paper

  49. arXiv:2203.12257  [pdf, other]

    cs.CL

    IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks

    Authors: Liying Cheng, Lidong Bing, Ruidan He, Qian Yu, Yan Zhang, Luo Si

    Abstract: Traditionally, a debate usually requires a manual preparation process, including reading plenty of articles, selecting the claims, identifying the stances of the claims, seeking the evidence for the claims, etc. As the AI debate attracts more attention these years, it is worth exploring the methods to automate the tedious process involved in the debating system. In this work, we introduce a compre…

    Submitted 16 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: 11 pages, 3 figures, accepted by ACL 2022

  50. arXiv:2203.09101  [pdf, other]

    cs.CL

    RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction

    Authors: Yew Ken Chia, Lidong Bing, Soujanya Poria, Luo Si

    Abstract: Despite the importance of relation extraction in building and representing knowledge, less research is focused on generalizing to unseen relations types. We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE) to encourage further research in low-resource relation extraction methods. Given an input sentence, each extracted triplet consists of the head entity, relation labe…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 13 pages, 9 figures, to appear in ACL Findings 2022