Showing 1–50 of 91 results for author: Lian, Z

  1. arXiv:2407.07653  [pdf, other]

    cs.HC

    AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Jiangyan Yi, Bin Liu, Jianhua Tao

    Abstract: Explainable Multimodal Emotion Recognition (EMER) is an emerging task that aims to achieve reliable and accurate emotion recognition. However, due to the high annotation cost, the existing dataset (denoted as EMER-Fine) is small, making it difficult to perform supervised training. To reduce the annotation cost and expand the dataset size, this paper reviews the previous dataset construction proces…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.02751  [pdf, other]

    cs.CL cs.AI

    Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

    Authors: Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li

    Abstract: Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is an enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, lang…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages, 8 figures, 12 tables, NeurIPS 2024 Dataset and Benchmark Track

  3. arXiv:2407.01925  [pdf, other]

    cs.CV

    Looking From the Future: Multi-order Iterations Can Enhance Adversarial Attack Transferability

    Authors: Zijian Ying, Qianmu Li, Tao Wang, Zhichao Lian, Shunmei Meng, Xuyun Zhang

    Abstract: Various methods try to enhance adversarial transferability by improving the generalization from different perspectives. In this paper, we rethink the optimization process and propose a novel sequence optimization concept, which is named Looking From the Future (LFF). LFF makes use of the original optimization process to refine the very first local optimization choice. Adapting the LFF concept to t…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2406.18567  [pdf, ps, other]

    cs.CV cs.DB

    Research on Image Processing and Vectorization Storage Based on Garage Electronic Maps

    Authors: Nan Dou, Qi Shi, Zhigang Lian

    Abstract: For the purpose of achieving a more precise definition and data analysis of images, this study conducted research on vectorization and rasterization storage of electronic maps, focusing on a large underground parking garage map. During the research, image processing, vectorization and rasterization storage were performed. The paper proposed a method for the vectorization classification storage o…

    Submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2406.11161  [pdf, other]

    cs.AI cs.MM

    Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

    Authors: Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann

    Abstract: Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing su…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 37 pages, 12 figures, Project: https://github.com/ZebangCheng/Emotion-LLaMA, Demo: https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA

  6. arXiv:2405.15776  [pdf, other]

    cs.RO cs.AI

    CalliRewrite: Recovering Handwriting Behaviors from Calligraphy Images without Supervision

    Authors: Yuxuan Luo, Zekun Wu, Zhouhui Lian

    Abstract: Human-like planning skills and dexterous manipulation have long posed challenges in the fields of robotics and artificial intelligence (AI). The task of reinterpreting calligraphy presents a formidable challenge, as it involves the decomposition of strokes and dexterous utensil control. Previous efforts have primarily focused on supervised learning of a single instrument, limiting the performance…

    Submitted 20 March, 2024; originally announced May 2024.

    Comments: 8 pages, accepted as ICRA 2024 contributed paper

  7. arXiv:2404.17113  [pdf, other]

    cs.LG cs.HC

    MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems struggle to meet the demands of practical applications. Therefore, we or…

    Submitted 18 July, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.16067  [pdf, other]

    cs.HC cs.AI

    Layout2Rendering: AI-aided Greenspace design

    Authors: Ran Chen, Zeke Lian, Yueheng He, Xiao Ling, Fuyu Yang, Xueqi Yao, Xingjian Yi, Jing Zhao

    Abstract: In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 figures

  9. arXiv:2403.15044  [pdf, other]

    cs.CV cs.AI

    Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

    Authors: Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Multimodal fusion is a significant method for most multimodal tasks. With the recent surge in the number of large pre-trained models, combining both multimodal fusion methods and pre-trained model features can achieve outstanding performance in many multimodal tasks. In this paper, we present our approach, which leverages both advantages for addressing the task of Expression (Expr) Recognition and…

    Submitted 22 March, 2024; originally announced March 2024.

  10. arXiv:2403.13846  [pdf, other]

    cs.LG cs.AI

    A Clustering Method with Graph Maximum Decoding Information

    Authors: Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, Jin Yan, Shan Jiang, Zhiming Ding

    Abstract: The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of rela…

    Submitted 18 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages, 9 figures, IJCNN 2024

  11. arXiv:2403.10299  [pdf, other]

    cs.AI

    A Multi-constraint and Multi-objective Allocation Model for Emergency Rescue in IoT Environment

    Authors: Xinrun Xu, Zhanbiao Lian, Yurong Wu, Manying Lv, Zhiming Ding, Jian Yan, Shang Jiang

    Abstract: Emergency relief operations are essential in disaster aftermaths, necessitating effective resource allocation to minimize negative impacts and maximize benefits. In prolonged crises or extensive disasters, a systematic, multi-cycle approach is key for timely and informed decision-making. Leveraging advancements in IoT and spatio-temporal data analytics, we've developed the Multi-Objective Shuffled…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, 5 figures, ISCAS 2024

  12. arXiv:2402.11432  [pdf, other]

    cs.CL

    Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning

    Authors: Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao

    Abstract: Deception detection has attracted increasing attention due to its importance in real-world scenarios. Its main goal is to detect deceptive behaviors from multimodal clues such as gestures, facial expressions, prosody, etc. However, these bases are usually subjective and related to personal habits. Therefore, we extend deception detection to deception reasoning, further providing objective evidence…

    Submitted 16 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  13. arXiv:2402.00606  [pdf, other]

    cs.CV

    Dynamic Texture Transfer using PatchMatch and Transformers

    Authors: Guo Pu, Shiyao Xu, Xixin Cao, Zhouhui Lian

    Abstract: How to automatically transfer the dynamic texture of a given video to the target still image is a challenging and ongoing problem. In this paper, we propose to handle this task via a simple yet effective model that utilizes both PatchMatch and Transformers. The key idea is to decompose the task of dynamic texture transfer into two stages, where the start frame of the target video with the desired…

    Submitted 1 February, 2024; originally announced February 2024.

  14. arXiv:2401.16444  [pdf, other]

    cs.HC cs.AI

    Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain

    Authors: Yiming Gao, Feiyu Liu, Liang Wang, Zhenjie Lian, Dehua Zheng, Weixuan Wang, Wenjin Yang, Siqin Li, Xianliang Wang, Wenhui Chen, Jing Dai, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

    Abstract: Existing game AI research mainly focuses on enhancing agents' abilities to win games, but this does not inherently make humans have a better experience when collaborating with these agents. For example, agents may dominate the collaboration and exhibit unintended or detrimental behaviors, leading to poor experiences for their human partners. In other words, most game AI agents are modeled in a "se…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024. arXiv admin note: text overlap with arXiv:2304.11632

  15. arXiv:2401.05698  [pdf, other]

    cs.CV cs.HC cs.MM cs.SD eess.AS

    HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines. Previous efforts in this area are dominated by the supervised learning paradigm. Despite significant progress, supervised learning is meeting its bottleneck due to the longstanding data scarcity issue in AVER. Motivated by recent advances in…

    Submitted 1 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Fusion. The code is available at https://github.com/sunlicai/HiCMAE

    Journal ref: Information Fusion, 2024

  16. arXiv:2401.03429  [pdf, other]

    cs.HC

    MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition

    Authors: Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, Jianhua Tao

    Abstract: Multimodal emotion recognition plays a crucial role in enhancing user experience in human-computer interaction. Over the past few decades, researchers have proposed a series of algorithms and achieved impressive progress. Although each method shows its superior performance, different methods lack a fair comparison due to inconsistencies in feature extractors, evaluation manners, and experimental s…

    Submitted 20 April, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  17. arXiv:2401.01173  [pdf, other]

    cs.CV

    En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

    Authors: Yifang Men, Biwen Lei, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

    Abstract: We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars. Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and imprecise pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing visually realistic, geometrically accurate and content-wise diverse 3D humans without r…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Project Page: https://menyifang.github.io/projects/En3D/index.html

  18. arXiv:2401.00416  [pdf, other]

    cs.CV cs.HC cs.MM

    SVFAP: Self-supervised Video Facial Affect Perceiver

    Authors: Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, Jianhua Tao

    Abstract: Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction. Previous studies mainly focus on developing various deep learning architectures and training them in a fully supervised manner. Although significant progress has been achieved by these supervised methods, the longstanding lack of large-scale high-quality labeled…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: Submitted to IEEE Trans. on Affective Computing (February 8, 2023)

  19. arXiv:2312.15583  [pdf, other]

    cs.MM

    ITEACH-Net: Inverted Teacher-studEnt seArCH Network for Emotion Recognition in Conversation

    Authors: Haiyang Sun, Zheng Lian, Chenglong Wang, Kang Chen, Licai Sun, Bin Liu, Jianhua Tao

    Abstract: There remain two critical challenges that hinder the development of ERC. Firstly, there is a lack of exploration into mining deeper insights from the data itself for conversational emotion tasks. Secondly, the systems exhibit vulnerability to random modality feature missing, which is a common occurrence in realistic settings. Focusing on these two key challenges, we propose a novel framework for i…

    Submitted 1 June, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  20. arXiv:2312.11037  [pdf, other]

    cs.CV

    SinMPI: Novel View Synthesis from a Single Image with Expanded Multiplane Images

    Authors: Guo Pu, Peng-Shuai Wang, Zhouhui Lian

    Abstract: Single-image novel view synthesis is a challenging and ongoing problem that aims to generate an infinite number of consistent views from a single input image. Although significant efforts have been made to advance the quality of generated novel views, less attention has been paid to the expansion of the underlying scene representation, which is crucial to the generation of realistic novel view ima…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 10 pages

  21. arXiv:2312.10674  [pdf]

    cs.CV

    A Framework of Full-Process Generation Design for Park Green Spaces Based on Remote Sensing Segmentation-GAN-Diffusion

    Authors: Ran Chen, Xingjian Yi, Jing Zhao, Yueheng He, Bainian Chen, Xueqi Yao, Fangjun Liu, Haoran Li, Zeke Lian

    Abstract: The development of generative design driven by artificial intelligence algorithms is speedy. There are two research gaps in the current research: 1) Most studies only focus on the relationship between design elements and pay little attention to the external information of the site; 2) GAN and other traditional generative algorithms generate results with low resolution and insufficient details. To…

    Submitted 17 December, 2023; originally announced December 2023.

  22. arXiv:2312.10314  [pdf, other]

    cs.CV

    DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by Integrating Dual-modality Generative Models

    Authors: Yitian Liu, Zhouhui Lian

    Abstract: Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem. With the help of prior knowledge that is mainly based on glyph consistency assumptions, some recently proposed methods can synthesize high-quality Chinese glyph images. However, glyphs in calligraphy font styles often do not meet these assumptions. To address this problem, we propose a novel m…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: AAAI2024

  23. arXiv:2312.04884  [pdf, other]

    cs.CV

    UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

    Authors: Yiming Zhao, Zhouhui Lian

    Abstract: Text-to-Image (T2I) generation methods based on diffusion models have garnered significant attention in the last few years. Although these image synthesis methods produce visually appealing results, they frequently exhibit spelling errors when rendering text within the generated images. Such errors manifest as missing, incorrect or extraneous characters, thereby severely constraining the performanc…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 17 pages

  24. arXiv:2312.04293  [pdf, other]

    cs.CV cs.MM

    GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition

    Authors: Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, Jianhua Tao

    Abstract: Recently, GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated. To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion re…

    Submitted 17 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  25. arXiv:2311.16114  [pdf]

    cs.CV cs.AI cs.LG

    Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios

    Authors: Qi Fan, Haolin Zuo, Rui Liu, Zheng Lian, Guanglai Gao

    Abstract: Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete conditions during the training phase to enhance the system's overall robustness. Traditional methods have often involved discarding data or substituting data seg…

    Submitted 7 May, 2024; v1 submitted 21 September, 2023; originally announced November 2023.

  26. arXiv:2311.15339  [pdf, other]

    cs.CV cs.CR cs.LG eess.IV

    Adversarial Purification of Information Masking

    Authors: Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

    Abstract: Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extant generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal…

    Submitted 26 November, 2023; originally announced November 2023.

  27. arXiv:2311.12051  [pdf, other]

    cs.CV

    Boost Adversarial Transferability by Uniform Scale and Mix Mask Method

    Authors: Tao Wang, Zijian Ying, Qianmu Li, Zhichao Lian

    Abstract: Adversarial examples generated from surrogate models often possess the ability to deceive other black-box models, a property known as transferability. Recent research has focused on enhancing adversarial transferability, with input transformation being one of the most effective approaches. However, existing input transformation methods suffer from two issues. Firstly, certain methods, such as the…

    Submitted 18 November, 2023; originally announced November 2023.

  28. arXiv:2307.02227  [pdf, other]

    cs.CV cs.AI cs.HC cs.MM

    MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into the supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by the recent unprecedented success of masked autoencoders (e.g., VideoMAE), this paper proposes MAE-DFER, a novel self-supe…

    Submitted 7 August, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: ACM MM 2023 (camera ready). Codes and models are publicly available at https://github.com/sunlicai/MAE-DFER

  29. arXiv:2306.15401  [pdf, other]

    cs.MM cs.HC

    Explainable Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Kang Chen, Lan Chen, Shan Liang, Ya Li, Jiangyan Yi, Bin Liu, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence, whose main goal is to integrate multimodal clues to identify human emotional states. Current works generally assume accurate labels for benchmark datasets and focus on developing more effective architectures. However, emotion annotation relies on subjective judgment. To obtain more reliable labels, existing d…

    Submitted 23 May, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

  30. arXiv:2306.09361  [pdf, other]

    eess.AS cs.CL cs.SD

    MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition

    Authors: Haiyang Sun, Fulin Zhang, Yingying Gao, Zheng Lian, Shilei Zhang, Junlan Feng

    Abstract: Speech Emotion Recognition (SER) is an important research topic in human-computer interaction. Many recent works focus on directly extracting emotional cues through pre-trained knowledge, frequently overlooking considerations of appropriateness and comprehensiveness. Therefore, we propose a novel framework for pre-training knowledge in SER, called Multi-perspective Fusion Search Network (MFSN). Co…

    Submitted 26 June, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

  31. arXiv:2305.13774  [pdf, other]

    cs.SD eess.AS

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

    Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on s…

    Submitted 23 May, 2023; originally announced May 2023.

  32. arXiv:2304.11632  [pdf, other]

    cs.AI

    Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective

    Authors: Yiming Gao, Feiyu Liu, Liang Wang, Zhenjie Lian, Weixuan Wang, Siqin Li, Xianliang Wang, Xianhan Zeng, Rundong Wang, Jiawei Wang, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

    Abstract: MOBA games, e.g., Dota2 and Honor of Kings, have been actively used as the testbed for the recent AI research on games, and various AI systems have been developed at the human level so far. However, these AI systems mainly focus on how to compete with humans, less on exploring how to collaborate with humans. To this end, this paper makes the first attempt to investigate human-agent collaboration i…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted at ICLR 2023

  33. arXiv:2304.08981  [pdf, other]

    cs.CL cs.CV

    MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, where participants are required to recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to test videos for modality robustness evaluation; (3) MER-SEMI, which provi…

    Submitted 14 September, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  34. arXiv:2304.02263  [pdf, other]

    cs.CV

    Towards Efficient Task-Driven Model Reprogramming with Foundation Models

    Authors: Shoukai Xu, Jiangchao Yao, Ran Luo, Shuhai Zhang, Zihao Lian, Mingkui Tan, Bo Han, Yaowei Wang

    Abstract: Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data. However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually invisible and very different from the target data of dow…

    Submitted 6 May, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

  35. arXiv:2303.14585  [pdf, other]

    cs.CV cs.GR

    DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality

    Authors: Yuqing Wang, Yizhi Wang, Longhui Yu, Yuesheng Zhu, Zhouhui Lian

    Abstract: Vector font synthesis is a challenging and ongoing problem in the fields of Computer Vision and Computer Graphics. The recently-proposed DeepVecFont achieved state-of-the-art performance by exploiting information of both the image and sequence modalities of vector fonts. However, it has limited capability for handling long sequence data and heavily relies on an image-guided outline refinement post…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. Code: https://github.com/yizhiwang96/deepvecfont-v2

  36. arXiv:2303.12675  [pdf, other]

    cs.CV

    VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions

    Authors: Zeqing Xia, Bojun Xiong, Zhouhui Lian

    Abstract: Font design is of vital importance in the digital content design and modern printing industry. Developing algorithms capable of automatically synthesizing vector fonts can significantly facilitate the font design process. However, existing methods mainly concentrate on raster image generation, and only a few approaches can directly synthesize vector fonts. This paper proposes an end-to-end trainab…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  37. eBPF-based Working Set Size Estimation in Memory Management

    Authors: Zhilu Lian, Yangzi Li, Zhixiang Chen, Shiwen Shan, Baoxin Han, Yuxin Su

    Abstract: Working set size estimation (WSS) is of great significance to improve the efficiency of program execution and memory arrangement in modern operating systems. Previous work proposed several methods to estimate WSS, including self-ballooning, Zballoning and so on. However, these methods, which are based on virtual machines, usually cause a large overhead. Thus, using those methods to estimate WSS is imp…

    Submitted 16 January, 2023; originally announced March 2023.

    Comments: 8 pages, 6 figures

  38. arXiv:2303.03946  [pdf, ps, other]

    cs.LG

    Pseudo Labels Regularization for Imbalanced Partial-Label Learning

    Authors: Mingyu Xu, Zheng Lian

    Abstract: Partial-label learning (PLL) is an important branch of weakly supervised learning where the single ground truth resides in a set of candidate labels, while the research rarely considers the label imbalance. A recent study for imbalanced partial-label learning proposed that the combinatorial challenge of partial-label learning and long-tail learning lies in matching between a decent marginal prior…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2209.10365 by other authors

  39. arXiv:2302.11716  [pdf, other]

    cs.LG

    VRA: Variational Rectified Activation for Out-of-distribution Detection

    Authors: Mingyu Xu, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Out-of-distribution (OOD) detection is critical to building reliable machine learning systems in the open world. Researchers have proposed various strategies to reduce model overconfidence on OOD data. Among them, ReAct is a typical and effective technique to deal with model overconfidence, which truncates high activations to increase the gap between in-distribution and OOD. Despite its promising…

    Submitted 17 May, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

  40. arXiv:2302.02650  [pdf, ps, other]

    q-bio.SC cs.LG q-bio.QM

    Tree-Based Learning on Amperometric Time Series Data Demonstrates High Accuracy for Classification

    Authors: Jeyashree Krishnan, Zeyu Lian, Pieter E. Oomen, Xiulan He, Soodabeh Majdi, Andreas Schuppert, Andrew Ewing

    Abstract: Elucidating exocytosis processes provides insights into cellular neurotransmission mechanisms, and may have potential in neurodegenerative disease research. Amperometry is an established electrochemical method for the detection of neurotransmitters released from and stored inside cells. An important aspect of the amperometry method is the sub-millisecond temporal resolution of the current recordin…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: 56 pages, 11 figures

  41. arXiv:2301.12077  [pdf, other]

    cs.CV cs.LG

    ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning

    Authors: Mingyu Xu, Zheng Lian, Lei Feng, Bin Liu, Jianhua Tao

    Abstract: Noisy partial label learning (noisy PLL) is an important branch of weakly supervised learning. Unlike PLL, where the ground-truth label must be concealed in the candidate label set, noisy PLL relaxes this constraint and allows that the ground-truth label may not be in the candidate label set. To address this challenging problem, most of the existing works attempt to detect noisy samples and estimate the grou…

    Submitted 17 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  42. arXiv:2211.04774  [pdf, other]

    cs.CV cs.AI

    IRNet: Iterative Refinement Network for Noisy Partial Label Learning

    Authors: Zheng Lian, Mingyu Xu, Lan Chen, Licai Sun, Bin Liu, Jianhua Tao

    Abstract: Partial label learning (PLL) is a typical form of weakly supervised learning, where each sample is associated with a set of candidate labels. The basic assumption of PLL is that the ground-truth label must reside in the candidate set. However, this assumption may not be satisfied due to the unprofessional judgment of the annotators, thus limiting the practical application of PLL. In this paper, we relax t…

    Submitted 8 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  43. arXiv:2210.11298  [pdf, other]

    cs.AI

    Tele-Knowledge Pre-training for Fault Analysis

    Authors: Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, Yingying Li, Lei Cheng, Huajun Chen

    Abstract: In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents. To organize this knowledge from experts uniformly, we propose to create a Tele-KG (tele-knowledge graph). Using this valuable data, we further propose a tele-d…

    Submitted 17 February, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: ICDE 2023 https://github.com/hackerchenzhuo/KTeleBERT

  44. arXiv:2210.06301  [pdf, other]

    cs.CV

    FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers

    Authors: Yitian Liu, Zhouhui Lian

    Abstract: Automatic generation of high-quality Chinese fonts from a few online training samples is a challenging task, especially when the number of samples is very small. Existing few-shot font generation methods can only synthesize low-resolution glyph images that often possess incorrect topological structures and/or incomplete strokes. To address the problem, this paper proposes FontTransformer, a novel…

    Submitted 12 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 23 pages, 14 Figures

  45. arXiv:2209.08791  [pdf, other]

    cs.CV cs.GR

    DifferSketching: How Differently Do People Sketch 3D Objects?

    Authors: Chufeng Xiao, Wanchao Su, Jing Liao, Zhouhui Lian, Yi-Zhe Song, Hongbo Fu

    Abstract: Multiple sketch datasets have been proposed to understand how people draw 3D objects. However, such datasets are often of small scale and cover a small set of objects or categories. In addition, these datasets contain freehand sketches mostly from expert users, making it difficult to compare the drawings by expert and novice users, while such comparisons are critical in informing more effective sk…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: SIGGRAPH Asia 2022 (Journal Track)

  46. arXiv:2208.13472  [pdf, other]

    cs.CL

    Supporting Medical Relation Extraction via Causality-Pruned Semantic Dependency Forest

    Authors: Yifan Jin, Jiangmeng Li, Zheng Lian, Chengbo Jiao, Xiaohui Hu

    Abstract: The Medical Relation Extraction (MRE) task aims to extract relations between entities in medical texts. Traditional relation extraction methods achieve impressive success by exploring syntactic information, e.g., the dependency tree. However, the quality of the 1-best dependency tree for medical texts produced by an out-of-domain parser is relatively limited so that the performance of medical relation…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted to the conference of COLING2022 as an Oral presentation

  47. arXiv:2208.07589  [pdf, other]

    cs.LG cs.CL cs.CV cs.MM

    Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, there are still two major challenges on the way towards robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to random modality feature missing which typically occurs i…

    Submitted 21 May, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted by TAC. The code is available at https://github.com/sunlicai/EMT-DLFR

    Journal ref: IEEE Transactions on Affective Computing, 2023

  48. arXiv:2207.11389  [pdf, other]

    cs.CV cs.AI

    Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge

    Authors: Haiyang Sun, Zheng Lian, Bin Liu, Jianhua Tao, Licai Sun, Cong Cai

    Abstract: In this paper, we propose the solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition. The task of ABAW is to predict frame-level emotion descriptors from videos: discrete emotional state; valence and arousal; and action units. Although researchers have proposed several approaches and achieved promising results in ABAW, current work…

    Submitted 22 July, 2022; originally announced July 2022.

  49. arXiv:2207.02426  [pdf, other]

    cs.CV

    DCT-Net: Domain-Calibrated Translation for Portrait Stylization

    Authors: Yifang Men, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

    Abstract: This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted by SIGGRAPH 2022 (TOG). Project Page: https://menyifang.github.io/projects/DCTNet/DCTNet.html , Code: https://github.com/menyifang/DCT-Net

  50. arXiv:2206.13761  [pdf, other]

    cs.LG

    Classification of ADHD Patients Using Kernel Hierarchical Extreme Learning Machine

    Authors: Sartaj Ahmed Salman, Zhichao Lian, Milad Taleby Ahvanooey, Hiroki Takahashi, Yuduo Zhang

    Abstract: Recently, the application of deep learning models to diagnose neuropsychiatric diseases from brain imaging data has received more and more attention. However, in practice, exploring interactions in brain functional connectivity based on operational magnetic resonance imaging data is critical for studying mental illness. Since Attention-Deficit and Hyperactivity Disorder (ADHD) is a type of chronic…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 8 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2202.08953