Showing 1–50 of 157 results for author: Weihong

Searching in archive cs.

  1. arXiv:2408.10614  [pdf, other]

    cs.CV cs.AI

    Generalizable Facial Expression Recognition

    Authors: Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng

    Abstract: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model, which might be infeasible in real-world deployment. In this paper, we aim to improve the zero-shot generalization ability of FER methods on different unseen test s…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2408.04568  [pdf, other]

    cs.CL cs.AI

    Learning Fine-Grained Grounded Citations for Attributed Large Language Models

    Authors: Lei Huang, Xiaocheng Feng, Weitao Ma, Yuxuan Gu, Weihong Zhong, Xiachong Feng, Weijiang Yu, Weihua Peng, Duyu Tang, Dandan Tu, Bing Qin

    Abstract: Despite the impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, have shown potential in mitigating hallucinations and improving verifiability. However, current approaches suffer from suboptimal citation quality due to their reliance on in-context learning. Further…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 Findings

  3. arXiv:2408.03633  [pdf, other]

    cs.CL

    CARE: A Clue-guided Assistant for CSRs to Read User Manuals

    Authors: Weihong Du, Jia Liu, Zujie Wen, Dingnan Jin, Hongru Liang, Wenqiang Lei

    Abstract: It is time-saving to build a reading assistant for customer service representatives (CSRs) when reading user manuals, especially information-rich ones. Current solutions don't fit the online customer service scenarios well due to the lack of attention to user questions and possible responses. Hence, we propose to develop a time-saving and careful reading assistant for CSRs, named CARE. It can help t…

    Submitted 26 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  4. arXiv:2408.03630  [pdf, other]

    cs.CL

    PAGED: A Benchmark for Procedural Graphs Extraction from Documents

    Authors: Weihong Du, Wenrui Liao, Hongru Liang, Wenqiang Lei

    Abstract: Automatic extraction of procedural graphs from documents creates a low-cost way for users to easily understand a complex procedure by skimming visual graphs. Despite the progress in recent studies, it remains unanswered: whether the existing studies have well solved this task (Q1) and whether the emerging large language models (LLMs) can bring new opportunities to this task (Q2). To this end, we p…

    Submitted 7 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  5. arXiv:2408.03220  [pdf, other]

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federated Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  6. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  7. arXiv:2407.11409  [pdf, other]

    cs.CL

    Representation Bias in Political Sample Simulations with Large Language Models

    Authors: Weihong Qi, Hanjia Lyu, Jiebo Luo

    Abstract: This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao Dataset, and China Family Panel Studies to simulate voting behaviors and public opinions. This me…

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.00569  [pdf, other]

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w…

    Submitted 3 August, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  9. arXiv:2406.15796  [pdf, other]

    cs.CL

    Rethinking Entity-level Unlearning for Large Language Models

    Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

    Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Work in progress

  10. arXiv:2406.08772  [pdf, other]

    cs.CV cs.CL

    MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

    Authors: Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He

    Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MM…

    Submitted 21 August, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://liuxuannan.github.io/MMFakeBench.github.io/

  11. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2404.12602  [pdf]

    cs.CV cs.LG

    A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

    Authors: Minzhe Huang, Changwei Nie, Weihong Zhong

    Abstract: In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. Existing FAS technologies primarily focus on intercepting physically forged faces and lack a robust solution for cross-domain FAS cha…

    Submitted 18 April, 2024; originally announced April 2024.

  13. arXiv:2403.12381  [pdf, other]

    cs.CE

    Explainable AutoML (xAutoML) with adaptive modeling for yield enhancement in semiconductor smart manufacturing

    Authors: Weihong Zhai, Xiupeng Shi, Yiik Diew Wong, Qing Han, Lisheng Chen

    Abstract: Enhancing yield is recognized as a paramount driver to reducing production costs in semiconductor smart manufacturing. However, optimizing and ensuring high yield rates is a highly complex and technical challenge, especially while maintaining reliable yield diagnosis and prognosis, and this shall require understanding all the confounding factors in a complex condition. This study proposes a domain…

    Submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2403.09500  [pdf, other]

    cs.CV

    Faceptor: A Generalist Model for Face Perception

    Authors: Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng

    Abstract: With the comprehensive research conducted on various face analysis tasks, there is a growing interest among researchers to develop a unified approach to face perception. Existing methods mainly discuss unified representation and training, which lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure, exploring a face generalist model. As an…

    Submitted 14 March, 2024; originally announced March 2024.

  15. arXiv:2403.06529  [pdf, other]

    cs.CV

    Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis

    Authors: Zijian Chen, Mei Wang, Weihong Deng, Hongzhi Shi, Dongchao Wen, Yingjie Zhang, Xingchen Cui, Jian Zhao

    Abstract: 2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose. Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information. However, collecting sufficient paired RGB-D training data is expensive and time-consuming, hindering wide deployment. In this work, we first construct a diverse depth datase…

    Submitted 16 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 9 pages, 5 figures

  16. arXiv:2403.03493  [pdf, other]

    cs.CV

    VastTrack: Vast Category Visual Object Tracking

    Authors: Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

    Abstract: In this paper, we introduce a novel benchmark, dubbed VastTrack, towards facilitating the development of more general visual tracking via encompassing abundant classes and videos. VastTrack possesses several attractive properties: (1) Vast Object Category. In particular, it covers target objects from 2,115 classes, largely surpassing object categories of existing popular benchmarks (e.g., GOT-10k…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Tech. report

  17. arXiv:2403.01988  [pdf, other]

    cs.CL

    FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs

    Authors: Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He

    Abstract: The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors. However, the insulated nature of training restricts the capability of classical detectors to obtain open-world facts. While Large Vision-Language Models (LVLMs) have encoded rich world knowledge, they are not inherently tailore…

    Submitted 6 August, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MM 2024. Project page: https://liuxuannan.github.io/FKA_Owl.github.io/

  18. arXiv:2402.17970  [pdf]

    cs.CR

    Exploring Advanced Methodologies in Security Evaluation for LLMs

    Authors: Jun Huang, Jiawei Zhang, Qi Wang, Weihong Han, Yanchun Zhang

    Abstract: Large Language Models (LLMs) represent an advanced evolution of earlier, simpler language models. They boast enhanced abilities to handle complex language patterns and generate coherent text, images, audios, and videos. Furthermore, they can be fine-tuned for specific tasks. This versatility has led to the proliferation and extensive use of numerous commercialized large models. However, the rapid…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  19. arXiv:2401.12507  [pdf, other]

    cs.CV

    Open-Set Facial Expression Recognition

    Authors: Yuhang Zhang, Yue Yao, Xuannan Liu, Lixiong Qin, Wenjing Wang, Weihong Deng

    Abstract: Facial expression recognition (FER) models are typically trained on datasets with a fixed number of seven basic classes. However, recent research works point out that there are far more expressions than the basic ones. Thus, when these models are deployed in the real world, they may encounter unknown classes, such as compound expressions that cannot be classified into existing basic classes. To ad…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI2024

  20. arXiv:2401.09220  [pdf, other]

    cs.CL

    UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

    Authors: Kai Hu, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

    Abstract: Existing methods for Visual Information Extraction (VIE) from form-like documents typically fragment the process into separate subtasks, such as key information extraction, key-value pair extraction, and choice group extraction. However, these approaches often overlook the hierarchical structure of form documents, including hierarchical key-value pairs and hierarchical choice groups. To address th…

    Submitted 17 January, 2024; originally announced January 2024.

  21. arXiv:2401.08212  [pdf, other]

    cs.CV

    Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication

    Authors: Hanjia Lyu, Weihong Qi, Zhongyu Wei, Jiebo Luo

    Abstract: Leveraging Large Multimodal Models (LMMs) to simulate human behaviors when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications. Emojis, as one of the most unique aspects of digital communication, are pivotal in enriching and often clarifying the emotional and tonal dimensions. Yet, ther…

    Submitted 15 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted for publication in ICWSM 2024

  22. arXiv:2401.05676  [pdf, other]

    cs.CV

    Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

    Authors: Weibo Jiang, Weihong Ren, Jiandong Tian, Liangqiong Qu, Zhiyong Wang, Honghai Liu

    Abstract: Human-Object Interaction (HOI) detection plays a vital role in scene understanding, which aims to predict the HOI triplet in the form of <human, object, action>. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them together to directly predict HOI triplets. However, most of these methods focus on seeking for self-triplet aggregati…

    Submitted 11 January, 2024; originally announced January 2024.

  23. arXiv:2401.02150  [pdf, other]

    cs.CV

    Marginal Debiased Network for Fair Visual Recognition

    Authors: Mei Wang, Weihong Deng, Sen Su

    Abstract: Deep neural networks (DNNs) are often prone to learn the spurious correlations between target classes and bias attributes, like gender and race, inherent in a major portion of training data (bias-aligned samples), thus showing unfair behavior and arising controversy in the modern pluralistic and egalitarian society. In this paper, we propose a novel marginal debiased network (MDN) to learn debiase…

    Submitted 4 January, 2024; originally announced January 2024.

  24. arXiv:2401.01575  [pdf, other]

    cs.CV

    Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation

    Authors: Xuannan Liu, Yaoyao Zhong, Weihong Deng, Hongzhi Shi, Xingchen Cui, Yunfeng Yin, Dongchao Wen

    Abstract: The blooming of social media and face recognition (FR) systems has increased people's concern about privacy and security. A new type of adversarial privacy cloak (class-universal) can be applied to all the images of regular users, to prevent malicious FR systems from acquiring their identity information. In this work, we discover the optimization dilemma in the existing methods -- the local optima…

    Submitted 3 January, 2024; originally announced January 2024.

  25. arXiv:2401.00921  [pdf, other]

    cs.CV

    Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence

    Authors: Ruizhuo Xu, Linzhi Huang, Mei Wang, Jiani Hu, Weihong Deng

    Abstract: Self-supervised pre-training paradigms have been extensively explored in the field of skeleton-based action recognition. In particular, methods based on masked prediction have pushed the performance of pre-training to a new height. However, these methods take low-level features, such as raw joint coordinates or temporal motion, as prediction targets for the masked regions, which is suboptimal. In…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Submitted to CVPR 2024

  26. arXiv:2401.00719  [pdf, other]

    cs.CV cs.AI

    Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition

    Authors: Ruizhuo Xu, Ke Wang, Chao Deng, Mei Wang, Xi Chen, Wenhui Huang, Junlan Feng, Weihong Deng

    Abstract: With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted more and more attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance th…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by Pattern Recognition

  27. arXiv:2312.14407  [pdf, other]

    cs.CV

    AdvCloak: Customized Adversarial Cloak for Privacy Protection

    Authors: Xuannan Liu, Yaoyao Zhong, Xing Cui, Yuhang Zhang, Peipei Li, Weihong Deng

    Abstract: With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that can maintain superior image-level naturalness while providing enhanced feature-level generalizat…

    Submitted 21 December, 2023; originally announced December 2023.

  28. arXiv:2312.06075  [pdf, other]

    cs.CV

    Oracle Character Recognition using Unsupervised Discriminative Consistency Network

    Authors: Mei Wang, Weihong Deng, Sen Su

    Abstract: Ancient history relies on the study of ancient characters. However, real-world scanned oracle characters are difficult to collect and annotate, posing a major obstacle for oracle character recognition (OrCR). Besides, serious abrasion and inter-class similarity also make OrCR more challenging. In this paper, we propose a novel unsupervised domain adaptation method for OrCR, which enables to transf…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted by Pattern Recognition

  29. arXiv:2312.04257  [pdf, other]

    cs.AR

    Proxima: Near-storage Acceleration for Graph-based Approximate Nearest Neighbor Search in 3D NAND

    Authors: Weihong Xu, Junwei Chen, Po-Kai Hsu, Jaeyoung Kang, Minxuan Zhou, Sumukh Pinge, Shimeng Yu, Tajana Rosing

    Abstract: Approximate nearest neighbor search (ANNS) plays an indispensable role in a wide variety of applications, including recommendation systems, information retrieval, and semantic search. Among the cutting-edge ANNS algorithms, graph-based approaches provide superior accuracy and scalability on massive datasets. However, the best-performing graph-based ANN search solutions incur tens of hundreds of me…

    Submitted 7 December, 2023; originally announced December 2023.

  30. arXiv:2311.16896  [pdf, other]

    physics.optics cs.ET physics.app-ph

    65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics

    Authors: Zhongjin Lin, Bhavin J. Shastri, Shangxuan Yu, Jingxiang Song, Yuntao Zhu, Arman Safarnejadian, Wangning Cai, Yanmei Lin, Wei Ke, Mustafa Hammood, Tianye Wang, Mengyue Xu, Zibo Zheng, Mohammed Al-Qadasi, Omid Esmaeeli, Mohamed Rahim, Grzegorz Pakulski, Jens Schmid, Pedro Barrios, Weihong Jiang, Hugh Morison, Matthew Mitchell, Xiaogang Qiang, Xun Guan, Nicolas A. F. Jaeger, et al. (6 additional authors not shown)

    Abstract: Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by providing low latency, high bandwidth, and energy-efficient computations. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 19 pages, 6 figures

    MSC Class: 78A05

  31. arXiv:2311.16293  [pdf, other]

    cs.AR cs.CR

    FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption

    Authors: Minxuan Zhou, Yujin Nam, Pranav Gangwar, Weihong Xu, Arpan Dutta, Kartikeyan Subramanyam, Chris Wilkerson, Rosario Cammarota, Saransh Gupta, Tajana Rosing

    Abstract: Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption, making it ideal for securing many emerging applications. However, FHE computation is significantly slower than computation on plain data due to the increase in data size after encryption. Processing In-Memory (PIM) is a promising technology that ca…

    Submitted 27 November, 2023; originally announced November 2023.

  32. arXiv:2311.12874  [pdf, other]

    q-bio.QM cs.AR cs.DC cs.LG

    SpecHD: Hyperdimensional Computing Framework for FPGA-based Mass Spectrometry Clustering

    Authors: Sumukh Pinge, Weihong Xu, Jaeyoung Kang, Tianqi Zhang, Neima Moshiri, Wout Bittremieux, Tajana Rosing

    Abstract: Mass spectrometry-based proteomics is a key enabler for personalized healthcare, providing a deep dive into the complex protein compositions of biological systems. This technology has vast applications in biotechnology and biomedicine but faces significant computational bottlenecks. Current methodologies often require multiple hours or even days to process extensive datasets, particularly in the d…

    Submitted 20 November, 2023; originally announced November 2023.

  33. arXiv:2311.05232  [pdf, other]

    cs.CL

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Authors: Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses subs…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Work in progress; 49 pages

  34. arXiv:2310.19636  [pdf, other]

    cs.CV

    Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition

    Authors: Yuhang Zhang, Yaqi Li, Lixiong Qin, Xuannan Liu, Weihong Deng

    Abstract: Facial expression data is characterized by a significant imbalance, with most collected data showing happy or neutral expressions and fewer instances of fear or disgust. This imbalance poses challenges to facial expression recognition (FER) models, hindering their ability to fully understand various human emotional states. Existing FER methods typically report overall accuracy on highly imbalanced…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS2023

  35. arXiv:2310.15429  [pdf, other]

    cs.CL cs.CY

    Beyond Sentiment: Leveraging Topic Metrics for Political Stance Classification

    Authors: Weihong Qi

    Abstract: Sentiment analysis, widely critiqued for capturing merely the overall tone of a corpus, falls short in accurately reflecting the latent structures and political stances within texts. This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and complement to sentiment metrics in stance classification. By employing three datasets identified by Best…

    Submitted 23 October, 2023; originally announced October 2023.

  36. arXiv:2310.15342  [pdf, other]

    cs.LG cs.IR

    Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

    Authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue Liu

    Abstract: Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-gra…

    Submitted 30 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 poster

  37. arXiv:2310.07236  [pdf, other]

    cs.CV cs.MM

    AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

    Authors: Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang, Helen Meng

    Abstract: Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles. Several works intend to capture the personalities by fine-tuning modules. However, limited training data leads to…

    Submitted 19 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Project Page: https://adamesh.github.io

  38. arXiv:2309.15490  [pdf, other]

    cs.CV

    Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

    Authors: Wenjie Li, Mei Wang, Kai Zhang, Juncheng Li, Xiaoming Li, Yuhang Zhang, Guangwei Gao, Weihong Deng, Chia-Wen Lin

    Abstract: Face restoration (FR) is a specialized field within image restoration that aims to recover low-quality (LQ) face images into high-quality (HQ) face images. Recent advances in deep learning technology have led to significant progress in FR methods. In this paper, we begin by examining the prevalent factors responsible for real-world LQ images and introduce degradation techniques used to synthesize…

    Submitted 8 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Face restoration, Survey, Deep learning, Non-blind/Blind, Joint restoration tasks, Facial priors

  39. arXiv:2309.14962  [pdf, other]

    cs.CV

    GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction

    Authors: Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yuechen Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang

    Abstract: All tables can be represented as grids. Based on this observation, we propose GridFormer, a novel approach for interpreting unconstrained table structures by predicting the vertex and edge of a grid. First, we propose a flexible table representation in the form of an M×N grid. In this representation, the vertexes and edges of the grid store the localization and adjacency information of the table.…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: ACMMM2023

  40. arXiv:2309.10935  [pdf, other]

    cs.CV eess.IV

    A Geometric Flow Approach for Segmentation of Images with Inhomogeneous Intensity and Missing Boundaries

    Authors: Paramjyoti Mohapatra, Richard Lartey, Weihong Guo, Michael Judkovich, Xiaojuan Li

    Abstract: Image segmentation is a complex mathematical problem, especially for images that contain intensity inhomogeneity and tightly packed objects with missing boundaries in between. For instance, Magnetic Resonance (MR) muscle images often contain both of these issues, making muscle segmentation especially difficult. In this paper we propose a novel intensity correction and a semi-automatic active conto…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Presented at CVIT 2023 Conference. Accepted to Journal of Image and Graphics

  41. arXiv:2309.09508  [pdf, other]

    cs.CL cs.SI

    Understanding Divergent Framing of the Supreme Court Controversies: Social Media vs. News Outlets

    Authors: Jinsheng Pan, Zichen Wang, Weihong Qi, Hanjia Lyu, Jiebo Luo

    Abstract: Understanding the framing of political issues is of paramount importance as it significantly shapes how individuals perceive, interpret, and engage with these matters. While prior research has independently explored framing within news media and by social media users, there remains a notable gap in our comprehension of the disparities in framing political issues between these two distinct groups.…

    Submitted 18 September, 2023; originally announced September 2023.

  42. SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation

    Authors: Lixiong Qin, Mei Wang, Chao Deng, Ke Wang, Xi Chen, Jiani Hu, Weihong Deng

    Abstract: In recent years, vision transformers have been introduced into face recognition and analysis and have achieved performance breakthroughs. However, most previous methods generally train a single model or an ensemble of models to perform the desired task, which ignores the synergy among different tasks and fails to achieve improved prediction accuracy, increased data efficiency, and reduced training…

    Submitted 22 August, 2023; originally announced August 2023.

  43. arXiv:2308.10613  [pdf, ps, other]

    cs.CR cs.SE

    Static Application Security Testing of Consensus-Critical Code in the Cosmos Network

    Authors: Jasper Surmont, Weihong Wang, Tom Van Cutsem

    Abstract: Blockchains require deterministic execution in order to reach consensus. This is often guaranteed in languages designed to write smart contracts, such as Solidity. Application-specific blockchains or "appchains" allow the blockchain application logic to be written using general-purpose programming languages, giving developers more flexibility but also additional responsibilities. In particular,…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 5th Conference on Blockchain Research & Applications for Innovative Networks and Services (BRAINS'23)

  44. arXiv:2308.06015  [pdf, other]

    cs.CV

    Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation

    Authors: Xuannan Liu, Yaoyao Zhong, Yuhang Zhang, Lixiong Qin, Weihong Deng

    Abstract: Deep neural networks are vulnerable to universal adversarial perturbation (UAP), an instance-agnostic perturbation capable of fooling the target model for most samples. Compared to instance-specific adversarial examples, UAP is more challenging as it needs to generalize across various samples and models. In this paper, we examine the serious dilemma of UAP generation methods from a generalization…

    Submitted 11 August, 2023; originally announced August 2023.

  45. arXiv:2308.04830  [pdf, other]

    cs.CV

    VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

    Authors: Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao

    Abstract: Current talking face generation methods mainly focus on speech-lip synchronization. However, insufficient investigation on the facial talking style leads to a lifeless and monotonous avatar. Most previous works fail to imitate expressive styles from arbitrary video prompts and ensure the authenticity of the generated video. This paper proposes an unsupervised variational style transfer model (VAST…

    Submitted 11 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023 Workshop

  46. arXiv:2307.05809  [pdf, other]

    cs.SI

    Excitements and Concerns in the Post-ChatGPT Era: Deciphering Public Perception of AI through Social Media Analysis

    Authors: Weihong Qi, Jinsheng Pan, Hanjia Lyu, Jiebo Luo

    Abstract: As AI systems become increasingly prevalent in various aspects of daily life, gaining a comprehensive understanding of public perception towards these AI systems has become increasingly essential for several reasons such as ethical considerations, user experience, fear, disinformation, regulation, collaboration, and co-creation. In this study, we investigate how mass social media users perceive th…

    Submitted 11 July, 2023; originally announced July 2023.

  47. arXiv:2306.14097  [pdf, other]

    eess.IV cs.CV math.NA

    Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model

    Authors: Junying Meng, Weihong Guo, Jun Liu, Mingrui Yang

    Abstract: The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 25 pages, 9 figures, 6 tables

    MSC Class: 94A08; 68U10

  48. Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals

    Authors: Hongru Liang, Jia Liu, Weihong Du, Dingnan Jin, Wenqiang Lei, Zujie Wen, Jiancheng Lv

    Abstract: The machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. Therefore, we introduce the Knowing-how & Knowing-that task that requires the model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We resolve this task by jointly representing the steps and facts…

    Submitted 8 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023 (2023)

  49. arXiv:2305.13605  [pdf, other]

    cs.CV

    Adaptive Face Recognition Using Adversarial Information Network

    Authors: Mei Wang, Weihong Deng

    Abstract: In many real-world applications, face recognition models often degenerate when training data (referred to as source domain) are different from testing data (referred to as target domain). To alleviate this mismatch caused by some factors like pose and skin tone, the utilization of pseudo-labels generated by clustering algorithms is an effective way in unsupervised domain adaptation. However, they…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by TIP

  50. arXiv:2305.11094  [pdf, other]

    cs.HC cs.CV cs.MM cs.SD eess.AS

    QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

    Authors: Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang

    Abstract: Speech-driven gesture generation is highly challenging due to the random jitters of human motion. In addition, there is an inherent asynchronous relationship between human speech and gestures. To tackle these challenges, we introduce a novel quantization-based and phase-guided motion-matching framework. Specifically, we first present a gesture VQ-VAE module to learn a codebook to summarize meaning…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 15 pages, 12 figures, CVPR 2023 Highlight