Showing 1–50 of 231 results for author: Cao, M

Searching in archive cs.
  1. arXiv:2407.09853  [pdf, other]

    cs.CV

    Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

    Authors: Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrate…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024, project: https://github.com/qingshi9974/ECCV2024-AdpatICMH

  2. arXiv:2407.08340  [pdf, other]

    cs.LG

    SLRL: Structured Latent Representation Learning for Multi-view Clustering

    Authors: Zhangci Xiong, Meng Cao

    Abstract: In recent years, Multi-View Clustering (MVC) has attracted increasing attention for its potential to reduce the annotation burden associated with large datasets. The aim of MVC is to exploit the inherent consistency and complementarity among different views, thereby integrating information from multiple perspectives to improve clustering outcomes. Despite extensive research in MVC, most existing…

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2406.16935  [pdf, other]

    eess.SP cs.AI

    Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex

    Authors: Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman

    Abstract: We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected \textit{MacaqueITBench}, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over $300,000$ images, comprising $8,233$ unique natural images presented to seven monkeys over $109$ sessions. Using \tex…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.14887  [pdf, other]

    cs.CL

    InternLM-Law: An Open Source Chinese Legal Large Language Model

    Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

    Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., l…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Our dataset, code and models will be released at https://github.com/InternLM/InternLM-Law

  5. arXiv:2406.12703  [pdf, other]

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: Jincheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yinping Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  6. arXiv:2406.10318  [pdf, other]

    cs.CV cs.AI

    Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding

    Authors: Tuo Zhang, Tiantian Feng, Yibin Ni, Mengqin Cao, Ruying Liu, Katharine Butler, Yanjun Weng, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

    Abstract: Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2406.09838  [pdf, other]

    cs.CV cs.AI

    Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

    Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the…

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.06329  [pdf, other]

    cs.CL eess.AS

    A Parameter-efficient Language Extension Framework for Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2405.20607  [pdf, other]

    cs.CV

    Textual Inversion and Self-supervised Refinement for Radiology Report Generation

    Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

    Abstract: Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we propose Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specific…

    Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper has been early accepted by MICCAI 2024!

  10. arXiv:2405.19689  [pdf, other]

    cs.CV cs.IR

    Uncertainty-aware sign language video retrieval with probability distribution modeling

    Authors: Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu

    Abstract: Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the mapping between sign language video and text through fine-grained modal alignment. However, due to th…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.19465  [pdf, other]

    cs.CV

    RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

    Authors: Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li

    Abstract: Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning based on large-scale pre-trained vision-language models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 Findings

  12. arXiv:2405.13865  [pdf, other]

    cs.CV

    ReVideo: Remake a Video with Motion and Content Control

    Authors: Chong Mou, Mingdeng Cao, Xintao Wang, Zhaoyang Zhang, Ying Shan, Jian Zhang

    Abstract: Despite significant advancements in video generation and editing using diffusion models, achieving accurate and localized video editing remains a substantial challenge. Additionally, most existing video editing methods primarily focus on altering visual content, with limited research dedicated to motion editing. In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out…

    Submitted 22 May, 2024; originally announced May 2024.

  13. arXiv:2405.07740  [pdf, ps, other]

    cs.IT

    The $σ$ hulls of matrix-product codes and related entanglement-assisted quantum error-correcting codes

    Authors: Meng Cao

    Abstract: Let $\mathrm{SLAut}(\mathbb{F}_{q}^{n})$ denote the group of all semilinear isometries on $\mathbb{F}_{q}^{n}$, where $q=p^{e}$ is a prime power. Matrix-product (MP) codes are a class of long classical codes generated by combining several commensurate classical codes with a defining matrix. We give an explicit formula for calculating the dimension of the $σ$ hull of an MP code. As a result, we give…

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2405.02538  [pdf, other]

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip…

    Submitted 3 May, 2024; originally announced May 2024.

  15. arXiv:2405.02285  [pdf, ps, other]

    cs.IT

    Special matrices over finite fields and their applications to quantum error-correcting codes

    Authors: Meng Cao

    Abstract: The matrix-product (MP) code $\mathcal{C}_{A,k}:=[\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{k}]\cdot A$ with a non-singular by column (NSC) matrix $A$ plays an important role in constructing good quantum error-correcting codes. In this paper, we study the MP code when the defining matrix $A$ satisfies the condition that $AA^†$ is $(D,τ)$-monomial. We give an explicit formula for calculat…

    Submitted 11 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  16. arXiv:2404.18106  [pdf, other]

    cs.CV

    Semi-supervised Text-based Person Search

    Authors: Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang

    Abstract: Text-based person search (TBPS) aims to retrieve images of a specific person from a large image gallery based on a natural language description. Existing methods rely on massive annotated image-text data to achieve satisfactory performance in fully-supervised learning. It poses a significant challenge in practice, as acquiring person images from surveillance videos is relatively easy, while obtain…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 13 pages

  17. arXiv:2404.09842  [pdf, other]

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  18. arXiv:2404.06784  [pdf]

    quant-ph cond-mat.mes-hall cs.AR eess.SY

    Statistical evaluation of 571 GaAs quantum point contact transistors showing the 0.7 anomaly in quantized conductance using millikelvin cryogenic on-chip multiplexing

    Authors: Pengcheng Ma, Kaveh Delfanazari, Reuben K. Puddy, Jiahui Li, Moda Cao, Teng Yi, Jonathan P. Griffiths, Harvey E. Beere, David A. Ritchie, Michael J. Kelly, Charles G. Smith

    Abstract: The mass production and the practical number of cryogenic quantum devices producible in a single chip are limited by the number of electrical contact pads and wiring of the cryostat or dilution refrigerator. It is, therefore, beneficial to contrast the measurements of hundreds of devices fabricated in a single chip in one cooldown process to promote the scalability, integrability, reliability, and…

    Submitted 10 April, 2024; originally announced April 2024.

  19. arXiv:2404.06350  [pdf, other]

    cs.CV

    Rolling Shutter Correction with Intermediate Distortion Flow Estimation

    Authors: Mingdeng Cao, Sidi Yang, Yujiu Yang, Yinqiang Zheng

    Abstract: This paper proposes to correct the rolling shutter (RS) distorted images by estimating the distortion flow from the global shutter (GS) to RS directly. Existing methods usually perform correction using the undistortion flow from the RS to GS. They initially predict the flow from consecutive RS frames, subsequently rescaling it as the displacement fields from the RS frame to the underlying GS image…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  20. arXiv:2404.02845  [pdf, other]

    cs.CV

    Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

    Authors: Xiaoshuang Huang, Hongxiang Li, Meng Cao, Long Chen, Chenyu You, Dong An

    Abstract: Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit and ambiguous architectures to embed textual information. This leads to segmentation results that are inconsistent with the semanti…

    Submitted 7 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  21. arXiv:2403.19238  [pdf, other]

    cs.CV cs.AI eess.IV

    Taming Lookup Tables for Efficient Image Retouching

    Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

    Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To th…

    Submitted 13 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV2024

  22. arXiv:2403.18167  [pdf, other]

    cs.CL cs.AI

    Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations

    Authors: Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong

    Abstract: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of ha…

    Submitted 17 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  23. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  24. arXiv:2403.15805  [pdf, other]

    cs.RO

    AirCrab: A Hybrid Aerial-Ground Manipulator with An Active Wheel

    Authors: Muqing Cao, Jiayan Zhao, Xinhang Xu, Lihua Xie

    Abstract: Inspired by the behavior of birds, we present AirCrab, a hybrid aerial ground manipulator (HAGM) with a single active wheel and a 3-degree of freedom (3-DoF) manipulator. AirCrab leverages a single point of contact with the ground to reduce position drift and improve manipulation accuracy. The single active wheel enables locomotion on narrow surfaces without adding significant weight to the robot.…

    Submitted 23 March, 2024; originally announced March 2024.

  25. arXiv:2403.14668  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Predicting Learning Performance with Large Language Models: A Study in Adult Literacy

    Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, John Sabatini, John Hollander, Meng Cao, Xiangen Hu

    Abstract: Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the po…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 26TH International Conference on Human-Computer Interaction

  26. arXiv:2403.14416  [pdf, other]

    quant-ph cs.IT

    Quantum Channel Simulation in Fidelity is no more difficult than State Splitting

    Authors: Michael X. Cao, Rahul Jain, Marco Tomamichel

    Abstract: Characterizing the minimal communication needed for the quantum channel simulation is a fundamental task in the quantum information theory. In this paper, we show that, in fidelity, the quantum channel simulation can be directly achieved via quantum state splitting without using a technique known as the de Finetti reduction, and thus provide a pair of tighter one-shot bounds. Using the bounds, we…

    Submitted 24 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  27. arXiv:2403.14173  [pdf, other]

    cs.RO

    HCTO: Optimality-Aware LiDAR Inertial Odometry with Hybrid Continuous Time Optimization for Compact Wearable Mapping System

    Authors: Jianping Li, Shenghai Yuan, Muqing Cao, Thien-Minh Nguyen, Kun Cao, Lihua Xie

    Abstract: Compact wearable mapping systems (WMS) have gained significant attention due to their convenience in various applications. Specifically, they provide an efficient way to collect prior maps for 3D structure inspection and robot-based "last-mile delivery" in complex environments. However, vibrations in human motion and the uneven distribution of point cloud features in complex environments often lead t…

    Submitted 21 March, 2024; originally announced March 2024.

  28. arXiv:2403.13839  [pdf, other]

    cs.LG cs.AI cs.PL

    depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

    Authors: Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long

    Abstract: PyTorch \texttt{2.x} introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, adapting to the PyTorch compiler to full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce \texttt{depyf}, a tool designed to demystify the inner workings of the PyTorch…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 16 pages, 2 figures

  29. arXiv:2403.11183  [pdf, other]

    cs.CL

    Decoding Continuous Character-based Language from Non-invasive Brain Recordings

    Authors: Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

    Abstract: Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to…

    Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  30. arXiv:2403.09323  [pdf, other]

    cs.CV

    E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

    Authors: Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li

    Abstract: Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high perfor…

    Submitted 23 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  31. arXiv:2403.03048  [pdf, other]

    eess.SY cs.CR

    Design of Stochastic Quantizers for Privacy Preservation

    Authors: Le Liu, Yu Kawano, Ming Cao

    Abstract: In this paper, we examine the role of stochastic quantizers for privacy preservation. We first employ a static stochastic quantizer and investigate its corresponding privacy-preserving properties. Specifically, we demonstrate that a sufficiently large quantization step guarantees $(0, δ)$ differential privacy. Additionally, the degradation of control performance caused by quantization is evaluated…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures

  32. arXiv:2403.01225  [pdf, other]

    cs.RO

    A Cost-Effective Cooperative Exploration and Inspection Strategy for Heterogeneous Aerial System

    Authors: Xinhang Xu, Muqing Cao, Shenghai Yuan, Thien Hoang Nguyen, Thien-Minh Nguyen, Lihua Xie

    Abstract: In this paper, we propose a cost-effective strategy for heterogeneous UAV swarm systems for cooperative aerial inspection. Unlike previous swarm inspection works, the proposed method does not rely on precise prior knowledge of the environment and can complete full 3D surface coverage of objects of any shape. In this work, agents are partitioned into teams, with each drone assigned a different task,…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: Baseline method of CARIC at CDC 2023, Singapore

  33. arXiv:2402.13822  [pdf, other]

    cs.CV

    MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification

    Authors: Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen

    Abstract: Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, they unavoidably suffer from scalability issues, as they integrate an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design an…

    Submitted 21 February, 2024; originally announced February 2024.

  34. arXiv:2402.11907  [pdf, other]

    cs.CL

    Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation

    Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Xiang Kong, Simon Wang, Jiulong Shan, Meng Cao, Lijie Wen

    Abstract: Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an aut…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 24 pages, 5 pages

    MSC Class: 68T50 ACM Class: I.2.7

  35. arXiv:2402.03450  [pdf, other]

    cs.SI cs.CY cs.IR

    Recommendation Fairness in Social Networks Over Time

    Authors: Meng Cao, Hussain Hussain, Sandipan Sikdar, Denis Helic, Markus Strohmaier, Roman Kern

    Abstract: In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation…

    Submitted 7 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  36. arXiv:2402.01746  [pdf, other]

    cs.CY cs.AI cs.LG

    3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems

    Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, Meng Cao, Xiangen Hu

    Abstract: Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor f…

    Submitted 29 January, 2024; originally announced February 2024.

    Journal ref: LAK 2024: International Workshop on Generative AI for Learning Analytics (GenAI-LA)

  37. arXiv:2401.16420  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. This model goes beyond conventional vision-language understanding, adeptly crafting interleaved text-image content from diverse inputs like outlines, detailed textual specifications, and reference images, enabling highly customizable content creation. InternLM-XCo…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Code and models are available at https://github.com/InternLM/InternLM-XComposer

  38. Coordinated Guiding Vector Field Design for Ordering-Flexible Multi-Robot Surface Navigation

    Authors: Bin-Bin Hu, Hai-Tao Zhang, Weijia Yao, Zhiyong Sun, Ming Cao

    Abstract: We design a distributed coordinated guiding vector field (CGVF) for a group of robots to achieve ordering-flexible motion coordination while maneuvering on a desired two-dimensional (2D) surface. The CGVF is characterized by three terms, i.e., a convergence term to drive the robots to converge to the desired surface, a propagation term to provide a traversing direction for maneuvering on the desir…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Published in IEEE Transactions on Automatic Control, 2024

  39. arXiv:2401.11225  [pdf, ps, other]

    cs.CR

    Protecting Personalized Trajectory with Differential Privacy under Temporal Correlations

    Authors: Mingge Cao, Haopeng Zhu, Minghui Min, Yulu Li, Shiyin Li, Hongliang Zhang, Zhu Han

    Abstract: Location-based services (LBSs) in vehicular ad hoc networks (VANETs) offer users numerous conveniences. However, the extensive use of LBSs raises concerns about the privacy of users' trajectories, as adversaries can exploit temporal correlations between different locations to extract personal information. Additionally, users have varying privacy requirements depending on the time and location. To…

    Submitted 20 January, 2024; originally announced January 2024.

  40. arXiv:2401.07382  [pdf, other]

    cs.CL cs.AI

    Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

    Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

    Abstract: Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces a novel framework…

    Submitted 19 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  41. arXiv:2401.03689  [pdf, other]

    eess.AS cs.SD

    LUPET: Incorporating Hierarchical Information Path into Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Toward high-performance multilingual automatic speech recognition (ASR), various types of linguistic information and model design have demonstrated their effectiveness independently. They include language identity (LID), phoneme information, language-specific processing modules, and cross-lingual self-supervised speech representation. It is expected that leveraging their benefits synergistically i…

    Submitted 10 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech 2024

  42. arXiv:2401.03182  [pdf, other]

    cs.CV

    Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li

    Abstract: Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the nov…

    Submitted 6 January, 2024; originally announced January 2024.

  43. arXiv:2312.16943  [pdf, other]

    cs.CV

    Multi-scale direction-aware SAR object detection network via global information fusion

    Authors: Mingxiang Cao, Weiying Xie, Jie Lei, Jiaqing Zhang, Daixun Li, Yunsong Li

    Abstract: Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR o…

    Submitted 22 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  44. arXiv:2312.09445  [pdf, other]

    eess.SP cs.CV cs.LG

    IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

    Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

    Abstract: Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha…

    Submitted 16 November, 2023; originally announced December 2023.

  45. arXiv:2312.04461  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

    Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

    Abstract: Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized t…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Tech report; Project page: https://photo-maker.github.io/

  46. arXiv:2311.11103  [pdf, other]

    cs.CL

    Responsible AI Considerations in Text Summarization Research: A Review of Current Practices

    Authors: Yu Lu Liu, Meng Cao, Su Lin Blodgett, Jackie Chi Kit Cheung, Alexandra Olteanu, Adam Trischler

    Abstract: AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task lar…

    Submitted 18 November, 2023; originally announced November 2023.

  47. arXiv:2311.06278  [pdf]

    q-fin.ST cs.AI cs.LG

    Boosting Stock Price Prediction with Anticipated Macro Policy Changes

    Authors: Md Sabbirul Haque, Md Shahedul Amin, Jonayet Miah, Duc Minh Cao, Ashiqul Haque Ahmed

    Abstract: Prediction of stock prices plays a significant role in aiding the decision-making of investors. Considering its importance, a growing literature has emerged trying to forecast stock prices with improved accuracy. In this study, we introduce an innovative approach for forecasting stock prices with greater accuracy. We incorporate external economic environment-related information along with stock pr…

    Submitted 27 October, 2023; originally announced November 2023.

    Journal ref: Journal of Mathematics and Statistics Studies, 4(3), 29-34 (2023)

  48. arXiv:2311.04921  [pdf, other]

    cs.CL cs.AI

    Successor Features for Efficient Multisubject Controlled Text Generation

    Authors: Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian

    Abstract: While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. Existing decoding-based methods are static in terms of the dimension of control; if the target subject is changed,…

    Submitted 2 November, 2023; originally announced November 2023.

  49. arXiv:2311.04199  [pdf, other]

    cs.IR cs.CL

    Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study

    Authors: Peilin Zhou, Meng Cao, You-Liang Huang, Qichen Ye, Peiyan Zhang, Junling Liu, Yueqi Xie, Yining Hua, Jaeboum Kim

    Abstract: Large Multimodal Models (LMMs) have demonstrated impressive performance across various vision and language tasks, yet their potential applications in recommendation tasks with visual assistance remain unexplored. To bridge this gap, we present a preliminary case study investigating the recommendation capabilities of GPT-4V(ision), a recently released LMM by OpenAI. We construct a series of qualitat…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: In Progress

  50. arXiv:2311.01957  [pdf, ps, other]

    math.OC cs.MA

    Distributed online constrained convex optimization with event-triggered communication

    Authors: Kunpeng Zhang, Xinlei Yi, Yuzhe Li, Ming Cao, Tianyou Chai, Tao Yang

    Abstract: This paper focuses on the distributed online convex optimization problem with time-varying inequality constraints over a network of agents, where each agent collaborates with its neighboring agents to minimize the cumulative network-wide loss over time. To reduce communication overhead between the agents, we propose a distributed event-triggered online primal-dual algorithm over a time-varying dir…

    Submitted 2 May, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 12 pages, 3 figures