
Showing 1–50 of 259 results for author: Meng, F

  1. arXiv:2407.12665  [pdf, other]

    cs.CL cs.AI cs.LG

    Patch-Level Training for Large Language Models

    Authors: Chenze Shao, Fandong Meng, Jie Zhou

    Abstract: As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to process an extensive number of tokens. To mitigate this… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.02933  [pdf, other]

    cs.RO

    Online Time-Informed Kinodynamic Motion Planning of Nonlinear Systems

    Authors: Fei Meng, Jianbang Liu, Haojie Shi, Han Ma, Hongliang Ren, Max Q. -H. Meng

    Abstract: Sampling-based kinodynamic motion planners (SKMPs) are powerful in finding collision-free trajectories for high-dimensional systems under differential constraints. Time-informed set (TIS) can provide the heuristic search domain to accelerate their convergence to the time-optimal solution. However, existing TIS approximation methods suffer from the curse of dimensionality, computational burden, and… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.02894  [pdf, other]

    cs.CL cs.AI

    Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

    Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Min Zhang, Jinsong Su

    Abstract: In-image machine translation (IIMT) aims to translate an image containing texts in source language into an image containing translations in target language. In this regard, conventional cascaded methods suffer from issues such as error propagation, massive parameters, and difficulties in deployment and retaining visual characteristics of the input image. Thus, constructing end-to-end models has be… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Findings

  4. arXiv:2407.00102  [pdf, other]

    cs.LG cs.AI cs.CL

    Curriculum Learning with Quality-Driven Data Selection

    Authors: Biao Wu, Fang Meng, Ling Chen

    Abstract: The impressive multimodal capabilities demonstrated by OpenAI's GPT-4 have generated significant interest in the development of Multimodal Large Language Models (MLLMs). Visual instruction tuning of MLLMs with machine-generated instruction-following data has shown to enhance zero-shot capabilities across various tasks. However, there has been limited exploration into controlling the quality of the… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  5. arXiv:2406.16536  [pdf, other]

    cs.CL

    C-LLM: Learn to Check Chinese Spelling Errors Character by Character

    Authors: Kunting Li, Yong Hu, Liang He, Fandong Meng, Jie Zhou

    Abstract: Chinese Spell Checking (CSC) aims to detect and correct spelling errors in sentences. Despite Large Language Models (LLMs) exhibit robust capabilities and are widely applied in various tasks, their performance on CSC is often unsatisfactory. We find that LLMs fail to meet the Chinese character-level constraints of the CSC task, namely equal length and phonetic similarity, leading to a performance… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16416  [pdf, other]

    cs.CL

    Multilingual Knowledge Editing with Language-Agnostic Factual Neurons

    Authors: Xue zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

    Abstract: Multilingual knowledge editing (MKE) aims to simultaneously revise factual knowledge across multilingual languages within large language models (LLMs). However, most existing MKE methods just adapt existing monolingual editing methods to multilingual scenarios, overlooking the deep semantic connections of the same factual knowledge between different languages, thereby limiting edit performance. To… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, 7 tables

  7. arXiv:2406.13979  [pdf, other]

    eess.IV cs.CV cs.LG

    Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

    Authors: Yupei Zhang, Xiaofei Wang, Fangliangzi Meng, Jin Tang, Chao Li

    Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histology data, addressing the intrinsic complexity of tumour ecosystem where both tumour and microenvironment contribute to malignancy. We propose a biologically interpretative an… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.12324  [pdf, other]

    cs.RO

    AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints

    Authors: Yu-Zhe Shi, Haofei Hou, Zhangqian Bi, Fanxu Meng, Xiang Wei, Lecheng Ruan, Qining Wang

    Abstract: Accurate representation of procedures in restricted scenarios, such as non-standardized scientific experiments, requires precise depiction of constraints. Unfortunately, Domain-specific Language (DSL), as an effective tool to express constraints structurally, often requires case-by-case hand-crafting, necessitating customized, labor-intensive efforts. To overcome this challenge, we introduce the A… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL'24)

  9. arXiv:2406.11802  [pdf, other]

    cs.CV

    PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

    Authors: Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal know… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.08451  [pdf, other]

    cs.CV

    GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

    Authors: Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention. However, prior GUI agents often trained with datasets compri… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures, a cross-app GUI navigation dataset

  11. arXiv:2406.08434  [pdf, other]

    cs.CL cs.AI

    TasTe: Teaching Large Language Models to Translate through Self-Reflection

    Authors: Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie Zhou, Min Zhang

    Abstract: Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks. Techniques like instruction tuning have effectively enhanced the proficiency of LLMs in the downstream task of machine translation. However, the existing approaches fail to yield satisfactory translation outputs that match the quality of supervised neural machine translation (NMT) syste… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to the ACL 2024 main conference

  12. arXiv:2406.06517  [pdf, other]

    cs.CV

    Genomics-guided Representation Learning for Pathologic Pan-cancer Tumor Microenvironment Subtype Prediction

    Authors: Fangliangzi Meng, Hongrun Zhang, Ruodan Yan, Guohui Chuai, Chao Li, Qi Liu

    Abstract: The characterization of Tumor MicroEnvironment (TME) is challenging due to its complexity and heterogeneity. Relatively consistent TME characteristics embedded within highly specific tissue features, render them difficult to predict. The capability to accurately classify TME subtypes is of critical significance for clinical tumor diagnosis and precision medicine. Based on the observation that tumo… ▽ More

    Submitted 8 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: MICCAI2024

  13. arXiv:2406.03813  [pdf, other]

    cs.RO

    Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation

    Authors: Ning Cheng, Changhao Guan, Jing Gao, Weihao Wang, You Li, Fandong Meng, Jie Zhou, Bin Fang, Jinan Xu, Wenjuan Han

    Abstract: Touch holds a pivotal position in enhancing the perceptual and interactive capabilities of both humans and robots. Despite its significance, current tactile research mainly focuses on visual and tactile modalities, overlooking the language domain. Inspired by this, we construct Touch100k, a paired touch-language-vision dataset at the scale of 100k, featuring tactile sensation descriptions in multi… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  14. arXiv:2406.02882  [pdf, other]

    cs.CL cs.AI

    Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge

    Authors: Zengkui Sun, Yijin Liu, Jiaan Wang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

    Abstract: Recently, Knowledge Editing has received increasing attention, since it could update the specific knowledge from outdated ones in pretrained models without re-training. However, as pointed out by recent studies, existing related methods tend to merely memorize the superficial word composition of the edited knowledge, rather than truly learning and absorbing it. Consequently, on the reasoning quest… ▽ More

    Submitted 16 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024 Findings, Codes are at https://github.com/Acerkoo/DISCO

  15. arXiv:2406.02876  [pdf, other]

    cs.CL cs.AI

    LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

    Authors: Zengkui Sun, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

    Abstract: Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of t… ▽ More

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024 Findings, Codes are at https://github.com/Acerkoo/LCS

  16. arXiv:2406.01441  [pdf, other]

    cs.CL

    LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation

    Authors: Yongjing Yin, Jiali Zeng, Yafu Li, Fandong Meng, Yue Zhang

    Abstract: The fine-tuning of open-source large language models (LLMs) for machine translation has recently received considerable attention, marking a shift towards data-centric research from traditional neural machine translation. However, the area of data collection for instruction fine-tuning in machine translation remains relatively underexplored. In this paper, we present LexMatcher, a simple yet effect… ▽ More

    Submitted 2 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  17. arXiv:2405.18922  [pdf, other]

    cs.CL

    Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective

    Authors: Chenze Shao, Fandong Meng, Jiali Zeng, Jie Zhou

    Abstract: Neural Machine Translation (NMT) has made remarkable progress over the past years. However, under-translation and over-translation remain two challenging problems in state-of-the-art NMT systems. In this work, we conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective. To optimize the beam search objectiv… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ACL 2024 main conference

  18. arXiv:2405.18906  [pdf, other]

    cs.CL cs.LG

    Language Generation with Strictly Proper Scoring Rules

    Authors: Chenze Shao, Fandong Meng, Yijin Liu, Jie Zhou

    Abstract: Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the logarithmic score in statistical decision theory. The logarithmic score is strictly proper in the sense that it encourages honest forecasts, where the expected score is… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  19. arXiv:2404.16006  [pdf, other]

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  20. arXiv:2404.10458  [pdf, other]

    cs.LG cs.AI

    Advancing Long-Term Multi-Energy Load Forecasting with Patchformer: A Patch and Transformer-Based Approach

    Authors: Qiuyi Hong, Fanlin Meng, Felipe Maldonado

    Abstract: In the context of increasing demands for long-term multi-energy load forecasting in real-world applications, this paper introduces Patchformer, a novel model that integrates patch embedding with encoder-decoder Transformer-based architectures. To address the limitation in existing Transformer-based models, which struggle with intricate temporal patterns in long-term forecasting, Patchformer employ… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  21. arXiv:2404.09686  [pdf, other]

    cs.LG cs.DC

    AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster

    Authors: Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou

    Abstract: Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines. This paper demonstrated AntBatchInfer, an elastic batch inference framework, which is specially optimized for the non-dedicated cluster. AntBatchInfer addresses these chall… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2404.09443  [pdf, other]

    cs.LG cs.DC

    Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network

    Authors: Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng

    Abstract: Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. Most existing works have focused on horizontal or vertical data distributions, where each client possesses different samples with shared features, or each client fully shares only sample indices, respectively. However, the hybrid… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.07549  [pdf, other]

    cs.CL

    Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective

    Authors: Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

    Abstract: Code generation aims to understand the problem description and generate corresponding code snippets, where existing works generally decompose such complex tasks into intermediate steps by prompting strategies, such as Chain-of-Thought and its variants. While these studies have achieved some success, their effectiveness is highly dependent on the capabilities of advanced Large Language Models (LLMs… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: The code is publicly available at https://github.com/pppa2019/Mango

  24. arXiv:2404.06954  [pdf, other]

    cs.CL

    Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy

    Authors: Yijin Liu, Fandong Meng, Jie Zhou

    Abstract: Recently, dynamic computation methods have shown notable acceleration for Large Language Models (LLMs) by skipping several layers of computations through elaborate heuristics or additional predictors. However, in the decoding process of existing approaches, different samples are assigned different computational budgets, which cannot guarantee a stable and precise acceleration effect. Furthermore,… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, codes at https://github.com/Adaxry/Unified_Layer_Skipping

  25. arXiv:2404.02948  [pdf, other]

    cs.LG cs.AI

    PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models

    Authors: Fanxu Meng, Zhaohui Wang, Muhan Zhang

    Abstract: To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank adaptation (LoRA) method approximates the model changes $ΔW \in \mathbb{R}^{m \times n}$ through the product of two matrices $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$, $A$ is initialized with Gaussian noise, and $B$ with zeros. LoRA freezes the original model… ▽ More

    Submitted 28 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  26. arXiv:2403.20009  [pdf, other]

    cs.CL cs.LG

    On Large Language Models' Hallucination with Regard to Known Facts

    Authors: Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

    Abstract: Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify the factual question…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 MainConference

  27. arXiv:2403.17012  [pdf]

    cs.NE cs.AI

    Evolution and Efficiency in Neural Architecture Search: Bridging the Gap Between Expert Design and Automated Optimization

    Authors: Fanfei Meng, Chen-Ao Wang, Lele Zhang

    Abstract: The paper provides a comprehensive overview of Neural Architecture Search (NAS), emphasizing its evolution from manual design to automated, computationally-driven approaches. It covers the inception and growth of NAS, highlighting its application across various domains, including medical imaging and natural language processing. The document details the shift from expert-driven design to algorithm-… ▽ More

    Submitted 2 April, 2024; v1 submitted 11 February, 2024; originally announced March 2024.

    Comments: 7 Pages, Double Column

    Journal ref: Journal of Mathematical Techniques and Computational Mathematics, 2024, Volume 3, Issue 3

  28. arXiv:2403.12400  [pdf, other]

    cs.LG cs.AI eess.SP

    Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing

    Authors: Zijian Zhao, Tingwei Chen, Fanyi Meng, Hang Li, Xiaoyang Li, Guangxu Zhu

    Abstract: Despite the development of various deep learning methods for Wi-Fi sensing, package loss often results in noncontinuous estimation of the Channel State Information (CSI), which negatively impacts the performance of the learning models. To overcome this challenge, we propose a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) for CSI recovery, named CSI-BER… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 6 pages, accepted by IEEE INFOCOM Deepwireless Workshop 2024

  29. arXiv:2402.18150  [pdf, other]

    cs.CL cs.AI cs.IR

    Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation

    Authors: Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou

    Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating additional information from retrieval. However, studies have shown that LLMs still face challenges in effectively using the retrieved information, even ignoring it or being misled by it. The key reason is that the training of LLMs does not clearly make LLMs learn how to utilize input retrieved texts with va… ▽ More

    Submitted 11 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main

  30. arXiv:2402.08210  [pdf, other]

    quant-ph cs.CE cs.GT cs.LG

    Quantum Computing-Enhanced Algorithm Unveils Novel Inhibitors for KRAS

    Authors: Mohammad Ghazi Vakili, Christoph Gorgulla, AkshatKumar Nigam, Dmitry Bezrukov, Daniel Varoli, Alex Aliper, Daniil Polykovsky, Krishna M. Padmanabha Das, Jamie Snider, Anna Lyakisheva, Ardalan Hosseini Mansob, Zhong Yao, Lela Bitar, Eugene Radchenko, Xiao Ding, Jinxin Liu, Fanye Meng, Feng Ren, Yudong Cao, Igor Stagljar, Alán Aspuru-Guzik, Alex Zhavoronkov

    Abstract: The discovery of small molecules with therapeutic potential is a long-standing challenge in chemistry and biology. Researchers have increasingly leveraged novel computational techniques to streamline the drug development process to increase hit rates and reduce the costs associated with bringing a drug to market. To this end, we introduce a quantum-classical generative model that seamlessly integr… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  31. arXiv:2402.02431  [pdf, other]

    cs.CV cs.LG

    Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition

    Authors: Mengyuan Liu, Chen Chen, Songtao Wu, Fanyang Meng, Hong Liu

    Abstract: Recognizing interactive actions, including hand-to-hand interaction and human-to-human interaction, has attracted increasing attention for various applications in the field of video analysis and human-robot interaction. Considering the success of graph convolution in modeling topology-aware features from skeleton data, recent methods commonly operate graph convolution on separate entities and use… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

  32. arXiv:2401.18018  [pdf, other]

    cs.LG cs.AI cs.CL

    On Prompt-Driven Safeguarding for Large Language Models

    Authors: Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

    Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refu… ▽ More

    Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  33. Spintronic logic: from transducers to logic gates and circuits

    Authors: Christoph Adelmann, Florin Ciubotaru, Fanfan Meng, Sorin Cotofana, Sebastien Couet

    Abstract: While magnetic solid-state memory has found commercial applications to date, magnetic logic has rather remained on a conceptual level so far. Here, we discuss open challenges of different spintronic logic approaches, which use magnetic excitations for computation. While different logic gate designs have been proposed and proof of concept experiments have been reported, no nontrivial operational sp… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: This work has received funding from the European Union's Horizon 2020 research and innovation program within the project CHIRON (grant agreement no. 801055) as well as from the Horizon Europe research and innovation program within the project SPIDER (grant agreement no. 101070417)

    Journal ref: Intermag 2023 Short Papers

  34. arXiv:2401.08206  [pdf, other]

    cs.IR cs.CL

    Generative Multi-Modal Knowledge Retrieval with Large Language Models

    Authors: Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, Bowen Zhou, Jie Zhou

    Abstract: Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications. However, existing methods face challenges in terms of their effectiveness and training efficiency, especially when it comes to training and integrating multiple retrievers to handle multi-modal queries. In this paper, we propose an innovative end-to-end generative framework… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  35. arXiv:2401.02384  [pdf, other]

    cs.CV

    ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

    Authors: Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for general-purpose multimodal models. While vision-language models trained on chart data excel in comprehension, they struggle with generalization. To a… ▽ More

    Submitted 15 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Updated and corrected experimental results, removal of inappropriate experiments, and a more comprehensive experimental setup

  36. arXiv:2401.02138  [pdf, other]

    cs.CV

    Explore Human Parsing Modality for Action Recognition

    Authors: Jinfu Liu, Runwei Ding, Yuhang Wen, Nan Dai, Fanyang Meng, Shen Zhao, Mengyuan Liu

    Abstract: Multimodal-based action recognition methods have achieved high success using pose and RGB modality. However, skeletons sequences lack appearance depiction and RGB images suffer irrelevant noise due to modality limitations. To address this, we introduce human parsing feature map as a novel modality, since it can selectively retain effective semantic features of the body parts, while filtering out m… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.07977

  37. arXiv:2401.01609  [pdf, other]

    cs.IT eess.SP

    Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning

    Authors: Fan Meng, Cheng Zhang, Yongming Huang, Zhilei Zhang, Xiaoyu Bai, Zhaohua Lu

    Abstract: Hierarchical beam search in mmWave communications incurs substantial training overhead, necessitating deep learning-enabled beam predictions to effectively leverage channel priors and mitigate this overhead. In this study, we introduce a comprehensive probabilistic model of power distribution in beamspace, and formulate the joint optimization problem of probing beam selection and probabilistic bea… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  38. arXiv:2312.16571  [pdf, other]

    cs.CV

    GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection

    Authors: Hefei Mei, Taijin Zhao, Shiyuan Tang, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Fanman Meng, Hongliang Li

    Abstract: Few-shot object detection (FSOD) aims to achieve object detection only using a few novel class training data. Most of the existing methods usually adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct way easily results in confusion between the novel class and other similar categories in the decision space. To ad… ▽ More

    Submitted 29 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

  39. arXiv:2312.15634   

    cs.HC

    Incorporating Feature Signal Transmission with Block-based Haptic Data Reduction for Time-delayed Teleoperation

    Authors: Hongjun Wu, Xiao Xu, Zhi Jin, Fanle Meng

    Abstract: This paper presents an innovative feature signal transmission approach incorporating block-based haptic data reduction to address time-delayed teleoperation. Numerous data reduction techniques rely on perceptual deadband (DB). In the preceding block-based approaches, the whole block within the DB is discarded. However, disregarding all signals within the DB loses too much information and hinders…

    Submitted 18 January, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: The paper contains some fundamental errors that need to be withdrawn

  40. arXiv:2312.14249  [pdf, other]

    q-bio.GN cs.LG

    GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization

    Authors: Yingzhou Lu, Minjie Shen, Yue Zhao, Chenhao Li, Fan Meng, Xiao Wang, David Herrington, Yue Wang, Tim Fu, Capucine Van Rechem

    Abstract: The surge in high-throughput omics data has reshaped the landscape of biological research, underlining the need for powerful, user-friendly data analysis and interpretation tools. This paper presents GenoCraft, a web-based comprehensive software solution designed to handle the entire pipeline of omics data processing. GenoCraft offers a unified platform featuring advanced bioinformatics tools, cov… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

  41. arXiv:2312.05259  [pdf]

    cs.AI physics.soc-ph

    Optimizing the Passenger Flow for Airport Security Check

    Authors: Yuxin Wang, Fanfei Meng, Xiaotian Wang, Chaoyu Xie

    Abstract: Due to the necessary security for the airport and flight, passengers are required to have strict security check before getting aboard. However, there are frequent complaints of wasting huge amount of time while waiting for the security check. This paper presents a potential solution aimed at optimizing gate setup procedures specifically tailored for Chicago OHare International Airport. By referrin… ▽ More

    Submitted 13 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

  42. arXiv:2312.03038   

    cs.LG cs.AI cs.NE

    Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit

    Authors: Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang

    Abstract: Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) model whose layers and heads can be dynamically configured with single data samples via solving contextual bandit problems. To determine the number of la…

    Submitted 10 January, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: We miss some authorship information. And miss some important information in references

  43. arXiv:2312.00102   

    cs.LG cs.AI

    FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation

    Authors: Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang

    Abstract: Federated learning (FL) is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. The learning scheme may be horizontal, vertical or hybrid (both vertical and horizontal). Most existing research work with deep neural network (DNN) modelling is focused on horizontal data distributions, while vertical and h…

    Submitted 10 January, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Miss some important information and references. The publication hasn't been online in the journal

    Journal ref: Proceedings on Engineering Sciences, 2620-2832, 2023/10

  44. Joint Detection Algorithm for Multiple Cognitive Users in Spectrum Sensing

    Authors: Fanfei Meng, Yuxin Wang, Lele Zhang, Yingxin Zhao

    Abstract: Spectrum sensing technology is a crucial aspect of modern communication technology, serving as one of the essential techniques for efficiently utilizing scarce information resources in tight frequency bands. This paper first introduces three common logical circuit decision criteria in hard decisions and analyzes their decision rigor. Building upon hard decisions, the paper further introduces a met…

    Submitted 1 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: https://aei.ewapublishing.org/article.html?pk=e24c40d220434209ae2fe2e984bcf2c2

    Journal ref: Advances in Engineering Innovation, Vol. 4, 16-25, Published 27 November 2023

  45. arXiv:2311.15846  [pdf, other]

    cs.CV eess.IV

    Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration

    Authors: Lei Wang, Qingbo Wu, Desen Yuan, King Ngi Ngan, Hongliang Li, Fanman Meng, Linfeng Xu

    Abstract: Learning based image quality assessment (IQA) models have obtained impressive performance with the help of reliable subjective quality labels, where mean opinion score (MOS) is the most popular choice. However, in view of the subjective bias of individual annotators, the labor-abundant MOS (LA-MOS) typically requires a large collection of opinion scores from multiple annotators for each image, whi…

    Submitted 27 November, 2023; originally announced November 2023.

  46. Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction

    Authors: Wanying Zhang, Shen Zhao, Fanyang Meng, Songtao Wu, Mengyuan Liu

    Abstract: With potential applications in fields including intelligent surveillance and human-robot interaction, the human motion prediction task has become a hot research topic and also has achieved high success, especially using the recent Graph Convolutional Network (GCN). Current human motion prediction task usually focuses on predicting human motions for atomic actions. Observing that atomic actions can…

    Submitted 22 November, 2023; originally announced November 2023.

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, Pages 2856-2864

  47. arXiv:2311.09241  [pdf, other]

    cs.CV cs.AI cs.LG

    Chain of Images for Intuitively Reasoning

    Authors: Fanxu Meng, Haotong Yang, Yiding Wang, Muhan Zhang

    Abstract: The human brain is naturally equipped to comprehend and interpret visual information rapidly. When confronted with complex problems or concepts, we use flowcharts, sketches, and diagrams to aid our thought process. Leveraging this inherent ability can significantly enhance logical reasoning. However, current Large Language Models (LLMs) do not utilize such visual intuition to help their thinking.…

    Submitted 9 November, 2023; originally announced November 2023.

  48. arXiv:2311.08219  [pdf, other]

    cs.CL cs.AI

    Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction

    Authors: Kunting Li, Yong Hu, Shaolei Wang, Hanhan Ma, Liang He, Fandong Meng, Jie Zhou

    Abstract: ChatGPT has demonstrated impressive performance in various downstream tasks. However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics. We believe this inconsistency arises because the traditional metrics are not well-suited for evaluating generative models. Their overly st…

    Submitted 14 November, 2023; originally announced November 2023.

  49. arXiv:2311.08147  [pdf, other]

    cs.CL cs.AI

    RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

    Authors: Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

    Abstract: LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet ma…

    Submitted 14 November, 2023; originally announced November 2023.

  50. arXiv:2311.04589  [pdf, other]

    cs.CL cs.AI

    TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

    Authors: Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou

    Abstract: Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal inputs and the generation in non-textual modalities. In this work, we propose TEAL (Tokenize and Embed ALl), an approach to treat the input from any modality as a token sequence and learn a joint embedding space for all modaliti…

    Submitted 4 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Multi-modal, Large Language Models, Tokenizer, Understanding and Generation