Showing 1–50 of 1,052 results for author: Lu, J

Searching in archive cs.
  1. arXiv:2407.12886  [pdf, other]

    cs.CL cs.AI cs.LG

    Whitening Not Recommended for Classification Tasks in LLMs

    Authors: Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu

    Abstract: Sentence embedding is a cornerstone in NLP. Whitening has been claimed to be an effective operation to improve embedding quality obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degenerates embeddings for classification tasks. The conclusion is supported by extensive experiments. We also expl…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11489  [pdf, other]

    cs.LG cs.AI

    A Meta-Learning Approach for Multi-Objective Reinforcement Learning in Sustainable Home Environments

    Authors: Junlin Lu, Patrick Mannion, Karl Mason

    Abstract: Effective residential appliance scheduling is crucial for sustainable living. While multi-objective reinforcement learning (MORL) has proven effective in balancing user preferences in appliance scheduling, traditional MORL struggles with limited data in non-stationary residential settings characterized by renewable generation variations. Significant context shifts that can invalidate previously le…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.10794  [pdf, other]

    cs.CL cs.AI

    Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

    Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. The construction of KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC), however, most existing approaches focus on a local pe…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 11 figures, 13 tables. arXiv admin note: substantial text overlap with arXiv:2402.14293

  4. arXiv:2407.10080  [pdf, other]

    cs.IT eess.SY

    Design and Optimization on Successive RIS-assisted Multi-hop Wireless Communications

    Authors: Rujing Xiong, Jialong Lu, Jianan Zhang, Minggang Liu, Xuehui Dong, Tiebin Mi, Robert Caiming Qiu

    Abstract: As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering…

    Submitted 14 July, 2024; originally announced July 2024.

  5. arXiv:2407.08514  [pdf, other]

    cs.CV

    Rethinking the Threat and Accessibility of Adversarial Attacks against Face Recognition Systems

    Authors: Yuxin Cao, Yumeng Zhu, Derui Wang, Sheng Wen, Minhui Xue, Jin Lu, Hao Ge

    Abstract: Face recognition pipelines have been widely deployed in various mission-critical systems in trust, equitable and responsible AI applications. However, the emergence of adversarial attacks has threatened the security of the entire recognition pipeline. Despite the sheer number of attack methods proposed for crafting adversarial examples in both digital and physical forms, it is never an easy task t…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 12 figures

  6. arXiv:2407.08196  [pdf, other]

    cs.AI

    SoupLM: Model Integration in Large Language and Multi-Modal Models

    Authors: Yue Bai, Zichen Zhang, Jiasen Lu, Yun Fu

    Abstract: Training large language models (LLMs) and multimodal LLMs necessitates significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks. For instance, LLaMA, Vicuna, and LLaVA are three LLM variants trained with LLaMA base models using very different training recipes, tasks, and data modalities. The traini…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.07410  [pdf]

    cs.CV cs.GR cs.LG

    Mutual Information calculation on different appearances

    Authors: Jiecheng Liao, Junhao Lu, Jeff Ji, Jiacheng He

    Abstract: Mutual information has many applications in image alignment and matching, mainly due to its ability to measure the statistical dependence between two images, even if the two images are from different modalities (e.g., CT and MRI). It considers not only the pixel intensities of the images but also the spatial relationships between the pixels. In this project, we apply the mutual information formula…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: demo for the work: elucidator.cn/demo-mi/

  8. arXiv:2407.06089  [pdf, other]

    cs.CL

    Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models

    Authors: Jinliang Lu, Ziliang Pang, Min Xiao, Yaochen Zhu, Rui Xia, Jiajun Zhang

    Abstract: The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit varying strengths and weaknesses, leading to challenges in maximizing their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies f…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.05418  [pdf, other]

    cs.CV cs.AI

    EMBANet: A Flexible Efficient Multi-branch Attention Network

    Authors: Keke Zu, Hu Zhang, Jian Lu, Lei Zhang, Chen Xu

    Abstract: This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain the multi-scale feature map. The proposed MBC module brings new degrees of freedom (DoF) for the design of attention networks by allowing the type of transformation operators and the number of branches to be flexibly adjusted. Two important transformation operators, multiplex and split, are…

    Submitted 7 July, 2024; originally announced July 2024.

  10. arXiv:2407.04255  [pdf, other]

    cs.CV

    Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge

    Authors: Xiangyu Wu, Zhouyang Chi, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our solution for the WSDM2023 Toloka Visual Question Answering Challenge. Inspired by the application of multimodal pre-trained models to various downstream tasks (e.g., visual question answering, visual grounding, and cross-modal retrieval), we approached this competition as a visual grounding task, where the input is an image and a question, guiding the model to answer t…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Second Place of WSDM2023 Toloka Visual Question Answering Challenge

  11. arXiv:2407.04237  [pdf, other]

    cs.CV cs.GR

    GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

    Authors: Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

    Abstract: We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  12. arXiv:2407.03605  [pdf, other]

    math.OC cs.CV

    Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping

    Authors: Xiaoxia Liu, Shijie Yu, Jian Lu, Xiaojun Chen

    Abstract: Hyperspectral images (HSIs) are often contaminated by a mixture of noises such as Gaussian noise, dead lines, stripes, and so on. In this paper, we propose a novel approach for HSI denoising and destriping, called NLTL2p, which consists of an orthogonal constrained minimization model and an iterative algorithm with convergence guarantees. The model of the proposed NLTL2p approach is built based on…

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68U10; 90C26; 15A18; 65F22

  13. arXiv:2407.02318  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

    Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

    Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information…

    Submitted 1 July, 2024; originally announced July 2024.

  14. arXiv:2407.01976  [pdf, other]

    cs.CL cs.AI cs.MM

    A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

    Authors: Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

    Abstract: Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In th…

    Submitted 2 July, 2024; originally announced July 2024.

  15. arXiv:2407.01271  [pdf, other]

    cs.CL

    First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1

    Authors: Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we delete the mask language modeling task of CPT-BASE and instead reconstruct the vocabulary, adopting a span mask strategy and…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: First Place of 2023 Global Artificial Intelligence Technology Innovation Competition

  16. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang , et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.17442  [pdf, other]

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w…

    Submitted 25 June, 2024; originally announced June 2024.

  18. arXiv:2406.16317  [pdf]

    cs.SD eess.AS

    SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

    Authors: Zhongshu Hou, Qinwen Hu, Zhanzhong Cao, Ming Tang, Jing Lu

    Abstract: Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from…

    Submitted 24 June, 2024; originally announced June 2024.

  19. arXiv:2406.15333  [pdf, other]

    cs.CV

    GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

    Authors: Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang

    Abstract: In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/alibaba-yuanjing-aigclab/GeoLRM

  20. arXiv:2406.15269  [pdf, other]

    cs.CV

    You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation

    Authors: Hongyu Chen, Weiming Zeng, Luhui Cai, Yueyang Li, Lei Wang, Jia Lu, Hongjie Yan, Wai Ting Siok, Nizhuan Wang

    Abstract: High-precision acquisition of dense-channel electroencephalogram (EEG) signals is often impeded by the costliness and lack of portability of equipment. In contrast, generating dense-channel EEG signals effectively from sparse channels shows promise and economic viability. However, sparse-channel EEG poses challenges such as reduced spatial resolution, information loss, signal mixing, and heightene…

    Submitted 21 June, 2024; originally announced June 2024.

  21. arXiv:2406.14862  [pdf, other]

    cs.LG cs.CL cs.CV

    LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

    Authors: Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao

    Abstract: Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framewo…

    Submitted 28 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.14408  [pdf, other]

    cs.AI cs.CL cs.LG

    FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

    Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-craft rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as anoth…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.13514  [pdf, other]

    cs.CV

    Locally orderless networks

    Authors: Jon Sporring, Peidi Xu, Jiahao Lu, François Lauze, Sune Darkner

    Abstract: We present Locally Orderless Networks (LON) and its theoretic foundation which links it to Convolutional Neural Networks (CNN), to Scale-space histograms, and measurement theory. The key elements are a regular sampling of the bias and the derivative of the activation function. We compare LON, CNN, and Scale-space histograms on prototypical single-layer networks. We show how LON and CNN can emulate…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  25. arXiv:2406.13457  [pdf, other]

    cs.CV cs.AI

    EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

    Authors: Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun

    Abstract: Event-based vision has drawn increasing attention due to its unique characteristics, such as high temporal resolution and high dynamic range. It has been used in video super-resolution (VSR) recently to enhance the flow estimation and temporal alignment. Rather than for motion learning, we propose in this paper the first VSR method that utilizes event signals for texture enhancement. Our method, c…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024. Project page: https://dachunkai.github.io/evtexture.github.io/

  26. arXiv:2406.13372  [pdf, other]

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-conne…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  27. arXiv:2406.12241  [pdf, other]

    cs.LG cs.AI

    More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

    Authors: Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

    Abstract: Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal r…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Accepted to the Reinforcement Learning Conference (RLC) 2024

  28. arXiv:2406.12178  [pdf, other]

    cs.CV

    FCA-RAC: First Cycle Annotated Repetitive Action Counting

    Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

    Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains 4 parts: 1) a labeling technique…

    Submitted 17 June, 2024; originally announced June 2024.

  29. arXiv:2406.11818  [pdf, other]

    cs.RO cs.AI

    Embodied Instruction Following in Unknown Environments

    Authors: Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan

    Abstract: Enabling embodied agents to complete complex human instructions from natural language is crucial to autonomous systems in household services. Conventional methods can only accomplish human instructions in the known environment where all interactive objects are provided to the embodied agent, and directly deploying the existing approaches for the unknown environment usually generates infeasible pla…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Project Page: https://gary3410.github.io/eif_unknown/

  30. arXiv:2406.10957  [pdf, other]

    cs.CL

    Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

    Authors: Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

    Abstract: Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observ…

    Submitted 16 June, 2024; originally announced June 2024.

  31. arXiv:2406.09816  [pdf, other]

    math.OC cs.MA

    A Zeroth-Order Proximal Algorithm for Consensus Optimization

    Authors: Chengan Wang, Zichong Ou, Jie Lu

    Abstract: This paper considers a consensus optimization problem, where all the nodes in a network, with access to the zeroth-order information of its local objective function only, attempt to cooperatively achieve a common minimizer of the sum of their local objectives. To address this problem, we develop ZoPro, a zeroth-order proximal algorithm, which incorporates a zeroth-order oracle for approximating He…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  32. arXiv:2406.08953  [pdf, other]

    cs.CV cs.LG

    Preserving Identity with Variational Score for General-purpose 3D Editing

    Authors: Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu

    Abstract: We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing - Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing, which causes detail loss and over-saturation. To a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 22 pages, 14 figures

  33. arXiv:2406.07877  [pdf, other]

    cs.RO cs.AI cs.LG

    Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty

    Authors: Qizhen Wu, Kexin Liu, Lei Chen, Jinhu Lü

    Abstract: In swarm robotics, confrontation including the pursuit-evasion game is a key scenario. High uncertainty caused by unknown opponents' strategies and dynamic obstacles complicates the action space into a hybrid decision process. Although the deep reinforcement learning method is significant for swarm confrontation since it can handle various sizes, as an end-to-end implementation, it cannot deal wit…

    Submitted 12 June, 2024; originally announced June 2024.

  34. arXiv:2406.06379  [pdf, other]

    cs.CE

    FinVerse: An Autonomous Agent System for Versatile Financial Analysis

    Authors: Siyu An, Qin Li, Junru Lu, Di Yin, Xing Sun

    Abstract: With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical challenges. In this paper, we introduce FinVerse, a meticulously crafted agent system designed for a broad range of financial topics. FinVerse integra…

    Submitted 10 June, 2024; originally announced June 2024.

  35. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  36. arXiv:2406.04941  [pdf, ps, other]

    cs.CL

    TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

    Authors: Ping Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

    Abstract: The recently unprecedented advancements in Large Language Models (LLMs) have propelled the medical community by establishing advanced medical-domain models. However, due to the limited collection of medical datasets, there are only a few comprehensive benchmarks available to gauge progress in this area. In this paper, we introduce a new medical question-answering (QA) dataset that contains massive…

    Submitted 7 June, 2024; originally announced June 2024.

  37. arXiv:2406.03866  [pdf, other]

    cs.CV

    LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

    Authors: Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

    Abstract: Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in…

    Submitted 6 June, 2024; originally announced June 2024.

  38. arXiv:2406.03637  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Style Mixture of Experts for Expressive Text-To-Speech Synthesis

    Authors: Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

    Abstract: Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The propose…

    Submitted 5 June, 2024; originally announced June 2024.

  39. arXiv:2406.02862  [pdf, other]

    cs.CV

    Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective

    Authors: Yulong Zhang, Yuan Yao, Shuhao Chen, Pengrong Jin, Yu Zhang, Jian Jin, Jiangang Lu

    Abstract: Empirical Risk Minimization (ERM) is fragile in scenarios with insufficient labeled samples. A vanilla extension of ERM to unlabeled samples is Entropy Minimization (EntMin), which employs the soft-labels of unlabeled samples to guide their learning. However, EntMin emphasizes prediction discriminability while neglecting prediction diversity. To alleviate this issue, in this paper, we rethink the…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  40. arXiv:2406.02120  [pdf, other]

    cs.CL

    Diver: Large Language Model Decoding with Span-Level Mutual Information Verification

    Authors: Jinliang Lu, Chen Wang, Jiajun Zhang

    Abstract: Large language models (LLMs) have shown impressive capabilities in adapting to various tasks when provided with task-specific instructions. However, LLMs using standard decoding strategies often struggle with deviations from the inputs. Intuitively, compliant LLM outputs should reflect the information present in the input, which can be measured by point-wise mutual information (PMI) scores. Theref…

    Submitted 4 June, 2024; originally announced June 2024.

  41. arXiv:2406.02050  [pdf, other]

    cs.CL

    Analyzing Social Biases in Japanese Large Language Models

    Authors: Hitomi Yanaka, Namgi Han, Ryoma Kumon, Jie Lu, Masashi Takeshita, Ryo Sekizawa, Taisei Kato, Hiromi Arai

    Abstract: With the development of Large Language Models (LLMs), social biases in the LLMs have become a crucial issue. While various benchmarks for social biases have been provided across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias be…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  42. arXiv:2406.01940  [pdf, other]

    cs.CL cs.LG cs.LO

    Process-Driven Autoformalization in Lean 4

    Authors: Jianqiao Lu, Zhengying Liu, Yingjia Wan, Yinya Huang, Haiming Wang, Zhicheng Yang, Jing Tang, Zhijiang Guo

    Abstract: Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark Formalization for L…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 1 figure, 11 tables

  43. arXiv:2406.00983  [pdf, other]

    cs.CL cs.AI

    Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

    Authors: Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin

    Abstract: Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them i…

    Submitted 3 June, 2024; originally announced June 2024.

  44. arXiv:2406.00958  [pdf, other]

    cs.LG cs.CV

    Navigating Conflicting Views: Harnessing Trust for Learning

    Authors: Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe

    Abstract: Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct inform…

    Submitted 2 June, 2024; originally announced June 2024.

  45. arXiv:2406.00508  [pdf, other]

    cs.CV

    FlowIE: Efficient Image Enhancement via Rectified Flow

    Authors: Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu

    Abstract: Image enhancement holds extensive applications in real-world scenarios due to complex environments and limitations of imaging devices. Conventional methods are often constrained by their tailored models, resulting in diminished robustness when confronted with challenging degradation conditions. In response, we propose FlowIE, a simple yet highly effective flow-based image enhancement framework tha…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024 as an oral presentation

  46. arXiv:2405.20337  [pdf, other]

    cs.CV cs.AI

    OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

    Authors: Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

    Abstract: Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models emerge as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which suffers from inefficiency in mo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/wzzheng/OccSora

  47. arXiv:2405.19744  [pdf, other]

    cs.CL cs.AI

    X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

    Authors: Chong Li, Wen Yang, Jiajun Zhang, Jinliang Lu, Shaonan Wang, Chengqing Zong

    Abstract: Large language models respond well in high-resource languages like English but struggle in low-resource languages. It may arise from the lack of high-quality instruction following data in these languages. Directly translating English samples into these languages can be a solution but unreliable, leading to responses with translation errors and lacking language-specific or cultural knowledge. To ad…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ACL 2024. Our codes, data and model weights are available at https://github.com/ZNLP/X-Instruction

  48. arXiv:2405.17789  [pdf, ps, other]

    cs.IT

    On the Downlink Average Energy Efficiency of Non-Stationary XL-MIMO

    Authors: Jun Zhang, Jiacheng Lu, Jingjing Zhang, Yu Han, Jue Wang, Shi Jin

    Abstract: Extra large-scale multiple-input multiple-output (XL-MIMO) is a key technology for future wireless communication systems. This paper considers the effects of visibility region (VR) at the base station (BS) in a non-stationary multi-user XL-MIMO scenario, where only partial antennas can receive users' signal. In time division duplexing (TDD) mode, we first estimate the VR at the BS by detecting the…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 13 pages, 11 figures

  49. arXiv:2405.17429  [pdf, other]

    cs.CV cs.AI

    GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

    Authors: Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu

    Abstract: 3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resource…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/huang-yh/GaussianFormer

  50. arXiv:2405.17422  [pdf, other]

    cs.CV cs.AI cs.LG

    Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection

    Authors: Shuai Zeng, Wenzhao Zheng, Jiwen Lu, Haibin Yan

    Abstract: 3D object detection aims to recover the 3D information of concerning objects and serves as the fundamental task of autonomous driving perception. Its performance greatly depends on the scale of labeled training data, yet it is costly to obtain high-quality annotations for point cloud data. While conventional methods focus on generating pseudo-labels for unlabeled samples as supplements for trainin…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/wzzheng/HASS