
Showing 1–50 of 5,900 results for author: LI, J

Searching in archive cs.
  1. arXiv:2409.08250  [pdf, other]

    cs.HC cs.AI

    OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

    Authors: Jiahao Nick Li, Zhuohao Jerry Zhang, Jiaju Ma

    Abstract: People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they mostly only support retrieving individual pieces of information like certain objects in photos and struggle with answering more complex queries that involve interpreting interconnected memories like event sequences. We conducted a one-month dia…

    Submitted 12 September, 2024; originally announced September 2024.

  2. arXiv:2409.08207  [pdf, other]

    cs.CV

    VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

    Authors: Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao

    Abstract: Recently, methods like Zero-1-to-3 have focused on single-view based 3D reconstruction and have achieved remarkable success. However, their predictions for unseen areas heavily rely on the inductive bias of large-scale pretrained diffusion models. Although subsequent work, such as DreamComposer, attempts to make predictions more controllable by incorporating additional views, the results remain unr…

    Submitted 12 September, 2024; originally announced September 2024.

  3. arXiv:2409.08159  [pdf, other]

    cs.CV

    SDformer: Efficient End-to-End Transformer for Depth Completion

    Authors: Jian Qian, Miao Sun, Ashley Lee, Jie Li, Shenglong Zhuo, Patrick Yin Chiang

    Abstract: Depth completion aims to predict dense depth maps with sparse depth measurements from a depth sensor. Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks. However, despite the excellent high-end performance, they suffer from a limited representation area. To overcome the drawbacks of CNNs, a more effective and powerful method ha…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: Presented at the International Conference on Industrial Automation, Robotics and Control Engineering (IARCE) 2022

  4. arXiv:2409.07843  [pdf, other]

    cs.CV cs.RO

    Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes

    Authors: Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li

    Abstract: Omnidirectional Depth Estimation has broad application prospects in fields such as robotic navigation and autonomous driving. In this paper, we propose a robotic prototype system and corresponding algorithm designed to validate omnidirectional depth estimation for navigation and obstacle avoidance in real-world scenarios for both robots and vehicles. The proposed HexaMODE system captures 360…

    Submitted 12 September, 2024; originally announced September 2024.

  5. arXiv:2409.07714  [pdf, other]

    cs.CV cs.MA

    CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model

    Authors: Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li

    Abstract: By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly utilize CNNs or Transformers to learn feature representation and fusion in the spatial dimension, which struggle to handle long-range spatial-temporal features under limited computing and communication resources.…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Submitted to AAAI 2025

  6. arXiv:2409.07713  [pdf, other]

    cs.CL

    Experimenting with Legal AI Solutions: The Case of Question-Answering for Access to Justice

    Authors: Jonathan Li, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu

    Abstract: Generative AI models, such as the GPT and Llama series, have significant potential to assist laypeople in answering legal questions. However, little prior work focuses on the data sourcing, inference, and evaluation of these models in the context of laypersons. To this end, we propose a human-centric legal NLP pipeline, covering data sourcing, inference, and evaluation. We introduce and release a…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted into GenLaw '24 (ICML 2024 workshop)

  7. arXiv:2409.07486  [pdf, other]

    q-fin.CP cs.AI cs.CE cs.LG q-fin.TR

    MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

    Authors: Junjie Li, Yang Liu, Weiqing Liu, Shikai Fang, Lewen Wang, Chang Xu, Jiang Bian

    Abstract: Generative models aim to simulate realistic effects of various actions across different contexts, from text generation to visual effects. Despite efforts to build real-world simulators, leveraging generative models for virtual worlds, like financial markets, remains underexplored. In financial markets, generative models can simulate market effects of various behaviors, enabling interaction with ma…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 19 pages, 12 figures

  8. arXiv:2409.07434  [pdf, other]

    stat.ML cs.LG math.ST

    Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models

    Authors: Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu

    Abstract: This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide que…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 77 pages, 5 figures, 4 tables

    MSC Class: 62E20; 62F12; 68W27

  9. arXiv:2409.07372  [pdf, other]

    cs.CL cs.AI cs.HC

    Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination

    Authors: Daniel Zhang-Li, Zheyuan Zhang, Jifan Yu, Joy Lim Jia Yin, Shangqing Tu, Linlu Gong, Haohua Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li

    Abstract: The vast body of existing lecture slides is a rich and important carrier of lecture knowledge. However, effectively leveraging lecture slides to serve students is difficult due to the multi-modal nature of slide content and the heterogeneous teaching actions. We study the problem of discovering effective designs that convert a slide into an interactive lecture. We develop Slide2Lecture, a tuning-f…

    Submitted 11 September, 2024; originally announced September 2024.

  10. arXiv:2409.07064  [pdf, other]

    cs.CL

    Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence

    Authors: Jiun-Ting Li, Bi-Cheng Yan, Tien-Hong Lo, Yi-Cheng Wang, Yung-Chang Hsu, Berlin Chen

    Abstract: Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates. Although prior ASAC approaches have shown promising performance on their respective datasets, there is still a dearth of research specifically focused on incorporating the coherence…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE SLT 2024

  11. arXiv:2409.06942  [pdf]

    cs.CV

    Automated Body Composition Analysis Using DAFS Express on 2D MRI Slices at L3 Vertebral Level

    Authors: Varun Akella, Razeyeh Bagherinasab, Jia Ming Li, Long Nguyen, Vincent Tze Yang Chow, Hyunwoo Lee, Karteek Popuri, Mirza Faisal Beg

    Abstract: Body composition analysis is vital in assessing health conditions such as obesity, sarcopenia, and metabolic syndromes. MRI provides detailed images of skeletal muscle (SKM), visceral adipose tissue (VAT), and subcutaneous adipose tissue (SAT), but their manual segmentation is labor-intensive and limits clinical applicability. This study validates an automated tool for MRI-based 2D body compositio…

    Submitted 10 September, 2024; originally announced September 2024.

  12. arXiv:2409.06888  [pdf, other]

    cs.MA

    A Quality Diversity Approach to Automatically Generate Multi-Agent Path Finding Benchmark Maps

    Authors: Cheng Qian, Yulun Zhang, Varun Bhatt, Matthew Christopher Fontaine, Stefanos Nikolaidis, Jiaoyang Li

    Abstract: We use the Quality Diversity (QD) algorithm with Neural Cellular Automata (NCA) to generate benchmark maps for Multi-Agent Path Finding (MAPF) algorithms. Previously, MAPF algorithms have been tested using fixed, human-designed benchmark maps. However, such fixed benchmark maps have several problems. First, these maps may not cover all the potential failure scenarios for the algorithms. Second, when com…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 13 pages, 20 figures

  13. arXiv:2409.06803  [pdf, other]

    cs.CL cs.IT

    Decomposition of surprisal: Unified computational model of ERP components in language processing

    Authors: Jiaxuan Li, Richard Futrell

    Abstract: The functional interpretation of language-related ERP components has been a central debate in psycholinguistics for decades. We advance an information-theoretic model of human language processing in the brain in which incoming linguistic input is processed at first shallowly and later with more depth, with these two kinds of information processing corresponding to distinct electroencephalographic…

    Submitted 10 September, 2024; originally announced September 2024.

  14. arXiv:2409.06679  [pdf, other]

    cs.CL

    E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

    Authors: Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang

    Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 12 pages, 4 figures

  15. arXiv:2409.06206  [pdf, other]

    cs.CV

    AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration

    Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Mohammad Shahid Akhtar, Jie Li, Jingyu Wu, Zhili Fang

    Abstract: Image Transformers have shown great success in image restoration tasks. Nevertheless, most transformer-based models are strictly bounded by exorbitant memory occupancy. Our goal is to reduce the memory consumption of Swin Transformer and at the same time speed up the model during training. Thus, we introduce AgileIR, group shifted attention mechanism along with window attention, which…

    Submitted 10 September, 2024; originally announced September 2024.

  16. arXiv:2409.06178  [pdf, other]

    cs.HC cs.CL

    SQLucid: Grounding Natural Language Database Queries with Interactive Explanations

    Authors: Yuan Tian, Jonathan K. Kummerfeld, Toby Jia-Jun Li, Tianyi Zhang

    Abstract: Though recent advances in machine learning have led to significant improvements in natural language interfaces for databases, the accuracy and reliability of these systems remain limited, especially in high-stakes domains. This paper introduces SQLucid, a novel user interface that bridges the gap between non-expert users and complex database querying processes. SQLucid addresses existing limitatio…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted to UIST'24

  17. arXiv:2409.06164  [pdf, other]

    cs.CL

    Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines

    Authors: Yining Chen, Jianqiang Li, Changwei Song, Qing Zhao, Yongsheng Tong, Guanghui Fu

    Abstract: Suicide is a pressing global issue, demanding urgent and effective preventive interventions. Among the various strategies in place, psychological support hotlines have proved to be a potent intervention method. Approximately two million people in China attempt suicide annually, with many individuals making multiple attempts. Prompt identification and intervention for high-risk individuals are crucial…

    Submitted 9 September, 2024; originally announced September 2024.

  18. arXiv:2409.06154  [pdf, other]

    cs.CV

    UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos

    Authors: Yin Chen, Jia Li, Yu Zhang, Zhenzhen Hu, Shiguang Shan, Meng Wang, Richang Hong

    Abstract: Dynamic facial expression recognition (DFER) is essential for understanding human emotions and behavior. However, conventional DFER methods, which primarily use dynamic facial data, often underutilize static expression images and their labels, limiting their performance and robustness. To overcome this, we introduce UniLearn, a novel unified learning paradigm that integrates static facial expressi…

    Submitted 9 September, 2024; originally announced September 2024.

  19. arXiv:2409.05885  [pdf, other]

    cs.LG cs.CE

    A Dual-Path neural network model to construct the flame nonlinear thermoacoustic response in the time domain

    Authors: Jiawei Wu, Teng Wang, Jiaqi Nan, Lijun Yang, Jingxuan Li

    Abstract: Traditional numerical simulation methods require substantial computational resources to accurately determine the complete nonlinear thermoacoustic response of flames to various perturbation frequencies and amplitudes. In this paper, we have developed deep learning algorithms that can construct a comprehensive flame nonlinear response from limited numerical simulation data. To achieve this, we prop…

    Submitted 26 August, 2024; originally announced September 2024.

    Comments: 23 pages, 14 figures, 1 supplementary material

  20. arXiv:2409.05872  [pdf, other]

    cs.IR cs.LG

    CSRec: Rethinking Sequential Recommendation from A Causal Perspective

    Authors: Xiaoyu Liu, Jiaxin Yuan, Yuhang Zhou, Jingling Li, Furong Huang, Wei Ai

    Abstract: The essence of sequential recommender systems (RecSys) lies in understanding how users make decisions. Most existing approaches frame the task as sequential prediction based on users' historical purchase records. While effective in capturing users' natural preferences, this formulation falls short in accurately modeling actual recommendation scenarios, particularly in accounting for how unsuccessf…

    Submitted 23 August, 2024; originally announced September 2024.

  21. arXiv:2409.05701  [pdf, other]

    cs.LG cs.AI

    pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning

    Authors: Jiahao Lai, Jiaqi Li, Jian Xu, Yanru Wu, Boshi Tang, Siqi Chen, Yongfeng Huang, Wenbo Ding, Yang Li

    Abstract: Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server. Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters, which are usually trained on heterogeneous data distributions, potentially overlooking the complex, high-dimensional nature…

    Submitted 9 September, 2024; originally announced September 2024.

  22. arXiv:2409.05679  [pdf]

    cs.CV

    AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations

    Authors: Jingtao Li, Qian Zhu, Xinyu Wang, Hengwei Zhao, Yanfei Zhong

    Abstract: Various Earth anomalies have destroyed the stable, balanced state, resulting in fatalities and serious destruction of property. With the advantages of large-scale and precise observation, high-resolution remote sensing images have been widely used for anomaly monitoring and localization. Powered by the deep representation, the existing methods have achieved remarkable advances, primarily in classi…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: remote sensing benchmark

  23. arXiv:2409.05642  [pdf, ps, other]

    cs.CV

    Prototype-Driven Multi-Feature Generation for Visible-Infrared Person Re-identification

    Authors: Jiarui Li, Zhen Qiu, Yilin Yang, Yuqi Li, Zeyu Dong, Chuanguang Yang

    Abstract: The primary challenges in visible-infrared person re-identification arise from the differences between visible (vis) and infrared (ir) images, including inter-modal and intra-modal variations. These challenges are further complicated by varying viewpoints and irregular movements. Existing methods often rely on horizontal partitioning to align part-level features, which can introduce inaccuracies a…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 7 pages

  24. arXiv:2409.05592  [pdf, other]

    cs.CL cs.AI

    ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

    Authors: Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Yulan He

    Abstract: Predicting unknown drug-drug interactions (DDIs) is crucial for improving medication safety. Previous efforts in DDI prediction have typically focused on binary classification or predicting DDI categories, with the absence of explanatory insights that could enhance trust in these predictions. In this work, we propose to generate natural language explanations for DDI predictions, enabling the model…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 17 pages, 4 figures

  25. arXiv:2409.05552  [pdf, other]

    cs.CV

    Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

    Authors: Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong

    Abstract: Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN). Despite remarkable recent progress in learning fine-grained and multifarious visual representations, the tendency to overfit to the training environments leads to unsatisfactory generalization performance. In this work, we present a versatile M…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2025

  26. arXiv:2409.05171  [pdf, other]

    cs.GR cs.CV cs.HC cs.LG

    Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

    Authors: Kexin Wang, Ivy He, Jinke Li, Ali Asadipour, Yitong Sun

    Abstract: Fungal simulation and control are considered crucial techniques in Bio-Art creation. However, coding algorithms for reliable fungal simulations have posed significant challenges for artists. This study equates fungal morphology simulation to a two-dimensional graphic time-series generation problem. We propose a zero-coding, neural network-driven cellular automaton. Fungal spread patterns are learn…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: SIGGRAPH Asia 2024 Art Paper

  27. arXiv:2409.05143  [pdf, other]

    cs.GR cs.HC

    PhysHand: A Hand Simulation Model with Physiological Geometry, Physical Deformation, and Accurate Contact Handling

    Authors: Mingyang Sun, Dongliang Kou, Ruisheng Yuan, Dingkang Yang, Peng Zhai, Xiao Zhao, Yang Jiang, Xiong Li, Jingchen Li, Lihua Zhang

    Abstract: In virtual Hand-Object Interaction (HOI) scenarios, the authenticity of the hand's deformation is important for an immersive experience, such as natural manipulation or tactile feedback. Unrealistic deformation arises from simplified hand geometry, neglect of the different physics attributes of the hand, and penetration due to imprecise contact handling. To address these problems, we propose PhysHand,…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: 11 pages

    ACM Class: I.3.2; I.3.4; I.3.5; I.3.6; I.3.8; I.6.1; I.6.3

  28. arXiv:2409.05006  [pdf, other]

    cs.RO

    HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions

    Authors: Jianping Li, Qiutong Leng, Jinxing Liu, Xinhang Xu, Tongxin Jin, Muqing Cao, Thien-Minh Nguyen, Shenghai Yuan, Kun Cao, Lihua Xie

    Abstract: Helmet-mounted wearable positioning systems are crucial for enhancing safety and facilitating coordination in industrial, construction, and emergency rescue environments. These systems, including LiDAR-Inertial Odometry (LIO) and Visual-Inertial Odometry (VIO), often face challenges in localization due to adverse environmental conditions such as dust, smoke, and limited visual features. To address…

    Submitted 8 September, 2024; originally announced September 2024.

  29. arXiv:2409.04456  [pdf, other]

    math.OC cs.AI cs.CV cs.LG

    Pattern based learning and optimisation through pricing for bin packing problem

    Authors: Huayan Zhang, Ruibin Bai, Tie-Yan Liu, Jiawei Li, Bingchen Lin, Jianfeng Ren

    Abstract: As a popular form of knowledge and experience, patterns and their identification have been critical tasks in most data mining applications. However, as far as we are aware, no study has systematically examined the dynamics of pattern values and their reuse under varying conditions. We argue that when problem conditions such as the distributions of random variables change, the patterns that perform…

    Submitted 27 August, 2024; originally announced September 2024.

  30. arXiv:2409.04183  [pdf, other]

    cs.CL cs.AI

    GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding

    Authors: Ziyin Zhang, Hang Yu, Shijie Li, Peng Di, Jianguo Li, Rui Wang

    Abstract: Programming languages possess rich semantic information such as data flow that is represented by graphs and not available from the surface form of source code. Recent code language models have scaled to billions of parameters, but model source code solely as text tokens while ignoring any other structural information. Conversely, models that do encode structural information of code make modificati…

    Submitted 6 September, 2024; originally announced September 2024.

  31. arXiv:2409.04016  [pdf, other]

    cs.SD eess.AS

    Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

    Authors: Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, Shujie Liu, Midia Yousefi, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yanqing Liu, Junkun Chen, Sheng Zhao, Jinyu Li, Zhizheng Wu, Michael Zeng

    Abstract: Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding of how the codec system affects the speech generation performance of the SLM. In this work, we examine codec tokens within the SLM framework for speech generation to provide insights for effective codec design. We retrain existing hig…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT-2024

  32. arXiv:2409.03684  [pdf, ps, other]

    quant-ph cs.DS cs.LG

    Predicting quantum channels over general product distributions

    Authors: Sitan Chen, Jaume de Dios Pont, Jun-Ting Hsieh, Hsin-Yuan Huang, Jane Lange, Jerry Li

    Abstract: We investigate the problem of predicting the output behavior of unknown quantum channels. Given query access to an $n$-qubit channel $E$ and an observable $O$, we aim to learn the mapping $\rho \mapsto \mathrm{Tr}(O E[\rho])$ to within a small error for most $\rho$ sampled from a distribution $D$. Previously, Huang, Chen, and Preskill proved a surprising result that even if…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 20 pages, comments welcome

  33. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  34. arXiv:2409.03508  [pdf, other]

    cs.AR

    Revealing Untapped DSP Optimization Potentials for FPGA-Based Systolic Matrix Engines

    Authors: Jindong Li, Tenglong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng

    Abstract: Systolic architectures are widely embraced by neural network accelerators for their superior performance in highly parallelized computation. The DSP48E2s serve as dedicated arithmetic blocks in Xilinx UltraScale series FPGAs and constitute a fundamental component in FPGA-based systolic matrix engines. Harnessing the full potential of DSP48E2s in architectural design can result in significant perfo…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by FPL2024

  35. arXiv:2409.03346  [pdf, other]

    cs.CL cs.AI

    Sketch: A Toolkit for Streamlining LLM Operations

    Authors: Xin Jiang, Xiang Li, Wenjia Ma, Xuezhi Fang, Yiqun Yao, Naitong Yu, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) represented by the GPT family have achieved remarkable success. The characteristics of LLMs lie in their ability to accommodate a wide range of tasks through a generative approach. However, the flexibility of their output format poses challenges in controlling and harnessing the model's outputs, thereby constraining the application of LLMs in various domains. In this work,…

    Submitted 5 September, 2024; originally announced September 2024.

  36. arXiv:2409.03160  [pdf, other]

    cs.RO eess.SY

    Autonomous Drifting Based on Maximal Safety Probability Learning

    Authors: Hikaru Hoshino, Jiaxing Li, Arnav Menon, John M. Dolan, Yorie Nakahira

    Abstract: This paper proposes a novel learning-based framework for autonomous driving based on the concept of maximal safety probability. Efficient learning requires rewards that are informative of desirable/undesirable states, but such rewards are challenging to design manually due to the difficulty of differentiating better states among many safe states. On the other hand, learning policies that maximize…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.16391

  37. arXiv:2409.02897  [pdf, other]

    cs.CL

    LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

    Authors: Jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li

    Abstract: Though current long-context large language models (LLMs) have demonstrated impressive capacities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, leading to concerns about their trustworthiness due to their potential hallucinations. In this work, we aim to enable long-context LLMs to generate responses with fine-graine…

    Submitted 10 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  38. AlignGroup: Learning and Aligning Group Consensus with Member Preferences for Group Recommendation

    Authors: Jinfeng Xu, Zheyu Chen, Jinze Li, Shuo Yang, Hewei Wang, Edith C. -H. Ngai

    Abstract: Group activities are important behaviors in human society; providing personalized recommendations for groups is referred to as the group recommendation task. Existing methods can usually be categorized into two strategies to infer group preferences: 1) determining group preferences by aggregating members' personalized preferences, and 2) inferring group consensus by capturing group members' cohere…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 10 pages, accepted by CIKM 2024

  39. arXiv:2409.02492  [pdf]

    cs.CV cs.LG eess.IV

    Reliable Deep Diffusion Tensor Estimation: Rethinking the Power of Data-Driven Optimization Routine

    Authors: Jialong Li, Zhicheng Zhang, Yunwei Chen, Qiqi Lu, Ye Wu, Xiaoming Liu, QianJin Feng, Yanqiu Feng, Xinyuan Zhang

    Abstract: Diffusion tensor imaging (DTI) holds significant importance in clinical diagnosis and neuroscience research. However, conventional model-based fitting methods often suffer from sensitivity to noise, leading to decreased accuracy in estimating DTI parameters. While traditional data-driven deep learning methods have shown potential in terms of accuracy and efficiency, their limited generalization to…

    Submitted 4 September, 2024; originally announced September 2024.

  40. arXiv:2409.02453  [pdf, other]

    eess.IV cs.CV cs.ET cs.MM

    FrameCorr: Adaptive, Autoencoder-based Neural Compression for Video Reconstruction in Resource and Timing Constrained Network Settings

    Authors: John Li, Shehab Sarar Ahmed, Deepak Nair

    Abstract: Despite the growing adoption of video processing via Internet of Things (IoT) devices due to their cost-effectiveness, transmitting captured data to nearby servers poses challenges due to varying timing constraints and scarcity of network bandwidth. Existing video compression methods face difficulties in recovering compressed data when incomplete data is provided. Here, we introduce FrameCorr, a d…

    Submitted 10 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  41. arXiv:2409.02415  [pdf, other]

    cs.CV

    Local map Construction Methods with SD map: A Novel Survey

    Authors: Jiaqi Li, Pingfan Jia, Jiaxing Chen, Jiaxi Liu, Lei He

    Abstract: In recent years, significant academic advancements have been made in the field of autonomous vehicles, with local maps emerging as a crucial component of autonomous driving technology. Local maps not only provide intricate details of road networks but also serve as fundamental inputs for critical tasks such as vehicle localization, navigation, and decision-making. Given the characteristics of SD m…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 14 pages, 11 figures

  42. arXiv:2409.01995  [pdf, other]

    eess.AS cs.AI cs.SD

    vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

    Authors: Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

    Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task. To amend the loss of speaker timbre in the content tokens, vec2wav 2.0 utilizes the WavLM features to provide strong timbre-dependent information. A novel adap…

    Submitted 11 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures. Submitted to ICASSP 2025. Demo page: https://cantabile-kwok.github.io/vec2wav2/

  43. arXiv:2409.01568  [pdf, other]

    cs.LG

    Quantifying Emergence in Neural Networks: Insights from Pruning and Training Dynamics

    Authors: Faisal AlShinaifi, Zeyad Almoaigel, Johnny Jingze Li, Abdulla Kuleib, Gabriel A. Silva

    Abstract: Emergence, where complex behaviors develop from the interactions of simpler components within a network, plays a crucial role in enhancing neural network capabilities. We introduce a quantitative framework to measure emergence during the training process and examine its impact on network performance, particularly in relation to pruning and training dynamics. Our hypothesis posits that the degree o…

    Submitted 2 September, 2024; originally announced September 2024.

  44. arXiv:2409.01560  [pdf, other]

    cs.CV cs.AI

    Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models

    Authors: Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen

    Abstract: Categorization, a core cognitive ability in humans that organizes objects based on common features, is essential to cognitive science as well as computer vision. To evaluate the categorization ability of visual AI models, various proxy tasks on recognition from datasets to open world scenarios have been proposed. Recent development of Large Multimodal Models (LMMs) has demonstrated impressive resu…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 39 pages, 28 figures, 4 tables. Accepted at The 35th British Machine Vision Conference (BMVC 2024). Project page at https://fubin29.github.io/Blocks-as-Probes/

  45. arXiv:2409.00876  [pdf, other]

    cs.DC cs.CE cs.DS

    Rapid GPU-Based Pangenome Graph Layout

    Authors: Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang

    Abstract: Computational Pangenomics is an emerging field that studies genetic variation using a graph structure encompassing multiple genomes. Visualizing pangenome graphs is vital for understanding genome diversity. Yet, handling large graphs can be challenging due to the high computational demands of the graph layout process. In this work, we conduct a thorough performance characterization of a state-of…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: SC 2024

  46. arXiv:2409.00839  [pdf, other]

    cs.CV cs.AI cs.IT

    Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving

    Authors: Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Li Wang, Yifan Tang, Jilong Guo, Jun Li

    Abstract: With the increasing complexity of the traffic environment, the significance of safety perception in intelligent driving is intensifying. Traditional methods in the field of intelligent driving perception rely on deep learning, which suffers from limited interpretability, often described as a "black box." This paper introduces a novel type of loss function, termed "Entropy Loss," along with an inno…

    Submitted 1 September, 2024; originally announced September 2024.

  47. arXiv:2409.00575  [pdf, other]

    cs.LG cs.IT

    Online Optimization for Learning to Communicate over Time-Correlated Channels

    Authors: Zheshun Wu, Junfan Li, Zenglin Xu, Sumei Sun, Jie Liu

    Abstract: Machine learning techniques have garnered great interest in designing communication systems owing to their capacity for tackling channel uncertainty. To provide theoretical guarantees for learning-based communication systems, some recent works analyze generalization bounds for devised methods based on the assumption of Independently and Identically Distributed (I.I.D.) channels, a condition rar…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 14 pages, 4 figures, submitted for possible journal publication

  48. arXiv:2409.00387  [pdf, other]

    eess.AS cs.SD

    Progressive Residual Extraction based Pre-training for Speech Representation Learning

    Authors: Tianrui Wang, Jin Li, Ziyang Ma, Rui Cao, Xie Chen, Longbiao Wang, Meng Ge, Xiaobao Wang, Yuguang Wang, Jianwu Dang, Nyima Tashi

    Abstract: Self-supervised learning (SSL) has garnered significant attention in speech processing, excelling in linguistic tasks such as speech recognition. However, jointly improving the performance of pre-trained models on various downstream tasks, each requiring different speech information, poses significant challenges. To this end, we propose a progressive residual extraction based self-supervised l…

    Submitted 31 August, 2024; originally announced September 2024.

  49. arXiv:2409.00330  [pdf, other]

    cs.CV

    GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting

    Authors: Jun Li, Jinying Wu, Qiming Li, Feifei Guo

    Abstract: With the continuous development of deep learning, the field of repetitive action counting is gradually gaining notice from many researchers. Extraction of pose keypoints using human pose estimation networks is proven to be an effective pose-level method. However, existing pose-level methods suffer from the shortcomings that the single coordinate is not stable enough to handle action distortions du…

    Submitted 30 August, 2024; originally announced September 2024.

  50. arXiv:2409.00295  [pdf, other]

    cs.CV cs.LG

    Box2Flow: Instance-based Action Flow Graphs from Videos

    Authors: Jiatong Li, Kalliopi Basioti, Vladimir Pavlovic

    Abstract: A large number of procedural videos on the web show how to complete various tasks. These tasks can often be accomplished in different ways and step orderings, with some steps able to be performed simultaneously, while others are constrained to be completed in a specific order. Flow graphs can be used to illustrate the step relationships of a task. Current task-based methods try to learn a single f…

    Submitted 30 August, 2024; originally announced September 2024.