Skip to main content

Showing 1–50 of 1,459 results for author: Yang, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13622  [pdf, other

    cs.LG cs.AI

    Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Authors: Ally Yalei Du, Lin F. Yang, Ruosong Wang

    Abstract: The recent work by Dong & Yang (2023) showed for misspecified sparse linear bandits, one can obtain an $O\left(ε\right)$-optimal policy using a polynomial number of samples when the sparsity is a constant, where $ε$ is the misspecification error. This result is in sharp contrast to misspecified linear bandits without sparsity, which require an exponential number of samples to get the same guarante… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 21 pages

  2. arXiv:2407.12622  [pdf, other

    cs.CV

    Rethinking the Architecture Design for Efficient Generic Event Boundary Detection

    Authors: Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang

    Abstract: Generic event boundary detection (GEBD), inspired by human visual cognitive behaviors of consistently segmenting videos into meaningful temporal chunks, finds utility in various applications such as video editing and. In this paper, we demonstrate that SOTA GEBD models often prioritize final performance over model complexity, resulting in low inference speed and hindering efficient deployment in r… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024

  3. arXiv:2407.12370  [pdf, other

    cs.LG

    Temporal receptive field in dynamic graph learning: A comprehensive analysis

    Authors: Yannis Karmim, Leshanshui Yang, Raphaël Fournier S'Niehotta, Clément Chatelain, Sébastien Adam, Nicolas Thome

    Abstract: Dynamic link prediction is a critical task in the analysis of evolving networks, with applications ranging from recommender systems to economic exchanges. However, the concept of the temporal receptive field, which refers to the temporal context that models use for making predictions, has been largely overlooked and insufficiently analyzed in existing research. In this study, we present a comprehe… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: MLG Workshop at ECML-PKDD, Sep 2024, Vilnius (Lituanie), France

  4. arXiv:2407.11998  [pdf, other

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production… ▽ More

    Submitted 13 June, 2024; originally announced July 2024.

  5. arXiv:2407.09792  [pdf, other

    cs.RO

    Language-Augmented Symbolic Planner for Open-World Task Planning

    Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

    Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS) 2024

  6. arXiv:2407.08931  [pdf, other

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  7. arXiv:2407.08348  [pdf, other

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model… ▽ More

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.08029  [pdf, other

    cs.LG cs.CL

    A Critical Review of Causal Reasoning Benchmarks for Large Language Models

    Authors: Linying Yang, Vik Shirvaikar, Oscar Clivio, Fabian Falck

    Abstract: Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, questioning whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thoro… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: AAAI 2024 Workshop on ''Are Large Language Models Simply Causal Parrots?''

  9. arXiv:2407.07311  [pdf

    cs.LG cs.AI cs.CV

    ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting

    Authors: Luoxiao Yang, Yun Wang, Xinqi Fan, Israel Cohen, Yue Zhao, Zijun Zhang

    Abstract: The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual information, prefer predicting future trends by observing vi… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  10. arXiv:2407.06168  [pdf, other

    cs.RO cs.CV

    TARGO: Benchmarking Target-driven Object Grasping under Occlusions

    Authors: Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers

    Abstract: Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contr… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 19 pages, 17 figures

  11. arXiv:2407.05858  [pdf, other

    cs.AI

    Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

    Authors: Daliang Xu, Hao Zhang, Liming Yang, Ruiqi Liu, Gang Huang, Mengwei Xu, Xuanzhe Liu

    Abstract: On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  12. arXiv:2407.05603  [pdf, other

    cs.CV cs.AI

    WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

    Authors: Pingyi Chen, Chenglu Zhu, Sunyi Zheng, Honglin Li, Lin Yang

    Abstract: Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs b… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  13. arXiv:2407.05580  [pdf, other

    cs.LG cs.AI

    $\mathrm{E^{2}CFD}$: Towards Effective and Efficient Cost Function Design for Safe Reinforcement Learning via Large Language Model

    Authors: Zepeng Wang, Chao Ma, Linjiang Zhou, Libing Wu, Lei Yang, Xiaochuan Shi, Guojun Peng

    Abstract: Different classes of safe reinforcement learning algorithms have shown satisfactory performance in various types of safety requirement scenarios. However, the existing methods mainly address one or several classes of specific safety requirement scenario problems and cannot be applied to arbitrary safety requirement scenarios. In addition, the optimization objectives of existing reinforcement learn… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  14. arXiv:2407.04274  [pdf, other

    cs.CV

    Fine-grained Dynamic Network for Generic Event Boundary Detection

    Authors: Ziwei Zheng, Lijun He, Le Yang, Fan Li

    Abstract: Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their di… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.03197  [pdf, other

    cs.CV

    DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

    Authors: Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li

    Abstract: Recent proposed neural network-based Temporal Action Detection (TAD) models are inherently limited to extracting the discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneo… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.02398  [pdf, other

    cs.CV

    Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

    Authors: Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

    Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/YangLing0818/consistency_flow_matching

  17. arXiv:2407.02031  [pdf, other

    cs.DC cs.AI cs.LG

    SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

    Authors: Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2407.01004  [pdf, other

    cs.LG stat.ME

    CURLS: Causal Rule Learning for Subgroups with Significant Treatment Effect

    Authors: Jiehui Zhou, Linxiao Yang, Xingyu Liu, Xinyue Gu, Liang Sun, Wei Chen

    Abstract: In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strateg… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  19. arXiv:2407.00579  [pdf, ps, other

    cs.IT eess.SP

    Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

    Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

    Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim o… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  20. arXiv:2407.00203  [pdf, other

    cs.CV

    PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

    Authors: Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

    Abstract: Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology imag… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 13 pages, 3 figures

  21. arXiv:2406.19905  [pdf, other

    cs.CV

    Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

    Authors: Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

    Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they e… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  22. arXiv:2406.19578  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    PathAlign: A vision-language model for whole slide images in histopathology

    Authors: Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, Ellery Wulczyn

    Abstract: Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggrega… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 9 main pages and 19 pages of supplemental material; 3 main tables, 3 main figures and 11 supplemental tables, 7 supplemental figures

  23. arXiv:2406.18691  [pdf, other

    cs.CV

    Geometric Features Enhanced Human-Object Interaction Detection

    Authors: Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang, Hubert P. H. Shum

    Abstract: Cameras are essential vision instruments to capture images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection due to their advanced network architectures and thus promising results. Howe… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TIM

  24. arXiv:2406.18529  [pdf, ps, other

    cs.LG

    Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs

    Authors: Tian Tian, Lin F. Yang, Csaba Szepesvári

    Abstract: The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  25. arXiv:2406.18518  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  26. arXiv:2406.18181  [pdf, ps, other

    cs.SE

    An Empirical Study of Unit Test Generation with Large Language Models

    Authors: Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

    Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  27. arXiv:2406.18072  [pdf, ps, other

    stat.ML cs.LG

    Learning for Bandits under Action Erasures

    Authors: Osama Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

    Abstract: We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether th… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  28. arXiv:2406.17326  [pdf, other

    cs.AI

    The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game

    Authors: Lanyu Yang, Dongchun Jiang, Fuqiang Guo, Mingjian Fu

    Abstract: Cooperative behavior is prevalent in both human society and nature. Understanding the emergence and maintenance of cooperation among self-interested individuals remains a significant challenge in evolutionary biology and social sciences. Reinforcement learning (RL) provides a suitable framework for studying evolutionary game theory as it can adapt to environmental changes and maximize expected ben… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  29. arXiv:2406.17172  [pdf, other

    cs.CR cs.DC cs.LG

    Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework

    Authors: Shiva Raj Pokhrel, Luxing Yang, Sutharshan Rajasegarar, Gang Li

    Abstract: This paper introduces a robust zero-trust architecture (ZTA) tailored for the decentralized system that empowers efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learn… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Journal ref: ACM SIGCOMM 2024 Sydney

  30. arXiv:2406.16441  [pdf, other

    cs.CL

    UniCoder: Scaling Code Large Language Model via Universal Code

    Authors: Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

    Abstract: Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural lan… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Main)

  31. arXiv:2406.15885  [pdf, other

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  32. arXiv:2406.15303  [pdf, other

    cs.CV

    ADR: Attention Diversification Regularization for Mitigating Overfitting in Multiple Instance Learning based Whole Slide Image Classification

    Authors: Yunlong Zhang, Zhongyi Shui, Yunxuan Sun, Honglin Li, Jingxiong Li, Chenglu Zhu, Sunyi Zheng, Lin Yang

    Abstract: Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications. This paper reveals the correlation between MIL's performance and the entropy of attention values. Based on this observation, we propose Attention Diversity Regularization (ADR), a simple but effective technique aimed at p… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.14976  [pdf, other

    eess.IV cs.CV

    CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  34. arXiv:2406.14964  [pdf, other

    cs.CV

    VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

    Authors: Zixuan Chen, Ruijie Su, Jiahao Zhu, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bi… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  35. arXiv:2406.14825  [pdf, other

    cs.CL

    TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems

    Authors: Jing Yang, Yu Zhao, Linyao Yang, Xiao Wang, Long Chen, Fei-Yue Wang

    Abstract: Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks, so it holds promise in helping understand task requests initiated by requesters in crowdsourcing systems. However, existing methods still struggle with limited and unevenly distributed annotated data. Therefore, inspired by the abundant global knowledge stored withi… ▽ More

    Submitted 9 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  36. arXiv:2406.14136  [pdf, other

    cs.RO

    One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging

    Authors: Linhan Yang, Lei Yang, Haoran Sun, Zeqing Zhang, Haibin He, Fang Wan, Chaoyang Song, Jia Pan

    Abstract: Fabric manipulation dynamically is commonly seen in manufacturing and domestic settings. While dynamically manipulating a fabric piece to reach a target state is highly efficient, this task presents considerable challenges due to the varying properties of different fabrics, complex dynamics when interacting with environments, and meeting required goal conditions. To address these challenges, we pr… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  37. arXiv:2406.14043  [pdf, other

    cs.IR cs.CL

    Taxonomy-Guided Zero-Shot Recommendations with LLMs

    Authors: Yueqing Liang, Liangwei Yang, Chen Wang, Xiongxiao Xu, Philip S. Yu, Kai Shu

    Abstract: With the emergence of large language models (LLMs) and their ability to perform a variety of tasks, their application in recommender systems (RecSys) has shown promise. However, we are facing significant challenges when deploying LLMs into RecSys, such as limited prompt length, unstructured item information, and un-constrained generation of recommendations, leading to sub-optimal performance. To a… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  38. arXiv:2406.12386  [pdf, other

    cs.CL

    IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

    Authors: Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

    Abstract: The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four m… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  39. arXiv:2406.12193  [pdf, other

    cs.LG

    Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

    Authors: Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

    Abstract: Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers withi… ▽ More

    Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.11234  [pdf, other

    cs.CL cs.AI

    MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction

    Authors: Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) aims to co-extract the sentiment triplets in a given corpus. Existing approaches within the pretraining-finetuning paradigm tend to either meticulously craft complex tagging schemes and classification heads, or incorporate external semantic augmentation to enhance performance. In this study, we, for the first time, re-evaluate the redundancy in tagging sc… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.07342

  41. arXiv:2406.10907  [pdf, other

    cs.CV

    SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection

    Authors: Lin Liu, Ziying Song, Qiming Xia, Feiyang Jia, Caiyan Jia, Lei Yang, Hongyu Pan

    Abstract: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency advantages. Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, resulting in insufficie… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.02702

  42. arXiv:2406.10722  [pdf, other

    cs.CV cs.AI cs.LG

    GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR

    Authors: Bharat Singh, Viveka Kulharia, Luyu Yang, Avinash Ravichandran, Ambrish Tyagi, Ashish Shrivastava

    Abstract: Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail. We propose a novel approach, GenMM, for jointly editing RGB videos and LiDAR scans by inserting temporally and geometrically consistent 3D objects. Our method uses a reference image and 3D bounding boxes to seamlessly insert and blend new objects into target video… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  43. arXiv:2406.10290  [pdf, other

    cs.CL cs.AI cs.LG

    MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

    Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understand… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  44. arXiv:2406.10163  [pdf, other

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  45. arXiv:2406.09414  [pdf, other

    cs.CV

    Depth Anything V2

    Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

    Abstract: This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of ou… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://depth-anything-v2.github.io

  46. arXiv:2406.09229  [pdf, other

    cs.CV

    MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

    Authors: Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu

    Abstract: Post-training quantization (PTQ) efficiently compresses vision models, but unfortunately, it accompanies a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruct… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 IEEE International Conference on Image Processing

  47. arXiv:2406.08029  [pdf, ps, other

    cs.CY

    Metaverse Identity: Core Principles and Critical Challenges

    Authors: Liang Yang, Yan Xu, Pan Hui

    Abstract: This paper explores the core principles that should guide the construction and governance of identity in the metaverse and identifies the critical challenges that need to be addressed. Drawing on multidisciplinary theories and perspectives, we define metaverse identity and propose two core principles for understanding its intrinsic characteristics and impacts: Equivalence and Alignment, and Fusion… ▽ More

    Submitted 13 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  48. arXiv:2406.07537  [pdf, other

    cs.CV

    Autoregressive Pretraining with Mamba in Vision

    Authors: Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    Abstract: The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structur… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  49. arXiv:2406.07436  [pdf, other

    cs.PL

    McEval: Massively Multilingual Code Evaluation

    Authors: Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

    Abstract: Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks primarily focus on Python and are still restricted to a limited nu… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 22 pages

  50. arXiv:2406.06839  [pdf, other

    cs.CL

    EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction

    Authors: Li Yang, Qifan Wang, Jianfeng Chi, Jiahao Liu, Jingang Wang, Fuli Feng, Zenglin Xu, Yi Fang, Lifu Huang, Dongfang Liu

    Abstract: Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, ne… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.