
Showing 1–50 of 8,215 results for author: Zhang, Y

Searching in archive cs.
  1. arXiv:2407.13561  [pdf, other]

    cs.CL

    Research on Tibetan Tourism Viewpoints information generation system based on LLM

    Authors: Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

    Abstract: Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ICWOC 2024

  2. arXiv:2407.13417  [pdf, other]

    cs.CV

    GDDS: A Single Domain Generalized Defect Detection Frame of Open World Scenario using Gather and Distribute Domain-shift Suppression Network

    Authors: Haiyong Chen, Yaxiu Zhang, Yan Zhang, Xin Zhang, Xingwei Yan

    Abstract: Efficient and intelligent surface defect detection of photovoltaic modules is crucial for improving the quality of photovoltaic modules and ensuring the reliable operation of large-scale infrastructure. However, the scenario characteristics of data distribution deviation make the construction of defect detection models for open world scenarios such as photovoltaic manufacturing and power plant ins…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 images

    ACM Class: I.4.9; I.5.1

  3. arXiv:2407.13349  [pdf, other]

    cs.IR

    DCNv3: Towards Next Generation Deep Cross Network for CTR Prediction

    Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Hanwei Li, Lei Sang

    Abstract: Deep & Cross Network and its derivative models have become an important paradigm in click-through rate (CTR) prediction due to their effective balance between computational cost and performance. However, these models face four major limitations: (1) while most models claim to capture high-order feature interactions, they often do so implicitly and non-interpretably through deep neural networks (DN…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13163  [pdf, other]

    cs.IR cs.AI

    ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems

    Authors: Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang

    Abstract: Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made p…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: CIKM 2024

  6. arXiv:2407.13126  [pdf, other]

    cs.DC

    Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

    Authors: Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

    Abstract: Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated model (if available) to serve overtime arriving inference requests. It is generally beneficial to co-locate the retraining and inference together to enabl…

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.13096  [pdf, other]

    cs.PF

    DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

    Authors: Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang

    Abstract: Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing solutions using DVFS are hindered by inefficiency…

    Submitted 17 July, 2024; originally announced July 2024.

  8. arXiv:2407.12899  [pdf, other]

    cs.CV cs.AI cs.MM

    DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Authors: Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Abstract: Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework by leveragin…

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.12887  [pdf, other]

    cs.RO

    Self-Adaptive Robust Motion Planning for High DoF Robot Manipulator using Deep MPC

    Authors: Ye Zhang, Kangtong Mo, Fangzhou Shen, Xuanzhen Xu, Xingyu Zhang, Jiayue Yu, Chang Yu

    Abstract: In contemporary control theory, self-adaptive methodologies are highly esteemed for their inherent flexibility and robustness in managing modeling uncertainties. Particularly, robust adaptive control stands out owing to its potent capability of leveraging robust optimization algorithms to approximate cost functions and relax the stringent constraints often associated with conventional self-adaptiv…

    Submitted 17 July, 2024; originally announced July 2024.

  10. arXiv:2407.12823  [pdf, other]

    cs.CL cs.AI

    WTU-EVAL: A Whether-or-Not Tool Usage Evaluation Benchmark for Large Language Models

    Authors: Kangyun Ning, Yisong Su, Xueqiang Lv, Yuanzhe Zhang, Jian Liu, Kang Liu, Jinan Xu

    Abstract: Although Large Language Models (LLMs) excel in NLP tasks, they still need external tools to extend their ability. Current research on tool learning with LLMs often assumes mandatory tool use, which does not always align with real-world situations, where the necessity for tools is uncertain, and incorrect or unnecessary use of tools can damage the general abilities of LLMs. Therefore, we propose to…

    Submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.12821  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFlow: Automated Workflow Generation for Large Language Model Agents

    Authors: Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages the ability of LLM as well as external tools for complex-task solving. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usuall…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Open source code available at https://github.com/agiresearch/AutoFlow

  12. arXiv:2407.12772  [pdf, other]

    cs.CL cs.CV

    LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

    Authors: Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

    Abstract: The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 mod…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code and leaderboard are available at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench

  13. arXiv:2407.12764  [pdf, other]

    cs.LG

    Jigsaw Game: Federated Clustering

    Authors: Jinxuan Xu, Hong-You Chen, Wei-Lun Chao, Yuqian Zhang

    Abstract: Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenge…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to TMLR

  14. arXiv:2407.12725  [pdf, other]

    cs.CL

    Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?

    Authors: Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin

    Abstract: Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensiv…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 13 pages, 2 figures

  15. Multiple Access Integrated Adaptive Finite Blocklength for Ultra-Low Delay in 6G Wireless Networks

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: Facing the dramatic increase of real-time applications and time-sensitive services, large-scale ultra-low delay requirements are put forward for the sixth generation (6G) wireless networks. To support massive ultra-reliable and low-latency communications (mURLLC), in this paper we propose an adaptive finite blocklength framework to reduce the over-the-air delay for short packet transmissions with…

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: IEEE Transactions on Wireless Communications (Volume: 23, Issue: 3, March 2024)

  16. arXiv:2407.12701  [pdf, other]

    cs.CR

    Efficient and Flexible Different-Radix Montgomery Modular Multiplication for Hardware Implementation

    Authors: Yuxuan Zhang, Hua Guo, Chen Chen, Yewei Guan, Xiyong Zhang, Zhenyu Guan

    Abstract: Montgomery modular multiplication is widely-used in public key cryptosystems (PKC) and affects the efficiency of upper systems directly. However, modulus is getting larger due to the increasing demand of security, which results in a heavy computing cost. High-performance implementation of Montgomery modular multiplication is urgently required to ensure the highly-efficient operations in PKC. Howev…

    Submitted 17 July, 2024; originally announced July 2024.

  17. arXiv:2407.12684  [pdf, other]

    cs.CV

    4Dynamic: Text-to-4D Generation with Hybrid Priors

    Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

    Abstract: Due to the fascinating generative performance of text-to-image diffusion models, growing text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges…

    Submitted 17 July, 2024; originally announced July 2024.

  18. Adaptive Finite Blocklength for Low Access Delay in 6G Wireless Networks

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: As the number of real-time applications with ultra-low delay requirements quickly grows, massive ultra-reliable and low-latency communication (mURLLC) has been proposed to provide a wide range of delay-sensitive services for the sixth generation (6G) wireless networks. However, it is difficult to meet the stringent delay demand of massive connectivity with existing grant-based (GB) random access a…

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference

  19. arXiv:2407.12581  [pdf, other]

    cs.CR cs.AI cs.CV cs.CY

    Towards Understanding Unsafe Video Generation

    Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang

    Abstract: Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 18 pages

  20. arXiv:2407.12307  [pdf, other]

    cs.CV

    Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance

    Authors: Yufei Zhang, Jeffrey O. Kephart, Qiang Ji

    Abstract: Fully-supervised monocular 3D hand reconstruction is often difficult because capturing the requisite 3D data entails deploying specialized equipment in a controlled environment. We introduce a weakly-supervised method that avoids such requirements by leveraging fundamental principles well-established in the understanding of the human hand's unique structure and functionality. Specifically, we syst…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  21. Performance Analysis and Blocklength Minimization of Uplink RSMA for Short Packet Transmissions in URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Rate splitting multiple access (RSMA) is one of the most promising techniques for ultra-reliable and low-latency communications (URLLC) with stringent requirements on delay and reliability of multiple access. To fully explore the delay performance enhancement brought by uplink RSMA to URLLC, in this paper, we evaluate the performance of two-user uplink RSMA and propose the corresponding blocklengt…

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2023 - 2023 IEEE Global Communications Conference

  22. Dumb RIS-Assisted Random Beamforming for Energy Efficiency Enhancement of Wireless Communications

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: Energy efficiency (EE) is one of the most important metrics for the beyond fifth generation (B5G) and the future sixth generation (6G) wireless networks. Reconfigurable intelligent surface (RIS) has been widely focused on EE enhancement for wireless networks because it is power-saving, programmable, and easy to be deployed. However, RIS is generally passive and thus difficult to obtain correspondi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

    Journal ref: ICC 2022 - IEEE International Conference on Communications

  23. arXiv:2407.12237  [pdf, other]

    cs.IT

    Delay Tradeoff and Adaptive Finite Blocklength Framework for URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: With various time-sensitive tasks to be served, ultra-reliable and low-latency communications (URLLC) has become one of the most important scenarios for the fifth generation (5G) wireless communications. The end-to-end delay from the sub-millisecond-level to the second-level is first put forward for a wide range of delay-sensitive tasks in the future sixth generation (6G) communication networks, w…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  24. arXiv:2407.11946  [pdf, other]

    cs.CV

    Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

    Authors: Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan

    Abstract: Transformers have achieved the state-of-the-art performance on solving the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack an insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstr…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  25. arXiv:2407.11906  [pdf, other]

    cs.CV cs.RO

    SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

    Authors: Hao Ding, Tuxun Lu, Yuqian Zhang, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Mathias Unberath

    Abstract: Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, significantly impairing the model's pe…

    Submitted 16 July, 2024; originally announced July 2024.

  26. Trajectory and Power Optimization for Multi-UAV Enabled Emergency Wireless Communications Networks

    Authors: Yixin Zhang, Wenchi Cheng

    Abstract: Recently, unmanned aerial vehicle (UAV) has attracted much attention due to its flexible deployment and controllable mobility. As the general communication network cannot meet the emergency requirements, in this paper we study the multi-UAV enabled wireless emergency communication system. Our goal is to maximize the capacity with jointly optimizing trajectory and allocating power. To tackle this n…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures

    Journal ref: 2019 IEEE International Conference on Communications Workshops (ICC Workshops)

  27. arXiv:2407.11781  [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in…

    Submitted 16 July, 2024; originally announced July 2024.

  28. arXiv:2407.11741  [pdf, other]

    cs.RO

    Puppeteer Your Robot: Augmented Reality Leader-Follower Teleoperation

    Authors: Jonne van Haastregt, Michael C. Welle, Yuchong Zhang, Danica Kragic

    Abstract: High-quality demonstrations are necessary when learning complex and challenging manipulation tasks. In this work, we introduce an approach to puppeteer a robot by controlling a virtual robot in an augmented reality setting. Our system allows for retaining the advantages of being intuitive from a physical leader-follower side while avoiding the unnecessary use of expensive physical setup. In additi…

    Submitted 16 July, 2024; originally announced July 2024.

  29. arXiv:2407.11644  [pdf, other]

    cs.CV cs.RO

    Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

    Authors: Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang

    Abstract: When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by the traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into…

    Submitted 16 July, 2024; originally announced July 2024.

  30. arXiv:2407.11554  [pdf, ps, other]

    cs.IT math.CO

    Optimal Constant-Weight and Mixed-Weight Conflict-Avoiding Codes

    Authors: Yuan-Hsun Lo, Tsai-Lien Wong, Kangkang Xu, Yijin Zhang

    Abstract: A conflict-avoiding code (CAC) is a deterministic transmission scheme for asynchronous multiple access without feedback. When the number of simultaneously active users is less than or equal to $w$, a CAC of length $L$ with weight $w$ can provide a hard guarantee that each active user has at least one successful transmission within every consecutive $L$ slots. In this paper, we generalize some prev…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 32 pages

    MSC Class: 94B25
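    The CAC guarantee stated in the abstract above can be checked mechanically. The following is a minimal Python sketch, not the construction from this paper: it assumes the common formulation of a CAC as a set of $w$-element subsets of $\mathbb{Z}_L$ whose cyclic difference sets are pairwise disjoint, and it assumes slot-aligned users that each repeat their codeword every $L$ slots starting from an arbitrary offset. The function names and the length-13, weight-3 example code are hypothetical choices for illustration.

```python
from itertools import product


def difference_set(codeword, L):
    """Cyclic difference set {(a - b) mod L : a, b in codeword, a != b}."""
    return {(a - b) % L for a in codeword for b in codeword if a != b}


def is_cac(codewords, L):
    """True if the difference sets of all distinct codewords are pairwise disjoint."""
    deltas = [difference_set(c, L) for c in codewords]
    return all(
        not (deltas[i] & deltas[j])
        for i in range(len(deltas))
        for j in range(i + 1, len(deltas))
    )


def every_user_succeeds(codewords, L, offsets):
    """Simulate one period of L slots in which user k transmits in slots
    (c + offsets[k]) mod L for each c in its codeword (assumed model).
    Return True if every user has at least one slot where it transmits alone."""
    busy = [{(c + s) % L for c in cw} for cw, s in zip(codewords, offsets)]
    for k, mine in enumerate(busy):
        others = set()
        for j, theirs in enumerate(busy):
            if j != k:
                others |= theirs
        if not (mine - others):
            return False
    return True


# Hypothetical example: three weight-3 codewords of length 13.
L = 13
codes = [{0, 1, 2}, {0, 3, 6}, {0, 4, 8}]

print(is_cac(codes, L))  # True
# Fix user 0 at offset 0 (cyclic symmetry) and try every offset of the other two.
print(all(every_user_succeeds(codes, L, (0, s1, s2))
          for s1, s2 in product(range(L), repeat=2)))  # True
```

    The check works because two codewords with disjoint difference sets can coincide in at most one slot under any relative shift, so the other $w-1$ active users can block at most $w-1$ of a given user's $w$ slots, which leaves at least one collision-free transmission per period.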

  31. arXiv:2407.11454  [pdf, other]

    quant-ph cs.CR cs.DC

    Cloud-based Semi-Quantum Money

    Authors: Yichi Zhang, Siyuan Jin, Yuhan Huang, Bei Zeng, Qiming Shao

    Abstract: In the 1970s, Wiesner introduced the concept of quantum money, where quantum states generated according to specific rules function as currency. These states circulate among users with quantum resources through quantum channels or face-to-face interactions. Quantum mechanics grants quantum money physical-level unforgeability but also makes minting, storing, and circulating it significantly challeng…

    Submitted 16 July, 2024; originally announced July 2024.

  32. arXiv:2407.11431  [pdf]

    cs.CV

    MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction

    Authors: Chang Li, Jiao Guo, Yufei Zhao, Yongjun Zhang

    Abstract: This paper is the first to propose an end-to-end framework of mutually reinforcing images to 3D surface recurrent neural network-like for model-adaptation indoor 3D reconstruction, where multi-view dense matching and point cloud surface optimization are mutually reinforced by a RNN-like structure rather than being treated as a separate issue. The characteristics are as follows: In the multi-view dens…

    Submitted 16 July, 2024; originally announced July 2024.

  33. arXiv:2407.11401  [pdf, other]

    cs.CV cs.IR

    EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis

    Authors: Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang

    Abstract: Determining the necessity of resecting malignant polyps during colonoscopy screen is crucial for patient outcomes, yet challenging due to the time-consuming and costly nature of histopathology examination. While deep learning-based classification models have shown promise in achieving optical biopsy with endoscopic images, they often suffer from a lack of explainability. To overcome this limitatio…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  34. arXiv:2407.11282  [pdf, other]

    cs.CL

    Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

    Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

    Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  35. arXiv:2407.11107  [pdf, other]

    cs.RO cs.LG

    Latent Linear Quadratic Regulator for Robotic Control Tasks

    Authors: Yuan Zhang, Shaohui Yang, Toshiyuki Ohtsuka, Colin Jones, Joschka Boedecker

    Abstract: Model predictive control (MPC) has played a more crucial role in various robotic control tasks, but its high computational requirements are concerning, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space, on which the dynamical model is linear and the cos…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at RSS 2024 workshop on Koopman Operators in Robotics

  36. arXiv:2407.11044  [pdf, other]

    cs.LG cs.AI

    Generalizing soft actor-critic algorithms to discrete action spaces

    Authors: Le Zhang, Yong Gu, Xin Zhao, Yanshuo Zhang, Shu Zhao, Yifei Jin, Xinxin Wu

    Abstract: ATARI is a suite of video games used by reinforcement learning (RL) researchers to test the effectiveness of the learning algorithm. Receiving only the raw pixels and the game score, the agent learns to develop sophisticated strategies, even to the comparable level of a professional human games tester. Ideally, we also want an agent requiring very few interactions with the environment. Previous co…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2024. GitHub Repo https://github.com/lezhang-thu/bigger-better-faster-SAC

  37. arXiv:2407.10980  [pdf, ps, other]

    cs.NI

    Learning-based Big Data Sharing Incentive in Mobile AIGC Networks

    Authors: Jinbo Wen, Yang Zhang, Yulin Chen, Weifeng Zhong, Xumin Huang, Lei Liu, Dusit Niyato

    Abstract: Rapid advancements in wireless communication have led to a dramatic upsurge in data volumes within mobile edge networks. These substantial data volumes offer opportunities for training Artificial Intelligence-Generated Content (AIGC) models to possess strong prediction and decision-making capabilities. AIGC represents an innovative approach that utilizes sophisticated generative AI algorithms to a…

    Submitted 10 June, 2024; originally announced July 2024.

  38. arXiv:2407.10979  [pdf, ps, other]

    cs.NI

    Diffusion Model-based Incentive Mechanism with Prospect Theory for Edge AIGC Services in 6G IoT

    Authors: Jinbo Wen, Jiangtian Nie, Yue Zhong, Changyan Yi, Xiaohuan Li, Jiangming Jin, Yang Zhang, Dusit Niyato

    Abstract: The fusion of Internet of Things (IoT) with Sixth-Generation (6G) technology has significant potential to revolutionize the IoT landscape. Utilizing the ultra-reliable and low-latency communication capabilities of 6G, 6G-IoT networks can transmit high-quality and diverse data to enhance edge learning. Artificial Intelligence-Generated Content (AIGC) harnesses advanced AI algorithms to automaticall…

    Submitted 10 June, 2024; originally announced July 2024.

  39. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  40. arXiv:2407.10648  [pdf, other]

    cs.RO

    Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

    Authors: Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, Weiyao Lin

    Abstract: Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulatio…

    Submitted 15 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  41. arXiv:2407.10636  [pdf, other]

    cs.CV

    Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

    Authors: Lin Zhu, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang

    Abstract: Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene, resulting in over-smoothing and blurry artifacts. Addressing…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  42. arXiv:2407.10402  [pdf, ps, other]

    cs.SE

    A Framework for QoS of Integration Testing in Satellite Edge Clouds

    Authors: Guogen Zeng, Juan Luo, Yufeng Zhang, Ying Qiao, Shuyang Teng

    Abstract: The diversification of satellite communication services imposes varied requirements on network service quality, making quality of service (QoS) testing for microservices running on satellites more complex. Existing testing tools have limitations, potentially offering only single-functionality testing, thus failing to meet the requirements of QoS testing for edge cloud services in mobile satellite…

    Submitted 16 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  43. arXiv:2407.10366  [pdf, other]

    cs.CV cs.AI cs.LG

    Accessing Vision Foundation Models at ImageNet-level Costs

    Authors: Yitian Zhang, Xu Ma, Yue Bai, Huan Wang, Yun Fu

    Abstract: Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill…

    Submitted 14 July, 2024; originally announced July 2024.

  44. arXiv:2407.10359  [pdf, other]

    cs.NE cs.AI

    Evolved Developmental Artificial Neural Networks for Multitasking with Advanced Activity Dependence

    Authors: Yintong Zhang, Jason A. Yoder

    Abstract: Recently, Cartesian Genetic Programming has been used to evolve developmental programs to guide the formation of artificial neural networks (ANNs). This approach has demonstrated success in enabling ANNs to perform multiple tasks while avoiding catastrophic forgetting. One unique aspect of this approach is the use of separate developmental programs evolved to regulate the development of separate s…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures

    ACM Class: I.2.6; I.2.11

  45. arXiv:2407.10328  [pdf, other]

    cs.SD cs.AI eess.AS

    The Interpretation Gap in Text-to-Music Generation Models

    Authors: Yongyi Zang, Yixiao Zhang

    Abstract: Large-scale text-to-music generation models have significantly enhanced music creation capabilities, offering unprecedented creative freedom. However, their ability to collaborate effectively with human musicians remains limited. In this paper, we propose a framework to describe the musical interaction process, which includes expression, interpretation, and execution of controls. Following this fr…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Under review

  46. arXiv:2407.10285  [pdf, other]

    cs.CV

    Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

    Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

    Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://yangqy1110.github.io/NC-SDEdit/, Code Repo: https://github.com/yangqy1110/NC-SDEdit/

    ACM Class: I.2; I.4.3

  47. arXiv:2407.10233  [pdf, other]

    cs.CV cs.AI

    Visual Prompt Selection for In-Context Learning Segmentation

    Authors: Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

    Abstract: As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual promp…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  48. arXiv:2407.10135  [pdf, other]

    cs.CV

    FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection

    Authors: Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu, Zhenghui Hu, Baohui Wang, Yunhong Wang

    Abstract: Although multi-view 3D object detection based on the Bird's-Eye-View (BEV) paradigm has garnered widespread attention as an economical and deployment-friendly perception solution for autonomous driving, there is still a performance gap compared to LiDAR-based methods. In recent years, several cross-modal distillation methods have been proposed to transfer beneficial information from teacher models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  49. arXiv:2407.10125  [pdf, other]

    cs.CV

    When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

    Authors: Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

    Abstract: Recent years have witnessed increasing research attention towards pedestrian detection by taking advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike p…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  50. arXiv:2407.10084  [pdf, other]

    cs.CV

    Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

    Authors: Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang

    Abstract: Unsupervised 3D instance segmentation aims to segment objects from a 3D point cloud without any annotations. Existing methods face the challenge of either too loose or too tight clustering, leading to under-segmentation or over-segmentation. To address this issue, we propose Part2Object, hierarchical clustering with object guidance. Part2Object employs multi-layer clustering from points to object…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024