Skip to main content

Showing 1–50 of 1,601 results for author: Wang, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13126  [pdf, other

    cs.DC

    Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

    Authors: Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

    Abstract: Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated model (if available) to serve overtime arriving inference requests. It is generally beneficial to co-locate the retraining and inference together to enabl… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.13038  [pdf, other

    cs.CV cs.LG

    Universal Facial Encoding of Codec Avatars from VR Headsets

    Authors: Shaojie Bai, Te-Li Wang, Chenghui Li, Akshay Venkatesh, Tomas Simon, Chen Cao, Gabriel Schwartz, Ryan Wrench, Jason Saragih, Yaser Sheikh, Shih-En Wei

    Abstract: Faithful real-time facial animation is essential for avatar-mediated telepresence in Virtual Reality (VR). To emulate authentic communication, avatar animation needs to be efficient and accurate: able to capture both extreme and subtle expressions within a few milliseconds to sustain the rhythm of natural conversations. The oblique and incomplete views of the face, variability in the donning of he… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 (ACM Transactions on Graphics (TOG))

    Journal ref: ACM Trans. Graph. 43, 4, Article 93 (July 2024), 22 pages.

  3. arXiv:2407.12880  [pdf, other

    cs.LG cs.AI cs.CL

    Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

    Authors: Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard

    Abstract: The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large nu… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.12794  [pdf, other

    cs.DB cs.LG

    Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond

    Authors: George-Octavian Bărbulescu, Taiyi Wang, Zak Singh, Eiko Yoneki

    Abstract: Query rewrite systems perform graph substitutions using rewrite rules to generate optimal SQL query plans. Rewriting logical and physical relational query plans is proven to be an NP-hard sequential decision-making problem with a search space exponential in the number of rewrite rules. In this paper, we address the query rewrite problem by interleaving Equality Saturation and Graph Reinforcement L… ▽ More

    Submitted 19 June, 2024; originally announced July 2024.

  5. arXiv:2407.12581  [pdf, other

    cs.CR cs.AI cs.CV cs.CY

    Towards Understanding Unsafe Video Generation

    Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang

    Abstract: Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 18 pages

  6. arXiv:2407.12319  [pdf, other

    cs.CV

    Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model

    Authors: Tao Wang, Wei Wen, Jingzhi Zhai, Kang Xu, Haoming Luo

    Abstract: Point cloud segmentation is crucial for robotic visual perception and environmental understanding, enabling applications such as robotic navigation and 3D reconstruction. However, handling the sparse and unordered nature of point cloud data presents challenges for efficient and accurate segmentation. Inspired by the Mamba model's success in natural language processing, we propose the Serialized Po… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  8. arXiv:2407.11422  [pdf, other

    cs.CV

    Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

    Authors: Jinrui Zhang, Teng Wang, Haigang Zhang, Ping Lu, Feng Zheng

    Abstract: Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks. However, they remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions. While various mitigation strategies have been proposed, they often neglect a key contributor to hallucinations: lack of fine-grained reasoning supervision during training.… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: To appear at ECCV2024

  9. arXiv:2407.11098  [pdf, other

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce $\textbf{Fusion-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion ($\texttt{ICF}$). Our approach offers several key contributions: Firstly, we propose the… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  10. arXiv:2407.10943  [pdf, other

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  11. arXiv:2407.10753  [pdf, other

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2407.10749  [pdf, other

    cs.CV

    SEED: A Simple and Effective 3D DETR in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai

    Abstract: Recently, detection transformers (DETRs) have gradually taken a dominant position in 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds are still difficult to achieve satisfactory performance. We argue that the main challenges are twofold: 1) How to obtain the appropriate object queries is challenging due to the high sparsity and uneven distribution o… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  13. Pattern Guided UV Recovery for Realistic Video Garment Texturing

    Authors: Youyi Zhan, Tuanfeng Y. Wang, Tianjia Shao, Kun Zhou

    Abstract: The fast growth of E-Commerce creates a global market worth USD 821 billion for online fashion shopping. What unique about fashion presentation is that, the same design can usually be offered with different cloths textures. However, only real video capturing or manual per-frame editing can be used for virtual showcase on the same design with different textures, both of which are heavily labor inte… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to IEEE Transactions on Visualization and Computer Graphics

  14. arXiv:2407.09016  [pdf, other

    cs.RO

    OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

    Authors: Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu

    Abstract: Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration beco… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2407.08442  [pdf, other

    cs.LG cs.AI

    How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

    Authors: Linglong Qian, Tao Wang, Jun Wang, Hugh Logan Ellis, Robin Mitra, Richard Dobson, Zina Ibrahim

    Abstract: We introduce a novel classification framework for time-series imputation using deep learning, with a particular focus on clinical data. By identifying conceptual gaps in the literature and existing reviews, we devise a taxonomy grounded on the inductive bias of neural imputation frameworks, resulting in a classification of existing deep imputation strategies based on their suitability for specific… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  16. arXiv:2407.08334  [pdf, other

    cs.IR

    ADMM Based Semi-Structured Pattern Pruning Framework For Transformer

    Authors: TianChen Wang

    Abstract: NLP(natural language processsing) has achieved great success through the transformer model.However, the model has hundreds of millions or billions parameters,which is huge burden for its deployment on personal computer or small scale of server.To deal with it, we either make the model's weight matrix relatively sparser, or compress attention layer. Pattern pruning ,one of the most important prunin… ▽ More

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  17. arXiv:2407.07966  [pdf, other

    cs.CR cs.AI

    A Comprehensive Survey on the Security of Smart Grid: Challenges, Mitigations, and Future Research Opportunities

    Authors: Arastoo Zibaeirad, Farnoosh Koleini, Shengping Bi, Tao Hou, Tao Wang

    Abstract: In this study, we conduct a comprehensive review of smart grid security, exploring system architectures, attack methodologies, defense strategies, and future research opportunities. We provide an in-depth analysis of various attack vectors, focusing on new attack surfaces introduced by advanced components in smart grids. The review particularly includes an extensive analysis of coordinated attacks… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.06861  [pdf, other

    cs.CV

    Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization

    Authors: Lei Cheng, Teng Wang, Lingquan Meng, Changyin Sun

    Abstract: Cross-view geo-localization confronts significant challenges due to large perspective changes, especially when the ground-view query image has a limited field of view with unknown orientation. To bridge the cross-view domain gap, we for the first time explore to learn a BEV representation directly from the ground query image. However, the unknown orientation between ground and aerial images combin… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  19. arXiv:2407.06833  [pdf, other

    q-bio.QM cs.CV eess.IV

    Training-free CryoET Tomogram Segmentation

    Authors: Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu

    Abstract: Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in MICCAI 2024

  20. arXiv:2407.06730  [pdf, other

    cs.CV

    LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition

    Authors: Teng Wang, Lingquan Meng, Lei Cheng, Changyin Sun

    Abstract: Visual place recognition (VPR) remains challenging due to significant viewpoint changes and appearance variations. Mainstream works tackle these challenges by developing various feature aggregation methods to transform deep features into robust and compact global representations. Unfortunately, satisfactory results cannot be achieved under challenging conditions. We start from a new perspective an… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  21. arXiv:2407.06310  [pdf, other

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  22. arXiv:2407.06247  [pdf, other

    cs.CV

    Context Propagation from Proposals for Semantic Video Object Segmentation

    Authors: Tinghuai Wang

    Abstract: In this paper, we propose a novel approach to learning semantic contextual relationships in videos for semantic object segmentation. Our algorithm derives the semantic contexts from video object proposals which encode the key evolution of objects and the relationship among objects over the spatio-temporal domain. This semantic contexts are propagated across the video to estimate the pairwise conte… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2407.05916

  23. arXiv:2407.06187  [pdf, other

    cs.CV cs.GR

    JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

    Authors: Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming.… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR 24

  24. Neural Garment Dynamics via Manifold-Aware Transformers

    Authors: Peizhuo Li, Tuanfeng Y. Wang, Timur Levent Kesdogan, Duygu Ceylan, Olga Sorkine-Hornung

    Abstract: Data driven and learning based solutions for modeling dynamic garments have significantly advanced, especially in the context of digital humans. However, existing approaches often focus on modeling garments with respect to a fixed parametric human body model and are limited to garment geometries that were seen during training. In this work, we take a different approach and model the dynamics of a… ▽ More

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: EUROGRAPHICS 2024. Project page: https://peizhuoli.github.io/manifold-aware-transformers/ Video: https://www.youtube.com/watch?v=v6FCTHmjyqI

  25. arXiv:2407.05924  [pdf, other

    cs.CV

    Graph-Boosted Attentive Network for Semantic Body Parsing

    Authors: Tinghuai Wang, Huiling Wang

    Abstract: Human body parsing remains a challenging problem in natural scenes due to multi-instance and inter-part semantic confusions as well as occlusions. This paper proposes a novel approach to decomposing multiple human bodies into semantic part regions in unconstrained environments. Specifically we propose a convolutional neural network (CNN) architecture which comprises of novel semantic and contour a… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  26. arXiv:2407.05916  [pdf, other

    cs.CV

    Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation

    Authors: Tinghuai Wang, Huiling Wang

    Abstract: We propose a novel approach for modeling semantic contextual relationships in videos. This graph-based model enables the learning and propagation of higher-level spatial-temporal contexts to facilitate the semantic labeling of local regions. We introduce an exemplar-based nonparametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarit… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  27. arXiv:2407.05913  [pdf, other

    cs.CV

    Submodular video object proposal selection for semantic object segmentation

    Authors: Tinghuai Wang

    Abstract: Learning a data-driven spatio-temporal semantic representation of the objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven representation which captures the synergy of multiple instances from continuous frames. To prune the noisy detections, we exploit the rich information among multiple instan… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:1606.02280

  28. arXiv:2407.05437  [pdf, other

    cs.AI

    Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

    Authors: Tianyu Wang, Nianjun Zhou, Zhixiong Chen

    Abstract: Large language models (LLMs) and prompt engineering hold significant potential for advancing computer programming education through personalized instruction. This paper explores this potential by investigating three critical research questions: the systematic categorization of prompt engineering strategies tailored to diverse educational needs, the empowerment of LLMs to solve complex problems bey… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 18 pages, 9 figures

    ACM Class: K.3.2; I.2.7

  29. arXiv:2407.05421  [pdf, other

    eess.AS cs.SD

    ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

    Authors: Ruibo Fu, Xin Qi, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Zhiyong Wang, Yi Lu, Xiaopeng Wang, Shuchen Shi, Yukun Liu, Xuefei Liu, Shuai Zhang

    Abstract: Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in limited reference speeches scenarios. To address these challenges, we… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: The audio demo is available at https://7xin.github.io/ASRRL/

  30. arXiv:2407.05276  [pdf, other

    cs.DC

    BFLN: A Blockchain-based Federated Learning Model for Non-IID Data

    Authors: Yang Li, Chunhe Xia, Dongchi Huang, Xiaojian Li, Tianbo Wang

    Abstract: As the application of federated learning becomes increasingly widespread, the issue of imbalanced training data distribution has emerged as a significant challenge. Federated learning utilizes local data stored on different training clients for model training, rather than centralizing data on a server, thereby greatly enhancing the privacy and security of training data. However, the distribution o… ▽ More

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  31. arXiv:2407.04359  [pdf, other

    cs.AI cs.NE cs.SE

    Dance of the ADS: Orchestrating Failures through Historically-Informed Scenario Fuzzing

    Authors: Tong Wang, Taotao Gu, Huan Deng, Hu Li, Xiaohui Kuang, Gang Zhao

    Abstract: As autonomous driving systems (ADS) advance towards higher levels of autonomy, orchestrating their safety verification becomes increasingly intricate. This paper unveils ScenarioFuzz, a pioneering scenario-based fuzz testing methodology. Designed like a choreographer who understands the past performances, it uncovers vulnerabilities in ADS without the crutch of predefined scenarios. Leveraging map… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This paper was accepted by 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024)

    MSC Class: 68Txx (Primary) ACM Class: D.2.4; I.2.9; I.6.7

  32. arXiv:2407.02639  [pdf, other

    cs.CV

    Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction

    Authors: Tinghuai Wang, Guangming Wang, Kuan Eeik Tan

    Abstract: Convolutional neural networks (CNN) have made significant advances in detecting roads from satellite images. However, existing CNN approaches are generally repurposed semantic segmentation architectures and suffer from the poor delineation of long and curved regions. Lack of overall road topology and structure information further deteriorates their performance on challenging remote sensing images.… ▽ More

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  33. arXiv:2407.02616  [pdf

    eess.IV cs.CV

    Deep Learning Based Apparent Diffusion Coefficient Map Generation from Multi-parametric MR Images for Patients with Diffuse Gliomas

    Authors: Zach Eidex, Mojtaba Safari, Jacob Wynne, Richard L. J. Qiu, Tonghe Wang, David Viar Hernandez, Hui-Kuo Shu, Hui Mao, Xiaofeng Yang

    Abstract: Purpose: Apparent diffusion coefficient (ADC) maps derived from diffusion weighted (DWI) MRI provides functional measurements about the water molecules in tissues. However, DWI is time consuming and very susceptible to image artifacts, leading to inaccurate ADC measurements. This study aims to develop a deep learning framework to synthesize ADC maps from multi-parametric MR images. Methods: We pro… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.15044

  34. arXiv:2407.01925  [pdf, other

    cs.CV

    Looking From the Future: Multi-order Iterations Can Enhance Adversarial Attack Transferability

    Authors: Zijian Ying, Qianmu Li, Tao Wang, Zhichao Lian, Shunmei Meng, Xuyun Zhang

    Abstract: Various methods try to enhance adversarial transferability by improving the generalization from different perspectives. In this paper, we rethink the optimization process and propose a novel sequence optimization concept, which is named Looking From the Future (LFF). LFF makes use of the original optimization process to refine the very first local optimization choice. Adapting the LFF concept to t… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  35. arXiv:2407.01733  [pdf, other

    cs.RO

    AquaMILR: Mechanical intelligence simplifies control of undulatory robots in cluttered fluid environments

    Authors: Tianyu Wang, Nishanth Mankame, Matthew Fernandez, Velin Kojouharov, Daniel I. Goldman

    Abstract: While undulatory swimming of elongate limbless robots has been extensively studied in open hydrodynamic environments, less research has been focused on limbless locomotion in complex, cluttered aquatic environments. Motivated by the concept of mechanical intelligence, where controls for obstacle navigation can be offloaded to passive body mechanics in terrestrial limbless locomotion, we hypothesiz… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  36. arXiv:2407.01525  [pdf, other

    cs.CV cs.AI cs.CL

    ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities

    Authors: Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu

    Abstract: Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions. We propose a new task called 3D reasoning grounding and introduce a new benchmark ScanReason which provides over 10K question-answer-location pairs from five reasoning types that require th… ▽ More

    Submitted 17 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. A comprehensive and hierarchical 3D reasoning grounding benchmark in the era of foundation models. Project page: https://zcmax.github.io/projects/ScanReason

  37. arXiv:2407.01094  [pdf, other

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  38. arXiv:2406.20053  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

    Authors: Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt

    Abstract: Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. Our method constructs a malicious d… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 22 pages

  39. arXiv:2406.19311  [pdf, other

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  40. arXiv:2406.18558  [pdf, other

    cs.CV eess.IV

    BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation

    Authors: Tengbo Wang, Yu Bai

    Abstract: How to extract instance-level masks without instance-level supervision is the main challenge of weakly supervised instance segmentation (WSIS). Popular WSIS methods estimate a displacement field (DF) via learning inter-pixel relations and perform clustering to identify instances. However, the resulting instance centroids are inherently unstable and vary significantly across different clustering al… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

  41. arXiv:2406.18532  [pdf, other

    cs.CL cs.AI cs.LG

    Symbolic Learning Enables Self-Evolving Agents

    Authors: Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang

    Abstract: The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that the… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/aiwaves-cn/agents

  42. arXiv:2406.18050  [pdf, other

    cs.CV

    A Multi-Stage Goal-Driven Network for Pedestrian Trajectory Prediction

    Authors: Xiuen Wu, Tao Wang, Yuanzheng Cai, Lingyu Liang, George Papageorgiou

    Abstract: Pedestrian trajectory prediction plays a pivotal role in ensuring the safety and efficiency of various applications, including autonomous vehicles and traffic management systems. This paper proposes a novel method for pedestrian trajectory prediction, called multi-stage goal-driven network (MGNet). Diverging from prior approaches relying on stepwise recursive prediction and the singular forecastin… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Paper accepted by 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL 2024)

  43. arXiv:2406.17697  [pdf, other

    cs.LG cs.AI cs.CV

    HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

    Authors: Xi Xiao, Wentao Wang, Jiacheng Xie, Lijing Zhu, Gaofei Chen, Zhengji Li, Tianyang Wang, Min Xu

    Abstract: Drug target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DT… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  44. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing ( Volume: 17, Issue: 3, May-June 2024)

  45. arXiv:2406.16707  [pdf, other

    cs.LG cs.AI

    Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

    Authors: Vivienne Huiling Wang, Tinghuai Wang, Wenyan Yang, Joni-Kristian Kämäräinen, Joni Pajarinen

    Abstract: In goal-conditioned hierarchical reinforcement learning (HRL), a high-level policy specifies a subgoal for the low-level policy to reach. Effective HRL hinges on a suitable subgoal represen tation function, abstracting state space into latent subgoal space and inducing varied low-level behaviors. Existing methods adopt a subgoal representation that provides a deterministic mapping from state space… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  46. arXiv:2406.16698  [pdf, other

    cs.LG cs.CY

    Learning Interpretable Fair Representations

    Authors: Tianhao Wang, Zana Buçinca, Zilin Ma

    Abstract: Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, because current fair representations are generally not interpretable, the third party cannot use these fair representations for exploration,… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  47. arXiv:2406.14558  [pdf, other

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  48. arXiv:2406.14500  [pdf, other

    cs.CL

    Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

    Authors: Xingmeng Zhao, Tongnian Wang, Anthony Rios

    Abstract: Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy to enhance RRS by first generating a layperson summary. This approach normalizes key observations and simplifies complex information using non-expert communication techniques inspired by doctor-patient interactions. Combined wi… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  49. arXiv:2406.13923  [pdf, other

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  50. arXiv:2406.13763  [pdf, other

    cs.CV cs.AI

    Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models

    Authors: Zhawnen Chen, Tianchun Wang, Yizhou Wang, Michal Kosinski, Xiang Zhang, Yun Fu, Sheng Li

    Abstract: Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.