Showing 1–50 of 943 results for author: Zhu, Z

Searching in archive cs.
  1. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12358  [pdf, other]

    cs.CV cs.CL

    ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

    Authors: Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao

    Abstract: Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) task, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.11389  [pdf, ps, other]

    cs.NI eess.SP

    Spatial-spectral Cell-free Networks: A Large-scale Case Study

    Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

    Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.11213  [pdf, other]

    cs.CV

    OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

    Authors: Zijian Zhou, Zheng Zhu, Holger Caesar, Miaojing Shi

    Abstract: Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling the structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, hence limiting their applications in the open world scenarios. With the rapid development of large multimodal models (LMMs), significant progress has been made in open-set obje…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.10408  [pdf, other]

    cs.IT eess.SP

    Latency Minimization for IRS-enhanced Wideband MEC Networks with Practical Reflection Model

    Authors: N. Li, W. Hao, X. Li, Z. Zhu, Z. Tang, S. Yang

    Abstract: Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication link is blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performanc…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  6. arXiv:2407.09367  [pdf, other]

    cs.CV

    Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

    Authors: Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Yong Dai, Yaowei Wang

    Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data bu…

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of our paper and supplemental material to appear in ECCV 2024

  7. arXiv:2407.08068  [pdf, other]

    cs.FL

    More on Maximally Permissive Similarity Control of Discrete Event Systems

    Authors: Yu Wang, Zhaohui Zhu, Rob van Glabbeek, Jinjin Zhang, Lixing Tan

    Abstract: Takai proposed a method for constructing a maximally permissive supervisor for the similarity control problem (IEEE Transactions on Automatic Control, 66(7):3197-3204, 2021). This paper points out flaws in his results by providing a counterexample. Inspired by Takai's construction, the notion of a (saturated) (G, R)-automaton is introduced and metatheorems concerning (maximally permissive) supervi…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages

  8. GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks

    Authors: Xuan Wang, Hao Tang, Zhigang Zhu

    Abstract: Various contextual information has been employed by many approaches for visual detection tasks. However, most of the existing approaches only focus on specific context for specific tasks. In this paper, GMC, a general framework is proposed for multistage context learning and utilization, with various deep network architectures for various visual detection tasks. The GMC framework encompasses three…

    Submitted 7 July, 2024; originally announced July 2024.

  9. arXiv:2407.05550  [pdf, other]

    cs.HC cs.AI

    MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning

    Authors: Minghao Xiao, Zhengxi Zhu, Wenyu Wang, Meixia Qu

    Abstract: Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore the complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition…

    Submitted 7 July, 2024; originally announced July 2024.

  10. arXiv:2407.04842  [pdf, other]

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  11. arXiv:2407.04561  [pdf, other]

    cs.NI eess.SP

    Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

    Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

    Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to hi…

    Submitted 5 July, 2024; originally announced July 2024.

  12. arXiv:2407.03779  [pdf, other]

    cs.CL cs.LG

    Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning

    Authors: Lei Yu, Jingcheng Niu, Zining Zhu, Gerald Penn

    Abstract: In this paper, we introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP, a novel and effective algorithm based on differentiable masking for discovering circuits. Circuit discovery is the task of interpreting the computational mechanisms of language models (LMs) by dissecting their functions and capabilities into sparse subnetworks (circuits). We identi…

    Submitted 4 July, 2024; originally announced July 2024.

  13. arXiv:2407.02067  [pdf, other]

    cs.CL

    Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models

    Authors: Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos

    Abstract: In this work, we present a comprehensive three-phase study to examine (1) the effectiveness of large multimodal models (LMMs) in recognizing cultural contexts; (2) the accuracy of their representations of diverse cultures; and (3) their ability to adapt content across cultural boundaries. We first introduce Dalle Street, a large-scale dataset generated by DALL-E 3 and validated by humans, containi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: under review

  14. arXiv:2407.02066  [pdf, other]

    cs.CL

    BiasDora: Exploring Hidden Biased Associations in Vision-Language Models

    Authors: Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

    Abstract: Existing works examining Vision Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, imp…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Under Review

  15. arXiv:2407.02030  [pdf, other]

    cs.CL

    Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

    Authors: Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

    Abstract: Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model's biases, mirroring how int…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Under Review

  16. arXiv:2406.18544  [pdf, other]

    cs.CV cs.GR

    GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors

    Authors: Zuo-Liang Zhu, Beibei Wang, Jian Yang

    Abstract: 3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Inspired by previous works, the signed…

    Submitted 22 May, 2024; originally announced June 2024.

  17. arXiv:2406.18139  [pdf, other]

    cs.CL cs.CV

    LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

    Authors: Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

    Abstract: Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency. Unlike single-modality LLMs that manage only textual contexts, the KV cache of long-context MLLMs includes representations from multiple images with temp…

    Submitted 26 June, 2024; originally announced June 2024.

  18. arXiv:2406.18009  [pdf, other]

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  19. arXiv:2406.17841  [pdf, other]

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang, et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures + 14 pages, 6 figures

  20. arXiv:2406.17253  [pdf, other]

    cs.CL

    How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?

    Authors: Huaizhi Ge, Frank Rudzicz, Zining Zhu

    Abstract: As large language models (LLMs) are widely deployed, targeted editing of their knowledge has become a critical challenge. Recently, advancements in model editing techniques, such as Rank-One Model Editing (ROME), have paved the way for updating LLMs with new knowledge. However, the efficacy of these methods varies across different types of knowledge. This study investigates the capability of knowl…

    Submitted 24 June, 2024; originally announced June 2024.

  21. arXiv:2406.17241  [pdf, other]

    cs.CL

    What Do the Circuits Mean? A Knowledge Edit View

    Authors: Huaizhi Ge, Frank Rudzicz, Zining Zhu

    Abstract: In the field of language model interpretability, circuit discovery is gaining popularity. Despite this, the true meaning of these circuits remains largely unanswered. We introduce a novel method to learn their meanings as a holistic object through the lens of knowledge editing. We extract circuits in the GPT2-XL model using diverse text classification datasets, and use hierarchical relations datase…

    Submitted 24 June, 2024; originally announced June 2024.

  22. arXiv:2406.16982  [pdf]

    cs.LG cs.AI

    Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

    Authors: Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

    Abstract: The prediction of disease risk factors can screen vulnerable groups for effective prevention and treatment, so as to reduce their morbidity and mortality. Machine learning has a great demand for high-quality labeling information, and labeling noise in medical big data poses a great challenge to efficient disease risk warning methods. Therefore, this project intends to study the robust learning alg…

    Submitted 23 June, 2024; originally announced June 2024.

  23. arXiv:2406.15638  [pdf, other]

    cs.NI cs.LG

    Root Cause Analysis of Anomalies in 5G RAN Using Graph Neural Network and Transformer

    Authors: Antor Hasan, Conrado Boeira, Khaleda Papry, Yue Ju, Zhongwen Zhu, Israat Haque

    Abstract: The emergence of 5G technology marks a significant milestone in developing telecommunication networks, enabling exciting new applications such as augmented reality and self-driving vehicles. However, these improvements bring an increased management complexity and a special concern in dealing with failures, as the applications 5G intends to support heavily rely on high network performance and low l…

    Submitted 21 June, 2024; originally announced June 2024.

  24. arXiv:2406.14877  [pdf, other]

    cs.CL

    Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video

    Authors: Zhengbang Yang, Haotian Xia, Jingxi Li, Zezhi Chen, Zhuangdi Zhu, Weining Shen

    Abstract: Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluate…

    Submitted 21 June, 2024; originally announced June 2024.

  25. arXiv:2406.14479  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier

    Authors: Jiachen Jiang, Jinxin Zhou, Zhihui Zhu

    Abstract: Analyzing the similarity of internal representations within and across different models has been an important technique for understanding the behavior of deep neural networks. Most existing methods for analyzing the similarity between representations of high dimensions, such as those based on Canonical Correlation Analysis (CCA) and widely used Centered Kernel Alignment (CKA), rely on statistical…

    Submitted 20 June, 2024; originally announced June 2024.

  26. arXiv:2406.14095  [pdf, other]

    cs.LG cs.AI

    Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

    Authors: Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

    Abstract: Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the dem…

    Submitted 20 June, 2024; originally announced June 2024.

  27. arXiv:2406.13035  [pdf, other]

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.13027  [pdf]

    cs.HC

    Investigating the Effect of Display Refresh Rate on First-Person Shooting Games

    Authors: Haoshen Qin, Zixian Zhu

    Abstract: For first-person shooting game players, display refresh rate is important for a smooth experience. Multiple studies have shown that a low display refresh rate will reduce gamers' experience and performance. However, the human eye's perception of refresh rate has an upper limit, which is usually less than what high-performance monitors, for which players pay much higher prices, provide. This study…

    Submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.12252  [pdf, other]

    cs.CL

    Language and Multimodal Models in Sports: A Survey of Datasets and Applications

    Authors: Haotian Xia, Zhengbang Yang, Yun Zhao, Yuqing Wang, Jingxi Li, Rhys Tracy, Zhuangdi Zhu, Yuan-fang Wang, Hanjie Chen, Weining Shen

    Abstract: Recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We overviewed and categorized datasets into three primary types: language-based, multimodal, and convertible datasets. Language-based and multimodal datasets are fo…

    Submitted 17 June, 2024; originally announced June 2024.

  30. FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction

    Authors: Muhao Xu, Zhenfeng Zhu, Youru Li, Shuai Zheng, Yawei Zhao, Kunlun He, Yao Zhao

    Abstract: Multimodal electronic health record (EHR) data can offer a holistic assessment of a patient's health status, supporting various predictive healthcare tasks. Recently, several studies have embraced the multitask learning approach in the healthcare domain, exploiting the inherent correlations among clinical tasks to predict multiple outcomes simultaneously. However, existing methods necessitate samp…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024 (Research Track)

  31. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  32. arXiv:2406.09688  [pdf, other]

    cs.CL

    FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation

    Authors: Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Kezhi Mao

    Abstract: Controllable text generation (CTG) seeks to craft texts adhering to specific attributes, traditionally employing learning-based techniques such as training, fine-tuning, or prefix-tuning with attribute-specific datasets. These approaches, while effective, demand extensive computational and data resources. In contrast, some proposed learning-free alternatives circumvent learning but often yield inf…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  33. arXiv:2406.09675  [pdf, other]

    cs.LG cs.AI

    Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency

    Authors: Ningyi Liao, Haoyu Liu, Zulun Zhu, Siqiang Luo, Laks V. S. Lakshmanan

    Abstract: With the recent advancements in graph neural networks (GNNs), spectral GNNs have received increasing popularity by virtue of their specialty in capturing graph signals in the frequency domain, demonstrating promising capability in specific tasks. However, few systematic studies have been conducted on assessing their spectral characteristics. This emerging family of models also varies in terms of d…

    Submitted 13 June, 2024; originally announced June 2024.

  34. arXiv:2406.07928  [pdf, other]

    cs.RO

    Undergraduate Robotics Education with General Instructors using a Student-Centered Personalized Learning Framework

    Authors: Rui Wu, David J Feil-Seifer, Ponkoj C Shill, Hossein Jamali, Sergiu Dascalu, Fred Harris, Laura Rosof, Bryan Hutchins, Marjorie Campo Ringler, Zhen Zhu

    Abstract: Recent advancements in robotics, including applications like self-driving cars, unmanned systems, and medical robots, have had a significant impact on the job market. On one hand, big robotics companies offer training programs based on the job requirements. However, these training programs may not be as beneficial as general robotics programs offered by universities or community colleges. On the o…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures, 1 table, 2024 ASEE Conference

  35. arXiv:2406.07698  [pdf, other]

    cs.LG

    Label Smoothing Improves Machine Unlearning

    Authors: Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

    Abstract: The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of…

    Submitted 11 June, 2024; originally announced June 2024.

  36. arXiv:2406.07580  [pdf, other]

    cs.CR cs.LG

    DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks

    Authors: Zhiyu Zhu, Jiayu Zhang, Xinyi Wang, Zhibo Jin, Huaming Chen

    Abstract: Despite the exceptional performance of deep neural networks (DNNs) across different domains, they are vulnerable to adversarial samples, in particular for tasks related to computer vision. Such vulnerability is further influenced by the digital container formats used in computers, where the discrete numerical values are commonly used for storing the pixel values. This paper examines how informatio…

    Submitted 9 June, 2024; originally announced June 2024.

  37. arXiv:2406.07177  [pdf, other]

    cs.LG

    TernaryLLM: Ternarized Large Language Model

    Authors: Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng

    Abstract: Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming fr…

    Submitted 11 June, 2024; originally announced June 2024.

  38. arXiv:2406.06978  [pdf, other]

    cs.CV

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

    Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment…

    Submitted 19 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

  39. arXiv:2406.06460  [pdf]

    cs.RO cs.AI

    Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots

    Authors: Bahador Beigomi, Zheng H. Zhu

    Abstract: In this research, we introduce a deep reinforcement learning-based control approach to address the intricate challenge of the robotic pre-grasping phase under microgravity conditions. Leveraging reinforcement learning eliminates the necessity for manual feature design, therefore simplifying the problem and empowering the robot to learn pre-grasping policies through trial and error. Our methodology…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: This is a preprint for the work submitted to the ICRA 2024 conference

    Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

  40. arXiv:2406.06002  [pdf, other]

    cs.LG eess.SP math.OC

    Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

    Authors: Zhen Qin, Zhihui Zhu

    Abstract: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (T…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.02592

  41. arXiv:2406.05699  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

    Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH2024

  42. arXiv:2406.05035  [pdf, other]

    cs.CL cs.AI

    Scenarios and Approaches for Situated Natural Language Explanations

    Authors: Pengshuo Qiu, Frank Rudzicz, Zining Zhu

    Abstract: Large language models (LLMs) can be used to generate natural language explanations (NLE) that are adapted to different users' situations. However, there is yet to be a quantitative evaluation of the extent of such adaptation. To bridge this gap, we collect a benchmarking dataset, Situation-Based Explanation. This dataset contains 100 explanandums. Each explanandum is paired with explanations targe…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

  43. arXiv:2406.02804  [pdf, other]

    cs.AI cs.CL cs.LG

    $\texttt{ACCORD}$: Closing the Commonsense Measurability Gap

    Authors: François Roewer-Després, Jinyue Feng, Zining Zhu, Frank Rudzicz

    Abstract: We present $\texttt{ACCORD}$, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. $\texttt{ACCORD}$ introduces formal elements to commonsense reasoning to explicitly control and quantify reasoning complexity beyond the typical 1 or 2 hops. Uniquely, $\texttt{ACCORD}$ can a…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: For leaderboard and dataset download, see https://www.codabench.org/competitions/3160/ For source code, see https://github.com/francois-rd/accord/

    ACM Class: I.2.0; I.2.7

  44. arXiv:2406.02515  [pdf]

    cs.LG

    Uncertainty of Joint Neural Contextual Bandit

    Authors: Hongbo Guo, Zheqing Zhu

    Abstract: Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize the contextual information and available user or item features, the integration of neural networks has been introduced to enhance contextual bandit learning and has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a di…

    Submitted 4 June, 2024; originally announced June 2024.

  45. arXiv:2406.01909  [pdf, other]

    cs.LG

    A Global Geometric Analysis of Maximal Coding Rate Reduction

    Authors: Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

    Abstract: The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape h…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 43 pages, 9 figures. This work has been accepted for publication in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  46. arXiv:2406.01059  [pdf, other]

    cs.CV

    VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model

    Authors: Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Zeke Xie, Zhong Ji, Jungong Han, Mingming Sun

    Abstract: In this paper, we focus on resolving the problem of image outpainting, which aims to extrapolate the surrounding parts given the center contents of an image. Although recent works have achieved promising performance, the lack of versatility and customization hinders their practical applications in broader scenarios. Therefore, this work presents a novel image outpainting framework that is capable…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages

  47. arXiv:2406.00016  [pdf]

    cs.CL

    Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data

    Authors: Lingxi Xiao, Muqing Li, Yinqiu Feng, Meiqi Wang, Ziyi Zhu, Zexi Chen

    Abstract: The research explores the utilization of a deep learning model employing an attention mechanism in medical text mining. It targets the challenge of analyzing unstructured text information within medical data. This research seeks to enhance the model's capability to identify essential medical information by incorporating deep learning and attention mechanisms. This paper reviews the basic principle…

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.11704 by other authors

  48. arXiv:2405.20852  [pdf, other]

    cs.CL

    Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

    Authors: Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou

    Abstract: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks: intent detection and slot filling. Although some SLU frameworks jointly model the two subtasks and achieve high performance, most of them still overlook the inhere…

    Submitted 31 May, 2024; originally announced May 2024.

  49. arXiv:2405.20612  [pdf, other]

    cs.CL cs.AI

    UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation

    Authors: Hanzhang Zhou, Zijian Feng, Zixiao Zhu, Junlang Qian, Kezhi Mao

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of m…

    Submitted 30 May, 2024; originally announced May 2024.

  50. arXiv:2405.20607  [pdf, other]

    cs.CV

    Textual Inversion and Self-supervised Refinement for Radiology Report Generation

    Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

    Abstract: Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specific…

    Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper has been early accepted by MICCAI 2024!