Zum Hauptinhalt springen

Showing 1–50 of 5,393 results for author: Chang

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.17397  [pdf, other

    cs.IT eess.SP

    End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: major revision in IEEE JSAC

  2. arXiv:2408.16633  [pdf

    cs.RO cs.AI

    Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning

    Authors: Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, Bo Hong

    Abstract: With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.16629  [pdf, other

    cs.CY cs.AI cs.SI

    LLMs generate structurally realistic social networks but overestimate political homophily

    Authors: Serina Chang, Alicja Chaszczewicz, Emma Wang, Maya Josifovska, Emma Pierson, Jure Leskovec

    Abstract: Generating social networks is essential for many applications, such as epidemic modeling and social simulations. Prior approaches either involve deep learning models, which require many observed networks for training, or stylized models, which are limited in their realism and flexibility. In contrast, LLMs offer the potential for zero-shot and flexible network generation. However, two key question… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  4. arXiv:2408.16343  [pdf, other

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  5. arXiv:2408.15777  [pdf, other

    cs.CV

    A Survey on Facial Expression Recognition of Static and Dynamic Emotions

    Authors: Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

    Abstract: Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.15609  [pdf, other

    cs.NI cs.LG

    Statistical QoS Provision in Business-Centric Networks

    Authors: Chang Wu, Yuang Chen, Hancheng Lu

    Abstract: More refined resource management and Quality of Service (QoS) provisioning is a critical goal of wireless communication technologies. In this paper, we propose a novel Business-Centric Network (BCN) aimed at enabling scalable QoS provisioning, based on a cross-layer framework that captures the relationship between application, transport parameters, and channels. We investigate both continuous flow… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages

  7. arXiv:2408.15425  [pdf, other

    cs.RO cs.AI cs.SE

    Fast and Modular Autonomy Software for Autonomous Racing Vehicles

    Authors: Andrew Saba, Aderotimi Adetunji, Adam Johnson, Aadi Kothari, Matthew Sivaprakasam, Joshua Spisak, Prem Bharatia, Arjun Chauhan, Brendan Duff Jr., Noah Gasparro, Charles King, Ryan Larkin, Brian Mao, Micah Nye, Anjali Parashar, Joseph Attias, Aurimas Balciunas, Austin Brown, Chris Chang, Ming Gao, Cindy Heredia, Andrew Keats, Jose Lavariega, William Muckelroy III, Andre Slavescu , et al. (5 additional authors not shown)

    Abstract: Autonomous motorsports aim to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their handling limits in multi-agent scenarios at extremely high ($\geq 150mph$) speeds. This Operational Design Domain (ODD) presents unique challenges across the autonomy stack. The Indy Autonomous Challenge (IAC) is an interna… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Published in Journal of Field Robotics

    Journal ref: Field Robotics Volume 4 (2024) 1-45

  8. arXiv:2408.15235  [pdf, other

    cs.CV

    Learning-based Multi-View Stereo: A Survey

    Authors: Fangjinhua Wang, Qingtian Zhu, Di Chang, Quankai Gao, Junlin Han, Tong Zhang, Richard Hartley, Marc Pollefeys

    Abstract: 3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environ… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  9. arXiv:2408.14575  [pdf, other

    cs.AI

    EVINCE: Optimizing Adversarial LLM Dialogues via Conditional Statistics and Information Theory

    Authors: Edward Y. Chang

    Abstract: This paper introduces EVINCE (Entropy and Variation IN Conditional Exchanges), a dialogue framework advancing Artificial General Intelligence (AGI) by enhancing versatility, adaptivity, and reasoning in large language models (LLMs). Leveraging adversarial debate and a novel dual entropy theory, EVINCE improves prediction accuracy, robustness, and stability in LLMs by integrating statistical modeli… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures, four tables

    ACM Class: I.2.7

  10. arXiv:2408.14453  [pdf

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

  11. arXiv:2408.14267  [pdf, other

    cs.LG cs.CV

    1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit

    Authors: Chang Gao, Jianfei Chen, Kang Zhao, Jiaqi Wang, Liping Jing

    Abstract: Fully quantized training (FQT) accelerates the training of deep neural networks by quantizing the activations, weights, and gradients into lower precision. To explore the ultimate limit of FQT (the lowest achievable precision), we make a first attempt to 1-bit FQT. We provide a theoretical analysis of FQT based on Adam and SGD, revealing that the gradient variance influences the convergence of FQT… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

  12. arXiv:2408.14262  [pdf

    cs.CL cs.SD eess.AS

    Self-supervised Speech Representations Still Struggle with African American Vernacular English

    Authors: Kalvin Chang, Yi-Hui Chou, Jiatong Shi, Hsuan-Ming Chen, Nicole Holliday, Odette Scharenborg, David R. Mortensen

    Abstract: Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American Eng… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  13. arXiv:2408.14009  [pdf

    cs.RO cs.AI

    Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning

    Authors: Wen-Han Hsieh, Jen-Yuan Chang

    Abstract: In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for enco… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 4 pages, 2 figures, IEEE-ICKII-2024

  14. arXiv:2408.13980  [pdf, other

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  15. arXiv:2408.13906  [pdf, other

    cs.CV cs.AI cs.LG

    ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

    Authors: Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang

    Abstract: Hallucinations in Multimodal Large Language Models (MLLMs) where generated responses fail to accurately reflect the given image pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By c… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: First two authors contributed equally. Source code is available at https://github.com/yejipark-m/ConVis

  16. arXiv:2408.13772  [pdf, ps, other

    cs.OS

    FRAP: A Flexible Resource Accessing Protocol for Multiprocessor Real-Time Systems

    Authors: Shuai Zhao, Hanzhi Xu, Nan Chen, Ruoxian Su, Wanli Chang

    Abstract: Fully-partitioned fixed-priority scheduling (FP-FPS) multiprocessor systems are widely found in real-time applications, where spin-based protocols are often deployed to manage the mutually exclusive access of shared resources. Unfortunately, existing approaches either enforce rigid spin priority rules for resource accessing or carry significant pessimism in the schedulability analysis, imposing su… ▽ More

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  17. arXiv:2408.13464  [pdf, other

    cs.AI cs.CL cs.LG

    Uncovering Biases with Reflective Large Language Models

    Authors: Edward Y. Chang

    Abstract: Biases inherent in human endeavors pose significant challenges for machine learning, particularly in supervised learning that relies on potentially biased "ground truth" data. This reliance, coupled with models' tendency to generalize based on statistical maximal likelihood, can propagate and amplify biases, exacerbating societal issues. To address this, our study proposes a reflective methodology… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures, 8 tables

    ACM Class: I.2.7

  18. arXiv:2408.13423  [pdf, other

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models need high computational costs and produce suboptimal results due to high complexity of video generation task. In this paper, we propose \textbf{ConFiner}, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure \textbf{… ▽ More

    Submitted 27 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  19. arXiv:2408.13040  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  20. arXiv:2408.12991  [pdf, other

    cs.CE q-fin.TR

    Controllable Financial Market Generation with Diffusion Guided Meta Agent

    Authors: Yu-Hao Huang, Chang Xu, Yang Liu, Weiqing Liu, Wu-Jun Li, Jiang Bian

    Abstract: Order flow modeling stands as the most fundamental and essential financial task, as orders embody the minimal unit within a financial market. However, current approaches often result in unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their application scenario. In this paper, we advocate incorporating controllability into the market ge… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  21. arXiv:2408.12957  [pdf, other

    cs.CV

    Image Segmentation in Foundation Model Era: A Survey

    Authors: Tianfei Zhou, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers

    Abstract: Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicate… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: A comprehensive survey of image segmentation in foundation model era (work in progress)

  22. arXiv:2408.12307  [pdf

    cs.LG

    Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning

    Authors: Yen-Ru Lai, Fu-Chieh Chang, Pei-Yuan Wu

    Abstract: Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data. The challenge arises when labeled datasets are expensive, especially when rewards have to be provided by human labelers for large datasets. In contrast, unlabelled data tends to be less expensive. This situation highlights the importance of finding effective ways to use unlabelled da… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  23. arXiv:2408.11791  [pdf, other

    cs.LG

    Critique-out-Loud Reward Models

    Authors: Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D. Chang, Prithviraj Ammanabrolu

    Abstract: Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the capabilities of reward models as they must reason implicitly about the quality of a response, i.e., preference modeling must be performed in a single for… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  24. arXiv:2408.11451  [pdf, other

    cs.AI

    Bidirectional Gated Mamba for Sequential Recommendation

    Authors: Ziwei Liu, Qidong Liu, Yejing Wang, Wanyu Wang, Pengyue Jia, Maolin Wang, Zitao Liu, Yi Chang, Xiangyu Zhao

    Abstract: In various domains, Sequential Recommender Systems (SRS) have become essential due to their superior capability to discern intricate user preferences. Typically, SRS utilize transformer-based architectures to forecast the subsequent item within a sequence. Nevertheless, the quadratic computational complexity inherent in these models often leads to inefficiencies, hindering the achievement of real-… ▽ More

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  25. Audio Description Customization

    Authors: Rosiana Natalie, Ruei-Che Chang, Smitha Sheshadri, Anhong Guo, Kotaro Hara

    Abstract: Blind and low-vision (BLV) people use audio descriptions (ADs) to access videos. However, current ADs are unalterable by end users, thus are incapable of supporting BLV individuals' potentially diverse needs and preferences. This research investigates if customizing AD could improve how BLV individuals consume videos. We conducted an interview study (Study 1) with fifteen BLV participants, which r… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ASSETS 2024

  26. arXiv:2408.11276  [pdf, other

    math.PR cs.LG math.DG stat.ML

    Chernoff Bounds for Tensor Expanders on Riemannian Manifolds Using Graph Laplacian Approximation

    Authors: Shih-Yu Chang

    Abstract: This paper addresses the advancement of probability tail bound analysis, a crucial statistical tool for assessing the probability of large deviations of random variables from their expected values. Traditional tail bounds, such as Markov's, Chebyshev's, and Chernoff bounds, have proven valuable across numerous scientific and engineering fields. However, as data complexity grows, there is a pressin… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  27. arXiv:2408.11264  [pdf, other

    cs.LG cs.CR

    Correlation Analysis of Adversarial Attack in Time Series Classification

    Authors: Zhengyang Li, Wenhao Liang, Chang Dong, Weitong Chen, Dong Huang

    Abstract: This study investigates the vulnerability of time series classification models to adversarial attacks, with a focus on how these models process local versus global information under such conditions. By leveraging the Normalized Auto Correlation Function (NACF), an exploration into the inclination of neural networks is conducted. It is demonstrated that regularization techniques, particularly those… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 15 pages, 7 figures

    ACM Class: I.2.0

  28. arXiv:2408.11194  [pdf, other

    cs.CV

    Compress Guidance in Conditional Diffusion Sampling

    Authors: Anh-Dung Dinh, Daochang Liu, Chang Xu

    Abstract: Enforcing guidance throughout the entire sampling process often proves counterproductive due to the model-fitting issue., where samples are generated to match the classifier's parameters rather than generalizing the expected condition. This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue. By distributing th… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, Computer Vision and Machine Learning

    ACM Class: I.4

  29. arXiv:2408.10908  [pdf, other

    cs.RO cs.HC

    Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

    Authors: Yiqun Duan, Zhuoli Zhuang, Jinzhao Zhou, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. The current advances in End-to-End autonomous driving normally are data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap,… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  30. arXiv:2408.10899  [pdf, other

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  31. arXiv:2408.10764  [pdf, other

    cs.CL

    Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

    Authors: Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages

  32. arXiv:2408.10668  [pdf, other

    cs.CR cs.AI

    Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation

    Authors: Haoyu Wang, Bingzhe Wu, Yatao Bian, Yongzhe Chang, Xueqian Wang, Peilin Zhao

    Abstract: Large Language Models (LLMs) are implicit troublemakers. While they provide valuable insights and assist in problem-solving, they can also potentially serve as a resource for malicious activities. Implementing safety alignment could mitigate the risk of LLMs generating harmful responses. We argue that: even when an LLM appears to successfully block harmful queries, there may still be hidden vulner… ▽ More

    Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  33. arXiv:2408.10504  [pdf, other

    cs.AI

    QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

    Authors: Yilun Kong, Hangyu Mao, Qi Zhao, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

    Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performances. Additionally, these methods rely heavily on frequent interactions with LLM… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  34. arXiv:2408.10441  [pdf, other

    cs.CL

    Goldfish: Monolingual Language Models for 350 Languages

    Authors: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

    Abstract: For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously. However, using FLORES perplexity as a metric, we find that these models perform worse than bigrams for many languages (e.g. 24% of languages in XGLM 4.5B; 43% in BLOOM 7.1B). To facilitate research that focuses on low-resource languages, we pre-train and relea… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  35. arXiv:2408.10099  [pdf, other

    cs.GR

    Neural Representation of Shape-Dependent Laplacian Eigenfunctions

    Authors: Yue Chang, Otman Benchekroun, Maurizio M. Chiaramonte, Peter Yichen Chen, Eitan Grinspun

    Abstract: The eigenfunctions of the Laplace operator are essential in mathematical physics, engineering, and geometry processing. Typically, these are computed by discretizing the domain and performing eigendecomposition, tying the results to a specific mesh. However, this method is unsuitable for continuously-parameterized shapes. We propose a novel representation for eigenfunctions in continuously-param… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  36. arXiv:2408.09896  [pdf, other

    cs.LG physics.chem-ph q-bio.BM

    Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model

    Authors: Yuran Xiang, Haiteng Zhao, Chang Ma, Zhi-Hong Deng

    Abstract: Recent advancements in computational chemistry have increasingly focused on synthesizing molecules based on textual instructions. Integrating graph generation with these instructions is complex, leading most current methods to use molecular sequences with pre-trained large language models. In response to this challenge, we propose a novel framework, named… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  37. arXiv:2408.09882  [pdf, other

    cs.LG

    GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits

    Authors: Gongpu Chen, Soung Chang Liew, Deniz Gunduz

    Abstract: The restless multi-armed bandit (RMAB) framework is a popular model with applications across a wide variety of fields. However, its solution is hindered by the exponentially growing state space (with respect to the number of arms) and the combinatorial action space, making traditional reinforcement learning methods infeasible for large-scale instances. In this paper, we propose GINO-Q, a three-tim… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 9 pages, 11 figures

  38. TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics

    Authors: Chang Liu, Jingtao Ding, Yiwen Song, Yong Li

    Abstract: Predicting the resilience of complex networks, which represents the ability to retain fundamental functionality amidst external perturbations or internal failures, plays a critical role in understanding and improving real-world complex systems. Traditional theoretical approaches grounded in nonlinear dynamical systems rely on prior knowledge of network dynamics. On the other hand, data-driven appr… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  39. arXiv:2408.09744  [pdf, other

    cs.CV

    RealCustom++: Representing Images as Real-Word for Real-Time Customization

    Authors: Zhendong Mao, Mengqi Huang, Fei Ding, Mingcong Liu, Qian He, Xiaojun Chang, Yongdong Zhang

    Abstract: Text-to-image customization, which takes given texts and images depicting given subjects as inputs, aims to synthesize new images that align with both text semantics and subject appearance. This task provides precise control over details that text alone cannot capture and is fundamental for various real-world applications, garnering significant interest from academia and industry. Existing works f… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 23 pages

  40. arXiv:2408.09591  [pdf, other

    cs.DS

    Pre-assignment problem for unique minimum vertex cover on bounded clique-width graphs

    Authors: Shinwoo An, Yeonsu Chang, Kyungjin Cho, O-joung Kwon, Myounghwan Lee, Eunjin Oh, Hyeonjun Shin

    Abstract: Horiyama et al. (AAAI 2024) considered the problem of generating instances with a unique minimum vertex cover under certain conditions. The Pre-assignment for Uniquification of Minimum Vertex Cover problem (shortly PAU-VC) is the problem, for given a graph $G$, to find a minimum set $S$ of vertices in $G$ such that there is a unique minimum vertex cover of $G$ containing $S$. We show that PAU-VC i… ▽ More

    Submitted 22 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 19 pages, 3 figures

  41. Social VR for Professional Networking: A Spatial Perspective

    Authors: Victoria Chang, Ge Gao, Huaishu Peng

    Abstract: One essential function of professional events, such as industry trade shows and academic conferences, is to foster and extend a person's connections to others within the community of their interest. In this paper, we delve into the emerging practice transitioning these events from physical venues to social VR as a new medium. Specifically, we ask: how does the spatial design in social VR affect th… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

  42. arXiv:2408.09101  [pdf, other

    cs.DC

    Heterogeneity-Aware Memory Efficient Federated Learning via Progressive Layer Freezing

    Authors: Wu Yebo, Li Li, Tian Chunlin, Chang Tao, Lin Chi, Wang Cong, Xu Cheng-Zhong

    Abstract: In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after converg… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Published as a conference paper at IWQoS 2024

  43. arXiv:2408.08894  [pdf

    cs.IR cs.AI cs.CL

    Enhancing Exploratory Learning through Exploratory Search with the Emergence of Large Language Models

    Authors: Yiming Luo, Patrick Cheong-Iao, Shanton Chang

    Abstract: In the information era, how learners find, evaluate, and effectively use information has become a challenging issue, especially with the added complexity of large language models (LLMs) that have further confused learners in their information retrieval and search activities. This study attempts to unpack this complexity by combining exploratory search strategies with the theories of exploratory le… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures

  44. arXiv:2408.08815  [pdf, other

    cs.LG

    An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

    Authors: Qiang Huang, Chuizheng Meng, Defu Cao, Biwei Huang, Yi Chang, Yan Liu

    Abstract: Counterfactual estimation from observations represents a critical endeavor in numerous application fields, such as healthcare and finance, with the primary challenge being the mitigation of treatment bias. The balancing strategy aimed at reducing covariate disparities between different treatment groups serves as a universal solution. However, when it comes to the time series data, the effectivenes… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ICML 2024 Carema Ready Version. 20 Pages, 12 Figures, 10 Tables

  45. arXiv:2408.08790  [pdf, other

    eess.IV cs.AI cs.CV

    A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks

    Authors: Boa Jang, Youngbin Ahn, Eun Kyung Choe, Chang Ki Yoon, Hyuk Jin Choi, Young-Gon Kim

    Abstract: Artificial intelligence applied to retinal images offers significant potential for recognizing signs and symptoms of retinal conditions and expediting the diagnosis of eye diseases and systemic disorders. However, developing generalized artificial intelligence models for medical data often requires a large number of labeled images representing various disease signs, and most models are typically t… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures

  46. arXiv:2408.08591  [pdf, other

    cs.CV

    Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation

    Authors: Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo

    Abstract: Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask propos… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: OpenSUN 3D: 2nd Workshop on Open-Vocabulary 3D Scene Understanding (CVPR 2024)

  47. arXiv:2408.08570  [pdf, other

    cs.CV

    EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

    Authors: Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang

    Abstract: Associating driver attention with driving scene across two fields of views (FOVs) is a hard cross-domain perception problem, which requires comprehensive consideration of cross-view mapping, dynamic driving scene analysis, and driver status tracking. Previous methods typically focus on a single view or map attention to the scene via estimated gaze, failing to exploit the implicit connection betwee… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13pages, 9 figures,

  48. arXiv:2408.08535  [pdf, other

    cs.CL

    CommunityKG-RAG: Leveraging Community Structures in Knowledge Graphs for Advanced Retrieval-Augmented Generation in Fact-Checking

    Authors: Rong-Ching Chang, Jiawei Zhang

    Abstract: Despite advancements in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems, their effectiveness is often hindered by a lack of integration with entity relationships and community structures, limiting their ability to provide contextually rich and accurate information retrieval for fact-checking. We introduce CommunityKG-RAG (Community Knowledge Graph-Retrieval Augmented… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  49. arXiv:2408.08502  [pdf, other

    cs.CV

    Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

    Authors: Hefei Mei, Minjing Dong, Chang Xu

    Abstract: Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness, where DM-based defense methods can achieve superior defense capability without adversarial training. However, they all require huge computational costs due to the usage of large-scale pre-trained DMs, making it difficult to conduct full evaluation under strong attacks and compare with traditional CNN-b… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  50. arXiv:2408.08500  [pdf, other

    cs.CV

    CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving

    Authors: Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan

    Abstract: Conventional frame camera is the mainstream sensor of the autonomous driving scene perception, while it is limited in adverse conditions, such as low light. Event camera with high dynamic range has been applied in assisting frame camera for the multimodal fusion, which relies heavily on the pixel-level spatial alignment between various modalities. Typically, existing multimodal datasets mainly pla… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication