Skip to main content

Showing 1–50 of 174 results for author: Gu, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12005  [pdf, other

    cs.MM cs.CV

    VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

    Authors: Xiaoxuan Zhu, Zhouhong Gu, Sihang Jiang, Zhixu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Online courses have significantly lowered the barrier to accessing education, yet the varying content quality of these videos poses challenges. In this work, we focus on the task of automatically evaluating the quality of video course content. We have constructed a dataset with a substantial collection of video courses and teaching materials. We propose three evaluation principles and design a new… ▽ More

    Submitted 15 June, 2024; originally announced July 2024.

  2. arXiv:2407.05909  [pdf, other

    cs.CV

    Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

    Authors: Chenxu Wang, Chunyan Xu, Ziqi Gu, Zhen Cui

    Abstract: While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when s… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.04845  [pdf, other

    cs.NI

    Poster: Flexible Scheduling of Network and Computing Resources for Distributed AI Tasks

    Authors: Ruikun Wang, Jiawei Zhang, Qiaolun Zhang, Bojun Zhang, Zhiqun Gu, Aryanaz Attarpour, Yuefeng Ji, Massimo Tornatore

    Abstract: Many emerging Artificial Intelligence (AI) applications require on-demand provisioning of large-scale computing, which can only be enabled by leveraging distributed computing services interconnected through networking. To address such increasing demand for networking to serve AI tasks, we investigate new scheduling strategies to improve communication efficiency and test them on a programmable test… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  4. arXiv:2407.03942  [pdf, other

    cs.AI cs.CL cs.HC

    Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

    Authors: Zihui Gu, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Cheng-Zhong Xu, Ju Fan

    Abstract: Instruction-following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their capabilities on instruction following remains a challenge due to complexity and diversity of real-world user instructions. While existing evaluation methods focus on general skills, they suff… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: AAAI 2024

  5. arXiv:2407.02730  [pdf, other

    cs.CV cs.AI

    MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context

    Authors: Zishan Gu, Changchang Yin, Fenglin Liu, Ping Zhang

    Abstract: Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data, which inspires a large amount of studies for LVLMs fine-tuning and training. Despite their advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets. In this study, we introduce a new benchmar… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2406.14250  [pdf, other

    cs.CV cs.HC

    E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

    Authors: Ke Wang, Tianyu Xia, Zhangxuan Gu, Yi Zhao, Shuheng Shen, Changhua Meng, Weiqiang Wang, Ke Xu

    Abstract: Online GUI navigation on mobile devices has driven a lot of attention recent years since it contributes to many real-world applications. With the rapid development of large language models (LLM), multimodal large language models (MLLM) have tremendous potential on this task. However, existing MLLMs need high quality data to improve its abilities of making the correct navigation decisions according… ▽ More

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, Under review

  7. arXiv:2406.13726  [pdf, other

    math.OC cs.LG econ.GN

    Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models

    Authors: Zhouzhou Gu, Mathieu Laurière, Sebastian Merkel, Jonathan Payne

    Abstract: We propose and compare new global solution algorithms for continuous time heterogeneous agent economies with aggregate shocks. First, we approximate the agent distribution so that equilibrium in the economy can be characterized by a high, but finite, dimensional non-linear partial differential equation. We consider different approximations: discretizing the number of agents, discretizing the agent… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.12641  [pdf, other

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.10621  [pdf, other

    cs.CL cs.AI

    StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding

    Authors: Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

    Abstract: Given the substantial volumes of structured data held by many companies, enabling Large Language Models (LLMs) to directly understand structured text in non-structured forms could significantly enhance their capabilities across various business scenarios. To this end, we propose evaluation data generation method for assessing LLM's ability in understanding the structure-rich text, which generates… ▽ More

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  11. arXiv:2405.19707  [pdf, other

    cs.CV

    DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

    Authors: Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li

    Abstract: Recently, video generation techniques have advanced rapidly. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. Therefore, there is a growing demand for detectors capable of distinguishing between fake AI-generated videos and mitigating the potential harm caused by fake information. However, the lack of large-scale… ▽ More

    Submitted 16 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  12. arXiv:2405.15216  [pdf, other

    cs.LG cs.CL cs.SD eess.AS

    Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

    Authors: Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

    Abstract: Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make. Error correction models are designed to fix ASR errors, however, they showed little improvement over traditional LMs mainly due to the lack of supervised training data. In this paper, we present Denoising LM (DLM), which is a… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: under review

  13. arXiv:2405.12247  [pdf, other

    cs.CV

    Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Hao Shen, Zhao Zhang

    Abstract: In real-world applications of human pose estimation, low-resolution input images are frequently encountered when the performance of the image acquisition equipment is limited or the shooting distance is too far. However, existing state-of-the-art models for human pose estimation perform poorly on low-resolution images. One key reason is the presence of downsampling layers in these models, e.g., st… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, conference

  14. arXiv:2405.11640  [pdf, other

    cs.AI cs.CL cs.CV

    Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning

    Authors: Zishan Gu, Fenglin Liu, Changchang Yin, Ping Zhang

    Abstract: The adoption of large language models (LLMs) in healthcare has attracted significant research interest. However, their performance in healthcare remains under-investigated and potentially limited, due to i) they lack rich domain-specific knowledge and medical reasoning skills; and ii) most state-of-the-art LLMs are unimodal, text-only models that cannot directly process multimodal inputs. To this… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  15. arXiv:2405.11448  [pdf, other

    cs.CV

    Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Henghui Ding, Hao Shen, Zhao Zhang, De-Shuang Huang

    Abstract: In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, we face the challenge of feature size mismatch and class number mismatch when applying knowled… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  16. arXiv:2405.10691  [pdf, other

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  17. arXiv:2405.10316  [pdf, other

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  18. arXiv:2405.08447  [pdf, other

    cs.HC

    AI-Resilient Interfaces

    Authors: Elena L. Glassman, Ziwei Gu, Jonathan K. Kummerfeld

    Abstract: AI is powerful, but it can make choices that result in objective errors, contextually inappropriate outputs, and disliked options. We need AI-resilient interfaces that help people be resilient to the AI choices that are not right, or not right for them. To support this goal, interfaces need to help users notice and have the context to appropriately judge those AI choices. Existing human-AI interac… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  19. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  20. arXiv:2405.01882  [pdf, other

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  21. arXiv:2405.00797  [pdf, other

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  22. arXiv:2404.15254  [pdf, other

    cs.CV

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

    Abstract: This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributio… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  23. arXiv:2404.13671  [pdf, other

    cs.CV cs.LG

    FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

    Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

    Abstract: Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  24. arXiv:2404.09872  [pdf, other

    cs.CV

    Conditional Prototype Rectification Prompt Learning

    Authors: Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

    Abstract: Pre-trained large-scale vision-language models (VLMs) have acquired profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of limited data, introducing only a few parameters to harness task-specific insights from VLMs. Despite significant progress, current leading ETL methods… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  25. arXiv:2403.15993  [pdf, other

    cs.RO

    Robust-Locomotion-by-Logic: Perturbation-Resilient Bipedal Locomotion via Signal Temporal Logic Guided Model Predictive Control

    Authors: Zhaoyuan Gu, Yuntian Zhao, Yipu Chen, Rongming Guo, Jennifer K. Leestma, Gregory S. Sawicki, Ye Zhao

    Abstract: This study introduces a robust planning framework that utilizes a model predictive control (MPC) approach, enhanced by incorporating signal temporal logic (STL) specifications. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion, specifically designed to handle both translational and orientational perturbations. Existing recovery strategies often stru… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

  26. arXiv:2403.13433  [pdf, other

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of Human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to put it into the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu… ▽ More

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  27. arXiv:2403.12580  [pdf, other

    cs.CV

    Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

    Authors: Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jianning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma

    Abstract: Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approach has encountered certain difficulties due to dataset limitations. On the one hand, most of the state-of-the-art methods have achieved saturation (over 99% in AUROC) on mainstream datasets such as MVTec, and the differences of methods cannot be well… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: It is accepted by CVPR2024

  28. arXiv:2403.07825  [pdf, other

    cs.CL

    The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing

    Authors: Jianchen Wang, Zhouhong Gu, Xiaoxuan Zhu, Lin Zhang, Haoning Ye, Zhuozhi Xiong, Hongwei Feng, Yanghua Xiao

    Abstract: Large Language Models have revolutionized numerous tasks with their remarkable efficacy. However, editing these models, crucial for rectifying outdated or erroneous information, often leads to a complex issue known as the ripple effect in the hidden space. While difficult to detect, this effect can significantly impede the efficacy of model editing tasks and deteriorate model performance. This pap… ▽ More

    Submitted 2 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  29. arXiv:2403.04652  [pdf, other

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01. AI, :, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie , et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

  30. arXiv:2402.19270  [pdf, other

    cs.CV

    Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

    Authors: Rui Gong, Weide Liu, Zaiwang Gu, Xulei Yang, Jun Cheng

    Abstract: Geometric knowledge has been shown to be beneficial for the stereo matching task. However, prior attempts to integrate geometric insights into stereo matching algorithms have largely focused on geometric knowledge from single images while crucial cross-view factors such as occlusion and matching uniqueness have been overlooked. To address this gap, we propose a novel Intra-view and Cross-view Geom… ▽ More

    Submitted 6 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024

  31. arXiv:2402.18986  [pdf, other

    cs.CR

    Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs

    Authors: Zhengyao Gu, Diego Troy Lopez, Lilas Alrahis, Ozgur Sinanoglu

    Abstract: Graph neural network-based network intrusion detection systems have recently demonstrated state-of-the-art performance on benchmark datasets. Nevertheless, these methods suffer from a reliance on target encoding for data pre-processing, limiting widespread adoption due to the associated need for annotated labels--a cost-prohibitive requirement. In this work, we propose a solution involving in-cont… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Will appear in the 2024 International Symposium on Quality Electronic Design (ISQED'24)

  32. arXiv:2402.13776  [pdf, other

    eess.IV cs.CV cs.LG

    Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

    Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

    Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, makin… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  33. arXiv:2402.13714  [pdf, other

    q-bio.QM cs.AI cs.LG

    An Evaluation of Large Language Models in Bioinformatics Research

    Authors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun

    Abstract: Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance LLMs on a wide… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Under review

  34. arXiv:2402.08238  [pdf, other

    cs.IT cs.NI eess.SP

    Opportunistic Scheduling Using Statistical Information of Wireless Channels

    Authors: Zhouyou Gu, Wibowo Hardjawana, Branka Vucetic

    Abstract: This paper considers opportunistic scheduler (OS) design using statistical channel state information~(CSI). We apply max-weight schedulers (MWSs) to maximize a utility function of users' average data rates. MWSs schedule the user with the highest weighted instantaneous data rate every time slot. Existing methods require hundreds of time slots to adjust the MWS's weights according to the instantane… ▽ More

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: This work has been accepted in the IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

  35. arXiv:2402.06783  [pdf, other

    cs.RO cs.LG

    Learn to Teach: Improve Sample Efficiency in Teacher-student Learning for Sim-to-Real Transfer

    Authors: Feiyang Wu, Zhaoyuan Gu, Ye Zhao, Anqi Wu

    Abstract: Simulation-to-reality (sim-to-real) transfer is a fundamental problem for robot learning. Domain Randomization, which adds randomization during training, is a powerful technique that effectively addresses the sim-to-real gap. However, the noise in observations makes learning significantly harder. Recently, studies have shown that employing a teacher-student learning paradigm can accelerate trainin… ▽ More

    Submitted 9 February, 2024; originally announced February 2024.

  36. arXiv:2402.00879  [pdf, other

    cs.NI cs.LG eess.SP

    Graph Representation Learning for Contention and Interference Management in Wireless Networks

    Authors: Zhouyou Gu, Branka Vucetic, Kishore Chikkam, Pasquale Aliberti, Wibowo Hardjawana

    Abstract: Restricted access window (RAW) in Wi-Fi 802.11ah networks manages contention and interference by grouping users and allocating periodic time slots for each group's transmissions. We will find the optimal user grouping decisions in RAW to maximize the network's worst-case user throughput. We review existing user grouping approaches and highlight their performance limitations in the above problem. W… ▽ More

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: This work has been accepted in the IEEE/ACM Transactions on Networking. Copyright may be transferred without notice, after which this version may no longer be accessible

  37. arXiv:2401.13726  [pdf, other

    cs.HC cs.LG

    Supporting Sensemaking of Large Language Model Outputs at Scale

    Authors: Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman

    Abstract: Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability. In this paper, we explore how to present many LLM responses at once. We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual d… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 34 pages, 13 figures, conditionally accepted to ACM Conference on Human Factors in Computing Systems 2024

  38. arXiv:2401.10873  [pdf, other

    cs.HC

    An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

    Authors: Ziwei Gu, Ian Arawjo, Kenneth Li, Jonathan K. Kummerfeld, Elena L. Glassman

    Abstract: Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduce others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Conditionally accepted to CHI 2024

  39. arXiv:2401.06166  [pdf

    q-bio.BM cs.AI cs.LG

    AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy

    Authors: Yan Ding, Hao Cheng, Ziliang Ye, Ruyi Feng, Wei Tian, Peng Xie, Juan Zhang, Zhongze Gu

    Abstract: We propose Adjustable Molecular Representation (AdaMR), a new large-scale uniform pre-training strategy for small-molecule drugs, as a novel unified pre-training strategy. AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training job termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptabilit… ▽ More

    Submitted 27 April, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

  40. arXiv:2401.05669  [pdf, other

    cs.CL

    ConcEPT: Concept-Enhanced Pre-Training for Language Models

    Authors: Xintao Wang, Zhouhong Gu, Jiaqing Liang, Dakuan Lu, Yanghua Xiao, Wei Wang

    Abstract: Pre-trained language models (PLMs) have been prevailing in state-of-the-art methods for natural language processing, and knowledge-enhanced PLMs are further proposed to promote model performance in knowledge-intensive tasks. However, conceptual knowledge, one essential kind of knowledge for human cognition, still remains understudied in this line of research. This limits PLMs' performance in scena… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12pages. Work completed in 2023.01

  41. arXiv:2312.12729  [pdf, other

    cs.CV

    Segment Anything Model Meets Image Harmonization

    Authors: Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, Huaxiong Li

    Abstract: Image harmonization is a crucial technique in image composition that aims to seamlessly match the background by adjusting the foreground of composite images. Current methods adopt either global-level or pixel-level feature matching. Global-level feature matching ignores the proximity prior, treating foreground and background as separate entities. On the other hand, pixel-level feature matching los… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  42. arXiv:2311.12268  [pdf, other

    cs.CV

    Boosting Audio-visual Zero-shot Learning with Large Language Models

    Authors: Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

    Abstract: Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories. However, these approaches ignore the obscure event concepts in class names and may inevitably introduce complex network structures with difficult tra… ▽ More

    Submitted 24 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  43. arXiv:2310.14561  [pdf, other

    cs.CV

    F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

    Authors: Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou

    Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations. This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis. At present, adversarial training is one of the most effective defenses against adversarial examples. However, traditional adversarial training makes it diffi… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  44. arXiv:2310.11290  [pdf, other

    cs.RO

    Signal Temporal Logic-Guided Model Predictive Control for Robust Bipedal Locomotion Resilient to Runtime External Perturbations

    Authors: Zhaoyuan Gu, Rongming Guo, William Yates, Yipu Chen, Ye Zhao

    Abstract: This study investigates formal-method-based trajectory optimization (TO) for bipedal locomotion, focusing on scenarios where the robot encounters external perturbations at unforeseen times. Our key research question centers around the assurance of task specification correctness and the maximization of specification robustness for a bipedal robot in the presence of external perturbations. Our con… ▽ More

    Submitted 17 October, 2023; originally announced October 2023.

  45. arXiv:2310.09827  [pdf, other

    cs.LG

    VFLAIR: A Research Library and Benchmark for Vertical Federated Learning

    Authors: Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Ya-Qin Zhang

    Abstract: Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such… ▽ More

    Submitted 16 April, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: 39 pages, 22 figures, 19 tabels

  46. arXiv:2310.00749  [pdf, other

    cs.DB cs.LG

    SEED: Domain-Specific Data Curation With Large Language Models

    Authors: Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella

    Abstract: Data curation tasks that prepare data for analytics are critical for turning data into actionable insights. However, due to the diverse requirements of applications in different domains, generic off-the-shelf tools are typically insufficient. As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g. writing domain-specific code or… ▽ More

    Submitted 24 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: preprint, 20 pages, 4 figures

  47. arXiv:2309.16074  [pdf, other

    cs.RO cs.LG

    Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning

    Authors: Feiyang Wu, Zhaoyuan Gu, Hanran Wu, Anqi Wu, Ye Zhao

    Abstract: Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of lea… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

  48. arXiv:2309.14685  [pdf, other

    cs.RO cs.CV

    DriveSceneGen: Generating Diverse and Realistic Driving Scenarios from Scratch

    Authors: Shuo Sun, Zekai Gu, Tianchen Sun, Jiawei Sun, Chengran Yuan, Yuhang Han, Dongen Li, Marcelo H. Ang Jr

    Abstract: Realistic and diverse traffic scenarios in large quantities are crucial for the development and validation of autonomous driving systems. However, owing to numerous difficulties in the data collection process and the reliance on intensive annotations, real-world datasets lack sufficient quantity and diversity to support the increasing demand for data. This work introduces DriveSceneGen, a data-dri… ▽ More

    Submitted 28 February, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, 2 tables

  49. arXiv:2309.13172  [pdf, other

    cs.RO

    Walking-by-Logic: Signal Temporal Logic-Guided Model Predictive Control for Bipedal Locomotion Resilient to External Perturbations

    Authors: Zhaoyuan Gu, Rongming Guo, William Yates, Yipu Chen, Ye Zhao

    Abstract: This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantees and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recove… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

  50. arXiv:2309.09150  [pdf, other

    cs.CL cs.AI

    Can Large Language Models Understand Real-World Complex Instructions?

    Authors: Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He, Xunzhe Zhou, Lida Chen, Xintao Wang, Yuncheng Huang, Haoning Ye, Zihan Li, Shisong Chen, Yikai Zhang, Zhouhong Gu, Jiaqing Liang, Yanghua Xiao

    Abstract: Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex input that contains long context, noise, heterogeneous information and multi-turn format. Due to… ▽ More

    Submitted 8 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.