
Showing 1–50 of 699 results for author: Hu, Z

Searching in archive cs.
  1. arXiv:2407.12882 [pdf, other]

    cs.CL cs.AI cs.LG

    InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification

    Authors: Yujia Hu, Zhiqiang Hu, Chun-Wei Seah, Roy Ka-Wei Lee

    Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This appro…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11046 [pdf, other]

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy-preserving. Hence, LoRA has gained much attention recently, and the amount of related literature demonstrates exponential growth. It is…

    Submitted 8 July, 2024; originally announced July 2024.
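    The low-rank update described in this abstract can be sketched as follows, assuming the common formulation h = W x + (alpha / r) * B A x, where W (d_out x d_in) is frozen and only A (r x d_in) and B (d_out x r) are trained. Plain Python lists stand in for a tensor library, and the function names are hypothetical; this is an illustration, not code from the survey.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Frozen dense layer W plus a pluggable low-rank update B @ A.

    W: d_out x d_in (frozen), A: r x d_in, B: d_out x r (trainable).
    The update is scaled by alpha / r, a common LoRA convention.
    """
    base = matvec(w, x)                 # frozen path: W x
    low_rank = matvec(b, matvec(a, x))  # adapter path: B (A x), rank <= r
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, low_rank)]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Fold the adapter into W, giving W + (alpha / r) B A for inference."""
    scale = alpha / r
    return [
        [w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(r))
         for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_out = 2, d_in = 3
    a = [[1.0, 1.0, 0.0]]                    # r = 1
    b = [[0.5], [0.0]]
    x = [1.0, 2.0, 3.0]
    print(lora_forward(w, a, b, x, alpha=1.0, r=1))
    print(matvec(merge_lora(w, a, b, alpha=1.0, r=1), x))
```

    Because the adapter is a separate B A product, it can be kept "pluggable" during training and merged into W afterwards; initialising B to zeros, as LoRA implementations commonly do, makes the adapted layer start out identical to the frozen one.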

  3. arXiv:2407.10135 [pdf, other]

    cs.CV

    FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection

    Authors: Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu, Zhenghui Hu, Baohui Wang, Yunhong Wang

    Abstract: Although multi-view 3D object detection based on the Bird's-Eye-View (BEV) paradigm has garnered widespread attention as an economical and deployment-friendly perception solution for autonomous driving, there is still a performance gap compared to LiDAR-based methods. In recent years, several cross-modal distillation methods have been proposed to transfer beneficial information from teacher models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  4. arXiv:2407.10074 [pdf, ps, other]

    cs.IT

    Optimal linear codes with few weights from simplicial complexes

    Authors: Bing Chen, Yunge Xu, Zhao Hu, Nian Li, Xiangyong Zeng

    Abstract: Recently, constructions of optimal linear codes from simplicial complexes have attracted much attention and some related nice works were presented. Let $q$ be a prime power. In this paper, by using the simplicial complexes of ${\mathbb F}_{q}^m$ with one single maximal element, we construct four families of linear codes over the ring ${\mathbb F}_{q}+u{\mathbb F}_{q}$ ($u^2=0$), which generalizes…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages
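    As a hedged illustration of the defining-set idea behind such constructions, the sketch below works over the binary field F_2, not the ring F_q + uF_q studied in the paper. It takes the simplicial complex generated by a single maximal element M (all vectors whose support lies inside supp(M)) as the defining set Delta and builds the code C = {(a . x) for x in Delta : a in F_2^m}. In this simplest case every nonzero codeword has weight 2^(k-1), where k = |supp(M)|, because a nonzero linear functional on F_2^k is balanced. The function names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Vec = Tuple[int, ...]


def complex_from_maximal(m_vec: Vec) -> List[Vec]:
    """All binary vectors whose support is contained in supp(m_vec).

    This is the simplicial complex with the single maximal element m_vec.
    """
    support = [i for i, bit in enumerate(m_vec) if bit]
    out = []
    for bits in product((0, 1), repeat=len(support)):
        x = [0] * len(m_vec)
        for pos, bit in zip(support, bits):
            x[pos] = bit
        out.append(tuple(x))
    return out


def defining_set_code(delta: List[Vec], m: int) -> Set[Vec]:
    """Distinct codewords c_a = (a . x mod 2 for x in delta), a in F_2^m."""
    code = set()
    for a in product((0, 1), repeat=m):
        code.add(tuple(sum(ai * xi for ai, xi in zip(a, x)) % 2 for x in delta))
    return code


def weight_set(code: Set[Vec]) -> Set[int]:
    """Distinct Hamming weights occurring in the code."""
    return {sum(c) for c in code}


if __name__ == "__main__":
    m_vec = (1, 1, 0, 1)  # supp(M) has size k = 3
    delta = complex_from_maximal(m_vec)
    code = defining_set_code(delta, m=len(m_vec))
    print(len(delta), len(code), sorted(weight_set(code)))
```

    With k = 3 the code has length 8, eight distinct codewords, and weights {0, 4}, a one-weight code; the paper's ring setting and multiple maximal elements are what produce the richer "few weights" families.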

  5. arXiv:2407.09792 [pdf, other]

    cs.RO

    Language-Augmented Symbolic Planner for Open-World Task Planning

    Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

    Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS) 2024

  6. arXiv:2407.09722 [pdf, other]

    cs.CL cs.LG

    Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference

    Authors: Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

    Abstract: Transformer-based Large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generate…

    Submitted 12 July, 2024; originally announced July 2024.
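    The standard draft-then-verify loop mentioned in this abstract can be sketched in its greedy form (this is the baseline, not the paper's multi-token joint variant): a cheap draft model proposes k tokens, the target model checks them, the longest agreeing prefix is kept, and the target's own token is appended at the first disagreement. The toy "models" here are hypothetical deterministic functions from a prefix to the next token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # prefix -> greedy next token


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding with draft length k."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target verifies each position. In a real system these
        #    checks are one batched forward pass over seq + proposal.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted: the same pass yields a bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal]  # the bonus token may overshoot the requested length


if __name__ == "__main__":
    def target(s: List[int]) -> int:
        return (3 * s[-1] + len(s)) % 10

    def draft(s: List[int]) -> int:
        # Agrees with the target except at every third position.
        return target(s) if len(s) % 3 else 0

    print(speculative_decode(target, draft, [1], n_new=8, k=4))
    print(greedy_decode(target, [1], n_new=8))
```

    Every kept token is one the target would have chosen greedily, so the output matches plain greedy decoding with the target; the speed-up comes from verifying several draft tokens per target pass.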

  7. arXiv:2407.09469 [pdf, other]

    cs.RO

    Learning Coordinated Maneuver in Adversarial Environments

    Authors: Zechen Hu, Manshi Limbu, Daigo Shishika, Xuesu Xiao, Xuan Wang

    Abstract: This paper aims to solve the coordination of a team of robots traversing a route in the presence of adversaries with random positions. Our goal is to minimize the overall cost of the team, which is determined by (i) the accumulated risk when robots stay in adversary-impacted zones and (ii) the mission completion time. During traversal, robots can reduce their speed and act as a 'guard' (the slower…

    Submitted 12 July, 2024; originally announced July 2024.

  8. arXiv:2407.03788 [pdf, other]

    cs.CV cs.CL

    Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

    Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering t…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  9. arXiv:2407.03634 [pdf, other]

    cs.CV

    SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection

    Authors: Zongxiang Hu, Zhaosheng Zhang

    Abstract: Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advancements in large-scale visual-language models have significantly improved zero/few-shot anomaly detection. However, these approaches may not fully utilize hierarchical features, potentially missing nuanced details. We…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures, conference

  10. arXiv:2407.03000 [pdf, other]

    cs.CL cs.CV

    VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values

    Authors: Zhe Hu, Yixiao Ren, Jing Li, Yu Yin

    Abstract: This paper introduces VIVA, a benchmark for VIsion-grounded decision-making driven by human VAlues. While most large vision-language models (VLMs) focus on physical-level skills, our work is the first to examine their multimodal capabilities in leveraging human values to make decisions under a vision-depicted situation. VIVA contains 1,062 images depicting diverse real-world situations and the man…

    Submitted 3 July, 2024; originally announced July 2024.

  11. arXiv:2407.02633 [pdf, other]

    cs.CV

    HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes

    Authors: Zhiming Hu, Zheming Yin, Daniel Haeufle, Syn Schmitt, Andreas Bulling

    Abstract: We present HOIMotion - a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ISMAR 2024 TVCG-track, zhiminghu.net/hu24_hoimotion.html. arXiv admin note: text overlap with arXiv:2403.09885

  12. arXiv:2407.01312 [pdf, other]

    cs.CV

    ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection

    Authors: Yun Liang, Zhiguang Hu, Junjie Huang, Donglin Di, Anyang Su, Lei Fan

    Abstract: Current unsupervised anomaly detection approaches perform well on public datasets but struggle with specific anomaly types due to the domain gap between pre-trained feature extractors and target-specific domains. To tackle this issue, this paper presents a two-stage training strategy, called ToCoAD. In the first stage, a discriminative network is trained by using synthetic anomalies in a…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  13. arXiv:2407.01231 [pdf, other]

    cs.CL cs.AI

    MIRAI: Evaluating LLM Agents for Event Forecasting

    Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

    Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

  14. arXiv:2407.01085 [pdf, other]

    cs.LG cs.CL

    Rethinking LLM-based Preference Evaluation

    Authors: Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Jieyu Zhao, Hui Xiong

    Abstract: Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference eval…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.01017 [pdf, other]

    cs.CV

    Coding for Intelligence from the Perspective of Category

    Authors: Wenhan Yang, Zixuan Hu, Lilang Lin, Jiaying Liu, Ling-Yu Duan

    Abstract: Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, interweave recently to give birth to a series of significant progress. The recent trends demonstrate the potential homogeneity of these two fields, especially when deep-learning models aid these two categories for bet…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  16. arXiv:2407.00737 [pdf, other]

    cs.CV

    LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

    Authors: Mushui Liu, Yuhang Ma, Xinfeng Zhang, Yang Zhen, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan

    Abstract: Diffusion Models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts that involve multiple objects, attribute binding, and long descriptions. This paper proposes a framework called LLM4GEN, which enhances the semantic understanding ability of text-to-image diffusion models by leveraging the se…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 11 pages, 13 figures

  17. arXiv:2406.19643 [pdf, other]

    cs.CL cs.AI

    Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation

    Authors: Zhe Hu, Hou Pong Chan, Jing Li, Yu Yin

    Abstract: Writing persuasive arguments is a challenging task for both humans and machines. It entails incorporating high-level beliefs from various perspectives on the topic, along with deliberate reasoning and planning to construct a coherent narrative. Current language models often generate surface tokens autoregressively, lacking explicit integration of these underlying controls, resulting in limited out…

    Submitted 28 June, 2024; originally announced June 2024.

  18. arXiv:2406.18992 [pdf, other]

    cs.CV cs.AI cs.LG

    Semi-supervised Concept Bottleneck Models

    Authors: Lijie Hu, Tianhao Huang, Huanyi Xie, Chenyang Ren, Zhengyu Hu, Lu Yu, Di Wang

    Abstract: Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 17 pages

  19. arXiv:2406.18950 [pdf, other]

    eess.IV cs.CV

    MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Zhanli Hu, Xiaohan Xing, Jing Qin

    Abstract: Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as…

    Submitted 7 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  20. arXiv:2406.17294 [pdf, other]

    cs.CL

    Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

    Authors: Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge th…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages

  21. arXiv:2406.16987 [pdf]

    eess.SP cs.LG

    AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

    Authors: Gyanna Gao, Hao-Yu Liao, Zhenhong Hu

    Abstract: Numerous studies have demonstrated the manifold benefits of tennis, such as increasing overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport mainly due to financial constraints such as private lesson expenses as well as logistical concerns to and back from such lessons and clinics. While several tennis self-training s…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages, 9 figures, 1 table

  22. arXiv:2406.16872 [pdf, other]

    eess.SP cs.AI

    Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition

    Authors: Jianguo Pan, Zhengxin Hu, Lingdun Zhang, Xia Cai

    Abstract: Sensor-based human activity recognition is important in daily scenarios such as smart healthcare and homes due to its non-intrusive privacy and low cost advantages, but the problem of out-of-domain generalization caused by differences in focusing individuals and operating environments can lead to significant accuracy degradation on cross-person behavior recognition due to the inconsistent distribu…

    Submitted 28 March, 2024; originally announced June 2024.

  23. arXiv:2406.16537 [pdf, other]

    cs.CV cs.AI

    Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

    Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

    Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. The…

    Submitted 3 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.15805 [pdf, other]

    cs.CV

    Smart Feature is What You Need

    Authors: Zhaoxin Hu, Keyan Ren

    Abstract: Lack of shape guidance and label jitter caused by information deficiency of weak label are the main problems in 3D weakly-supervised object detection. Current weakly-supervised models often use heuristics or assumptions methods to infer information from weak labels without taking advantage of the inherent clues of weakly-supervised and fully-supervised methods, thus it is difficult to explore a me…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCNN 2024

  25. MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization

    Authors: Zhaozhe Hu, Jia-Li Yin, Bin Chen, Luojun Lin, Bo-Hao Chen, Ximeng Liu

    Abstract: Self-ensemble adversarial training methods improve model robustness by ensembling models at different training epochs, such as model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely affects the generalization performance. Empirically, in the late phases of training, the A…

    Submitted 20 June, 2024; originally announced June 2024.
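    To make the contrast in this abstract concrete: weight averaging (WA) takes the element-wise mean of parameter snapshots saved at different epochs, while a median ensemble, as the MEAT title suggests, takes the element-wise median, which is far less sensitive to a single snapshot whose weights have drifted. The sketch below assumes snapshots are flat lists of floats and uses made-up values; it does not reproduce MEAT's actual training procedure.

```python
from statistics import mean, median
from typing import List

Weights = List[float]


def weight_average(snapshots: List[Weights]) -> Weights:
    """Self-ensemble by weight averaging (WA): element-wise mean."""
    return [mean(col) for col in zip(*snapshots)]


def weight_median(snapshots: List[Weights]) -> Weights:
    """Self-ensemble by element-wise median across snapshots."""
    return [median(col) for col in zip(*snapshots)]


if __name__ == "__main__":
    snapshots = [
        [0.10, -0.20],
        [0.12, -0.18],
        [0.11, -0.22],
        [0.09, -0.21],
        [0.90, 1.50],  # hypothetical late-epoch snapshot that has drifted
    ]
    print("WA:    ", weight_average(snapshots))
    print("median:", weight_median(snapshots))
```

    Here the single drifted snapshot pulls the mean of the second weight from about -0.2 to a positive value, while the median stays at -0.20.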

  26. arXiv:2406.13261 [pdf, other]

    cs.CL cs.AI

    BeHonest: Benchmarking Honesty in Large Language Models

    Authors: Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu

    Abstract: Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, present severe risks that intensify as these models approach superintelligent levels. Enhancing honesty i…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.11708 [pdf, ps, other]

    math.NA cs.LG math.DS

    Tackling the Curse of Dimensionality in Fractional and Tempered Fractional PDEs with Physics-Informed Neural Networks

    Authors: Zheyuan Hu, Kenji Kawaguchi, Zhongqiang Zhang, George Em Karniadakis

    Abstract: Fractional and tempered fractional partial differential equations (PDEs) are effective models of long-range interactions, anomalous diffusion, and non-local effects. Traditional numerical methods for these problems are mesh-based, thus struggling with the curse of dimensionality (CoD). Physics-informed neural networks (PINNs) offer a promising solution due to their universal approximation, general…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages

    ACM Class: F.2.2; I.2.7

  28. arXiv:2406.11676 [pdf, other]

    cs.LG math.DS math.NA stat.ML

    Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations

    Authors: Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

    Abstract: We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology. We utilize a fractional score function and Physics-informed neural networks (PINN) to lift the curse of dimensionality (CoD) and alleviate numerical overflow from exponentially decaying solutions with dimen…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 16 pages, 1 figure

    ACM Class: F.2.2; I.2.7

  29. arXiv:2406.10847 [pdf, other]

    cs.AI cs.CE cs.CL cs.MA

    TorchOpera: A Compound AI System for LLM Safety

    Authors: Shanshan Han, Yuhang Yao, Zijian Hu, Dimitris Stripelis, Zhaozhuo Xu, Chaoyang He

    Abstract: We introduce TorchOpera, a compound AI system for enhancing the safety and quality of prompts and responses for Large Language Models. TorchOpera ensures that all user prompts are safe, contextually grounded, and effectively processed, while enhancing LLM responses to be relevant and high quality. TorchOpera utilizes the vector database for contextual grounding, rule-based wrappers for flexible mo…

    Submitted 16 June, 2024; originally announced June 2024.

  30. arXiv:2406.09612 [pdf, other]

    cs.AI cs.LG physics.chem-ph

    Automated Molecular Concept Generation and Labeling with Large Language Models

    Authors: Shichang Zhang, Botao Xia, Zimin Zhang, Qianli Wu, Fang Sun, Ziniu Hu, Yizhou Sun

    Abstract: Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like G…

    Submitted 13 June, 2024; originally announced June 2024.

  31. arXiv:2406.09606 [pdf, other]

    cs.LG cs.AI cs.AR

    Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

    Authors: Zongyue Qin, Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong

    Abstract: In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality…

    Submitted 17 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures. arXiv admin note: text overlap with arXiv:2305.10838

  32. arXiv:2406.09455 [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  33. arXiv:2406.02721 [pdf, other]

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  34. arXiv:2406.02603 [pdf, other]

    cs.CR cs.LG

    Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions

    Authors: Yihan Wu, Ruibo Chen, Zhengmian Hu, Yanshuo Chen, Junfeng Guo, Hongyang Zhang, Heng Huang

    Abstract: Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation…

    Submitted 2 June, 2024; originally announced June 2024.
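    The key-seeded pseudo-random sampling described in this abstract can be illustrated with exponential-minimum (Gumbel-max style) sampling, one well-known distortion-free construction; the schemes analysed in the paper may differ. The sketch assumes uniforms u_i are derived by hashing (key, context, token) with SHA-256 and picks argmax_i u_i^(1/p_i). For a fixed key and context the choice is deterministic, which is what a detector later tests; across independent keys, token i is chosen with probability p_i, so the output distribution is unchanged. It also shows the collision issue: reusing a key with the same context always repeats the same token.

```python
import hashlib
from typing import Sequence


def keyed_uniform(key: str, context: str, token: int) -> float:
    """Pseudo-random number in [0, 1) derived from key, context and token."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def watermarked_sample(probs: Sequence[float], key: str, context: str) -> int:
    """Exponential-minimum sampling: pick argmax_i u_i ** (1 / p_i).

    If the u_i were truly uniform and independent, token i would be
    selected with probability p_i, so sampling stays distortion-free.
    """
    best_tok, best_score = -1, -1.0
    for tok, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability tokens can never be chosen
        score = keyed_uniform(key, context, tok) ** (1.0 / p)
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    # Same key and context: the same token every time.
    print(watermarked_sample(probs, "k1", "the cat"),
          watermarked_sample(probs, "k1", "the cat"))
    # Many distinct keys: empirical frequencies approach probs.
    counts = [0, 0, 0]
    for i in range(5000):
        counts[watermarked_sample(probs, f"key-{i}", "the cat")] += 1
    print([c / 5000 for c in counts])
```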

  35. arXiv:2406.01026 [pdf, other]

    cs.CL

    Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

    Authors: Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo Yin

    Abstract: Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main

    Journal ref: ACL 2024

  36. arXiv:2406.00791 [pdf, other]

    cs.CV cs.MM eess.IV

    Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor

    Authors: Lei Liu, Zhihao Hu, Zhenghao Chen

    Abstract: Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets fo…

    Submitted 2 June, 2024; originally announced June 2024.

  37. arXiv:2406.00114 [pdf, other]

    cs.RO cs.NE

    Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory planning

    Authors: Bao Liu, Tianbao Liu, Zhongshuo Hu, Fei Ye, Lei Gao

    Abstract: The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynam…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  38. arXiv:2405.20179 [pdf, other]

    cs.CL cs.AI cs.RO

    Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

    Authors: Zichao Hu, Junyi Jessy Li, Arjun Guha, Joydeep Biswas

    Abstract: Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the pe…

    Submitted 30 May, 2024; originally announced May 2024.

  39. arXiv:2405.19763 [pdf, other]

    cs.CL

    Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

    Authors: Kuo Liao, Shuang Li, Meng Zhao, Liqun Liu, Mengge Xue, Zhenyu Hu, Honglin Han, Chengguo Yin

    Abstract: Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitati…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024 Main

  40. arXiv:2405.19723 [pdf, other]

    cs.CV cs.AI

    Encoding and Controlling Global Semantics for Long-form Video Question Answering

    Authors: Thong Thanh Nguyen, Zhiyuan Hu, Xiaobao Wu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

    Abstract: Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole sequence of video, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into multi-modal Transformer to e…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Work in progress

  41. arXiv:2405.19716 [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  42. arXiv:2405.19131 [pdf, other]

    cs.DC

    Learning Interpretable Scheduling Algorithms for Data Processing Clusters

    Authors: Zhibo Hu, Chen Wang, Helen Paik, Yanfeng Shu, Liming Zhu

    Abstract: Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like decima) have been attempted to…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, 18 figures

    MSC Class: 68M20

    ACM Class: I.2.8; D.4.1

  43. arXiv:2405.19088 [pdf, other]

    cs.CL cs.CV

    Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions

    Authors: Zhe Hu, Tuo Liang, Jing Li, Yiren Lu, Yunlai Zhou, Yiran Qiao, Jing Ma, Yu Yin

    Abstract: Recent advancements in large multimodal language models have demonstrated remarkable proficiency across a wide range of tasks. Yet, these models still struggle with understanding the nuances of human humor through juxtaposition, particularly when it involves nonlinear narratives that underpin many jokes and humor cues. This paper investigates this challenge by focusing on comics with contradictory…

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.18822  [pdf, other

    cs.CL

    Toxicity Detection for Free

    Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner

    Abstract: Current LLMs are generally aligned to follow safety requirements and tend to refuse toxic prompts. However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples. In addition, state-of-the-art toxicity detectors have low TPRs at low FPR, incurring high costs in real-world applications where toxic examples are rare. In this paper, we explore Moderation Using LLM Intros…

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.16560  [pdf, other

    cs.LG

    Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

    Authors: Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Yu Li, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the mode…

    Submitted 26 May, 2024; originally announced May 2024.

  46. arXiv:2405.16098  [pdf, other

    cs.CV

    Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion

    Authors: Zizhao Hu, Mohammad Rostami

    Abstract: The Transformer architecture has dominated machine learning in a wide range of tasks. The specific characteristic of this architecture is an expensive scaled dot-product attention mechanism that models the inter-token interactions, which is known to be the reason behind its success. However, such a mechanism does not have a direct parallel to the human brain, which raises the question of whether the scaled…

    Submitted 25 May, 2024; originally announced May 2024.

  47. arXiv:2405.15476  [pdf, other

    cs.LG cs.AI cs.CV

    Editable Concept Bottleneck Models

    Authors: Lijie Hu, Chenyang Ren, Zhengyu Hu, Cheng-Long Wang, Di Wang

    Abstract: Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on cases where the data, including concepts, are clean. In many scenarios, we need to remove or insert training data or concepts in trained CBMs for various reasons, such as priva…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 33 pages

  48. arXiv:2405.15267  [pdf, other

    cs.CV

    Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor

    Authors: Haoxuan Qu, Zhaoyang He, Zeyu Hu, Yujun Cai, Jun Liu

    Abstract: To facilitate the practical application of motion prediction, the few-shot motion prediction task has recently attracted increasing research attention. Yet, existing few-shot motion prediction works generally require a model specifically trained on human motion data. In this work, rather than tackling this task through training a specific human motion prediction model, we…

    Submitted 24 May, 2024; originally announced May 2024.

  49. arXiv:2405.13872  [pdf, other

    cs.AI cs.CL cs.CV

    Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

    Authors: Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT ha…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Correct the case title

  50. arXiv:2405.13602  [pdf, other

    cs.AI cs.CL cs.LG

    COTET: Cross-view Optimal Transport for Knowledge Graph Entity Typing

    Authors: Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan

    Abstract: Knowledge graph entity typing (KGET) aims to infer missing entity type instances in knowledge graphs. Previous research has predominantly centered around leveraging contextual information associated with entities, which provides valuable clues for inference. However, they have long ignored the dual nature of information inherent in entities, encompassing both high-level coarse-grained cluster know…

    Submitted 22 May, 2024; originally announced May 2024.