Showing 1–50 of 323 results for author: Hu, B

Searching in archive cs.
  1. arXiv:2407.12817  [pdf, other]

    cs.CL cs.SD eess.AS

    Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition

    Authors: Yuchun Shu, Bo Hu, Yifeng He, Hao Shi, Longbiao Wang, Jianwu Dang

    Abstract: Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis and recovering them well-founded is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses as the reference to find the wrong word position. Besides, the acoustic f…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2407.06888  [pdf, other]

    cs.LG eess.SY math.OC

    A Complete Set of Quadratic Constraints For Repeated ReLU

    Authors: Sahel Vahedi Noori, Bin Hu, Geir Dullerud, Peter Seiler

    Abstract: This paper derives a complete set of quadratic constraints (QCs) for the repeated ReLU. The complete set of QCs is described by a collection of $2^{n_v}$ matrix copositivity conditions where $n_v$ is the dimension of the repeated ReLU. We also show that only two functions satisfy all QCs in our complete set: the repeated ReLU and a repeated "flipped" ReLU. Thus our complete set of QCs bounds the r…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.03978  [pdf, other]

    cs.CL cs.AI

    Benchmarking Complex Instruction-Following with Multiple Constraints Composition

    Authors: Bosi Wen, Pei Ke, Xiaotao Gu, Lindong Wu, Hao Huang, Jinfeng Zhou, Wenchuang Li, Binxin Hu, Wendy Gao, Jiaxin Xu, Yiming Liu, Jie Tang, Hongning Wang, Minlie Huang

    Abstract: Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on m…

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 20 pages, 7 figures

  4. arXiv:2407.03531  [pdf, other]

    cs.RO

    OrbitGrasp: $SE(3)$-Equivariant Grasp Learning

    Authors: Boce Hu, Xupeng Zhu, Dian Wang, Zihao Dong, Haojie Huang, Chenghao Wang, Robin Walters, Robert Platt

    Abstract: While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit a lot from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our…

    Submitted 3 July, 2024; originally announced July 2024.

  5. arXiv:2406.17605  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV cs.IR

    NativE: Multi-modal Knowledge Graph Completion in the Wild

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen

    Abstract: Multi-modal knowledge graph completion (MMKGC) aims to automatically discover the unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and multi-modal information from entities. However, real-world MMKGs present challenges due to their diverse and imbalanced nature, which means that the modality information can span various types (e…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: Accepted by SIGIR 2024 as a full paper

  6. arXiv:2406.15325  [pdf, other]

    cs.AI cs.SE

    Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks

    Authors: Hokyung Lee, Sumanyu Sharma, Bing Hu

    Abstract: Recent research in Needle-in-a-Haystack (NIAH) benchmarks has explored the capabilities of Large Language Models (LLMs) in retrieving contextual information from large text documents. However, as LLMs become increasingly integrated into software development processes, it is crucial to evaluate their performance in code-based environments. As LLMs are further developed for program synthesis, we nee…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

    MSC Class: 68T50

    ACM Class: I.2.7; D.2.5

  7. arXiv:2406.14282  [pdf, other]

    cs.CL cs.AI

    Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

    Authors: Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen

    Abstract: Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  8. arXiv:2406.12225  [pdf, other]

    cs.CV

    The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

    Authors: Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

    Abstract: This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging the vision-language model (VLM) for object detection. However, on specific datasets, VLM may encounter the problem where the detected targets are misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of VLM and the application of fine-tunin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Foundational Few-Shot Object Detection Challenge

  9. arXiv:2406.11303  [pdf, other]

    cs.CV cs.AI cs.CL

    VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

    Authors: Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang

    Abstract: Despite significant breakthroughs in video analysis driven by the rapid development of large multimodal models (LMMs), there remains a lack of a versatile evaluation benchmark to comprehensively assess these models' performance in video understanding and reasoning. To address this, we present VideoVista, a video QA benchmark that integrates challenges across diverse content categories, durations,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 38 pages, 44 figures

  10. arXiv:2406.11193  [pdf, other]

    cs.CL

    MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan…

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.08216  [pdf, ps, other]

    cs.SE

    A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

    Authors: Sinclair Hudson, Sophia Jit, Boyue Caroline Hu, Marsha Chechik

    Abstract: Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to undergo rigorous testing. Software Engineering (SE) research on testing Machine Learning (ML) components and ML-based systems has systematically explored many topic…

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.06435  [pdf, other]

    cs.CL cs.AI

    Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

    Authors: Brian Hu, Bill Ray, Alice Leung, Amy Summerville, David Joy, Christopher Funk, Arslan Basharat

    Abstract: In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This da…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 15 pages total (including appendix), NAACL 2024 Industry Track

  13. arXiv:2405.19893  [pdf, other]

    cs.LG cs.AI cs.CL

    Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

    Authors: Chunjing Gan, Dan Yang, Binbin Hu, Hanxiao Zhang, Siyuan Li, Ziqi Liu, Yue Shen, Lin Ju, Zhiqiang Zhang, Jinjie Gu, Lei Liang, Jun Zhou

    Abstract: In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 12 pages

  14. arXiv:2405.19149  [pdf, other]

    cs.CV cs.AI cs.IR

    CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval

    Authors: Xintong Jiang, Yaxiong Wang, Mengjian Li, Yujiao Wu, Bingwen Hu, Xueming Qian

    Abstract: Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In our paper, we identify two new relations within triplets, treating each triplet as a graph node. Firstly, we introduce the concept of te…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: To appear at SIGIR 2024. arXiv admin note: text overlap with arXiv:2309.02169

  15. arXiv:2405.17132  [pdf, other]

    cs.LG

    Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

    Authors: Chunjing Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

    Abstract: Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook th…

    Submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.16869  [pdf, other]

    cs.AI cs.CL

    Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen

    Abstract: Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs), which is achieved by collaborative modeling the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Work in progress. Code and data will be released at https://github.com/zjukg/MoMoK

  17. arXiv:2405.16631  [pdf, other]

    cs.CL cs.CY cs.SI

    Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models

    Authors: Qiong Nan, Qiang Sheng, Juan Cao, Beizhe Hu, Danding Wang, Jintao Li

    Abstract: Fake news detection plays a crucial role in protecting social media users and maintaining a healthy news ecosystem. Among existing works, comment-based fake news detection methods are empirically shown as promising because comments could reflect users' opinions, stances, and emotions and deepen models' understanding of fake news. Unfortunately, due to exposure bias and users' different willingness…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures, 8 tables

  18. arXiv:2405.13085  [pdf, other]

    cs.CL cs.AI

    Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks

    Authors: Yichi Zhang, Binbin Hu, Zhuo Chen, Lingbing Guo, Ziqi Liu, Zhiqiang Zhang, Lei Liang, Huajun Chen, Wen Zhang

    Abstract: Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on purely resea…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Work in progress. Code and data will be open-sourced at https://github.com/zjukg/MuDoK

  19. arXiv:2405.11273  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Working in progress

  20. arXiv:2405.10347  [pdf, other]

    cs.CV cs.AI cs.CY

    Networking Systems for Video Anomaly Detection: A Tutorial and Survey

    Authors: Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung

    Abstract: The increasing prevalence of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to ACM Computing Surveys, under review. For more information and supplementary material, please see https://github.com/fdjingliu/NSVAD

  21. arXiv:2405.08013  [pdf, other]

    cs.LG cs.AI cs.SI

    CTRL: Continuous-Time Representation Learning on Temporal Heterogeneous Information Network

    Authors: Chenglin Li, Yuanzhen Xie, Chenyun Yu, Lei Cheng, Bo Hu, Zang Li, Di Niu

    Abstract: Inductive representation learning on temporal heterogeneous graphs is crucial for scalable deep learning on heterogeneous information networks (HINs) which are time-varying, such as citation networks. However, most existing approaches are not inductive and thus cannot handle new nodes or edges. Moreover, previous temporal graph embedding methods are often trained with the temporal link prediction…

    Submitted 10 May, 2024; originally announced May 2024.

  22. arXiv:2405.07260  [pdf]

    cs.LG cs.AI eess.SP

    A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

    Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

    Abstract: This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  23. arXiv:2405.05236  [pdf, ps, other]

    eess.SY cs.LG math.OC

    Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural Networks

    Authors: Sahel Vahedi Noori, Bin Hu, Geir Dullerud, Peter Seiler

    Abstract: This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated ReLUs using known properties for the scalar ReLU. Our stability…

    Submitted 14 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  24. arXiv:2405.04950  [pdf, other]

    cs.CV cs.AI cs.CL

    VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

    Authors: Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang

    Abstract: Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Addi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 17 pages; Accepted by ICML 2024

  25. arXiv:2405.03799  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Synthetic Data from Diffusion Models Improve Drug Discovery Prediction

    Authors: Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

    Abstract: Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating d…

    Submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.01010  [pdf, other]

    cs.LG stat.ML

    Efficient and Adaptive Posterior Sampling Algorithms for Bandits

    Authors: Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

    Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-wo…

    Submitted 2 May, 2024; originally announced May 2024.

  27. arXiv:2404.12008  [pdf, other]

    cs.IR cs.AI

    How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective

    Authors: Siyi Lin, Chongming Gao, Jiawei Chen, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, Can Wang

    Abstract: Recommendation Systems (RS) are often plagued by popularity bias. When training a recommendation model on a typically long-tailed dataset, the model tends to not only inherit this bias but often exacerbate it, resulting in over-representation of popular items in the recommendation lists. This study conducts comprehensive empirical and theoretical analyses to expose the root causes of this phenomen…

    Submitted 13 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 23 pages, 9 figures

  28. arXiv:2404.11894  [pdf, other]

    cs.GR

    Rendering Participating Media Using Path Graphs

    Authors: Becky Hu, Xi Deng, Fujun Luan, Miloš Hašan, Steve Marschner

    Abstract: Rendering volumetric scattering media, including clouds, fog, smoke, and other complex materials, is crucial for realism in computer graphics. Traditional path tracing, while unbiased, requires many long path samples to converge in scenes with scattering media, and a lot of work is wasted by paths that make a negligible contribution to the image. Methods to make better use of the information learn…

    Submitted 18 April, 2024; originally announced April 2024.

  29. arXiv:2404.11225  [pdf, other]

    cs.CL cs.AI

    In-Context Learning State Vector with Inner and Momentum Optimization

    Authors: Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

    Abstract: Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this g…

    Submitted 4 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 7 figures, 5 tables

  30. arXiv:2404.09468  [pdf, other]

    cs.AI

    MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Huajun Chen, Wen Zhang

    Abstract: Multi-modal knowledge graphs (MMKG) store structured world knowledge containing rich multi-modal descriptive information. To overcome their inherent incompleteness, multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given MMKGs, leveraging both structural information from the triples and multi-modal information of the entities. Existing MMKGC methods usually…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Working in progress; Repo is available at https://github.com/zjukg/MyGO

  31. arXiv:2404.09127  [pdf, other]

    cs.CL

    Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation

    Authors: Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang

    Abstract: Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose decisions and confidences not only stem from intrinsic beliefs but can also be adjusted through daily observations, existing calibration methods for LLMs focus on estim…

    Submitted 10 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  32. arXiv:2404.03865  [pdf, other]

    cs.CL cs.LG

    FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

    Authors: Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

    Abstract: Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent achieving remarkable success in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges for autoregressive token-by-token generation. To mitigate computation overload incurred during generation, several early-exit and layer…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.01382

  33. arXiv:2404.03647  [pdf, other]

    math.OC cs.AI cs.LG

    Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

    Authors: Darioush Kevian, Usman Syed, Xingang Guo, Aaron Havens, Geir Dullerud, Peter Seiler, Lianhui Qin, Bin Hu

    Abstract: In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control problems. Controls provides an interesting case study for LLM reasoning due to its combination of mathematical theory and engineering design. We introduce ControlBench, a benchmark dataset tailored to reflect the bread…

    Submitted 4 April, 2024; originally announced April 2024.

  34. arXiv:2403.19837  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.LO

    Concept-based Analysis of Neural Networks via Vision-Language Models

    Authors: Ravi Mangal, Nina Narodytska, Divya Gopinath, Boyue Caroline Hu, Anirban Roy, Susmit Jha, Corina Pasareanu

    Abstract: The analysis of vision-based deep neural networks (DNNs) is highly desirable but it is very challenging due to the difficulty of expressing formal specifications for vision tasks and the lack of efficient verification procedures. In this paper, we propose to leverage emerging multimodal, vision-language, foundation models (VLMs) as a lens through which we can reason about vision models. VLMs have…

    Submitted 10 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  35. arXiv:2403.18381  [pdf, other]

    cs.CL cs.AI

    Improving Attributed Text Generation of Large Language Models via Preference Learning

    Authors: Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang

    Abstract: Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation that neglect mirrorin…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages, 15 tables, 2 figures

  36. arXiv:2403.09861  [pdf, other]

    cs.ET cs.AI

    NN-Defined Modulator: Reconfigurable and Portable Software Modulator on IoT Gateways

    Authors: Jiazhao Wang, Wenchao Jiang, Ruofeng Liu, Bin Hu, Demin Gao, Shuai Wang

    Abstract: A physical-layer modulator is a vital component for an IoT gateway to map the symbols to signals. However, due to the soldered hardware chipsets on the gateway's motherboards or the diverse toolkits on different platforms for the software radio, the existing solutions either have limited extensibility or are platform-specific. Such limitation is hard to ignore when modulation schemes and hardware…

    Submitted 14 March, 2024; originally announced March 2024.

    Journal ref: NSDI 2024

  37. Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

    Authors: Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang

    Abstract: Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity movement. It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment. However, the identification, differentiation, and understanding of micro-actions pose challenges due to the imperceptible and inacc…

    Submitted 3 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

  38. arXiv:2403.04260  [pdf, other]

    cs.IR cs.CL cs.LG

    Can Small Language Models be Good Reasoners for Sequential Recommendation?

    Authors: Yuling Wang, Changxin Tian, Binbin Hu, Yanhua Yu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Liang Pang, Xiao Wang

    Abstract: Large language models (LLMs) open up new horizons for sequential recommendations, owing to their remarkable language comprehension and generation capabilities. However, there are still numerous challenges that should be addressed to successfully implement sequential recommendations empowered by LLMs. Firstly, user behavior patterns are often complex, and relying solely on one-step reasoning from L…

    Submitted 28 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by TheWebConf (WWW) 2024

  39. arXiv:2403.01954  [pdf, other]

    cs.CL cs.AI cs.LO

    DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation

    Authors: Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi, Bin Hu

    Abstract: Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference. However, these methods often guide plausible continuations by greedily selecting targets, which, while completing the task, may disrupt the natural patterns of human language generation. In this work, we propose a novel decoding f…

    Submitted 7 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE TKDE (Major Revision), 13 pages, 6 figures

  40. arXiv:2402.19401  [pdf, other]

    cs.CV

    Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance

    Authors: Huakun Shen, Boyue Caroline Hu, Krzysztof Czarnecki, Lina Marsso, Marsha Chechik

    Abstract: While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR) -- an extension of corruption robustness to allow assessing it over the wide and co…

    Submitted 29 February, 2024; originally announced February 2024.

  41. arXiv:2402.16705  [pdf, other]

    cs.CL cs.AI cs.LG

    SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection

    Authors: Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

    Abstract: Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data sets, which increases costs and limits widespread adoption. In…

    Submitted 26 February, 2024; originally announced February 2024.

  42. arXiv:2402.14488  [pdf, other]

    cs.CL

    Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer

    Authors: Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang

    Abstract: The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on examining hallucinations stemming from static input, such as in the domains of summarization or machine translation. However, our investigation de…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: LREC-Coling 2024

  43. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  44. arXiv:2402.14398  [pdf, other]

    cs.CV cs.AI

    Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

    Authors: Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

    Abstract: GAN-based image attribute editing firstly leverages GAN Inversion to project real images into the latent space of GAN and then manipulates corresponding latent codes. Recent inversion methods mainly utilize additional high-bit features to improve image details preservation, as low-bit codes cannot faithfully reconstruct source images, leading to the loss of details. However, during editing, existi…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 18 pages, 18 figures, published to AAAI24

  45. arXiv:2402.13587  [pdf, other]

    cs.CL cs.CV

    A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

    Authors: Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang

    Abstract: In this paper, we propose a new setting for generating product descriptions from images, augmented by marketing keywords. It leverages the combined power of visual and textual information to create descriptions that are more tailored to the unique features of products. For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language mode…

    Submitted 7 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted by LREC-COLING 2024

  46. arXiv:2402.13561  [pdf, other]

    cs.CL cs.CV

    Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

    Authors: Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang

    Abstract: Evaluating and Rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely-used visual-language projection approaches (e.g., Q-former or MLP) focus on the alignment of image-text descriptions yet ignore the visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge. Visual knowledge plays a significant role in analyzing, inferring, and…

    Submitted 26 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures; Accepted by ACL 2024 Main Conference

  47. arXiv:2402.13546  [pdf, other]

    cs.CL cs.CV

    LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

    Authors: Yunxin Li, Xinyu Chen, Baotian Hu, Min Zhang

    Abstract: Long video understanding is a significant and ongoing challenge in the intersection of multimedia and artificial intelligence. Employing large language models (LLMs) for comprehending video becomes an emerging and promising method. However, this approach incurs high computational costs due to the extensive array of video tokens, experiences reduced visual clarity as a consequence of token aggregat…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Working in Progress

  48. arXiv:2402.13045  [pdf, other]

    cs.RO

    A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

    Authors: Wansong Liu, Sibo Tian, Boyi Hu, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a deep learning enhanced adaptive unscented Kalman filter (UKF) for predicting human arm motion in the context of manufacturing. Unlike previous network-based methods that solely rely on captured human motion data, which is represented as bone vectors in this paper, we incorporate a human arm dynamic model into the motion prediction algorithm and use the UKF to iteratively fore…

    Submitted 20 February, 2024; originally announced February 2024.

  49. arXiv:2402.11654  [pdf, other]

    math.OC cs.LG

    Model-Free $μ$-Synthesis: A Nonsmooth Optimization Perspective

    Authors: Darioush Keivan, Xingang Guo, Peter Seiler, Geir Dullerud, Bin Hu

    Abstract: In this paper, we revisit model-free policy search on an important robust control benchmark, namely $μ$-synthesis. In the general output-feedback setting, there do not exist convex formulations for this problem, and hence global optimality guarantees are not expected. Apkarian (2011) presented a nonconvex nonsmooth policy optimization approach for this problem, and achieved state-of-the-art design…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to L4DC 2024

  50. arXiv:2402.10671  [pdf, other]

    cs.CL

    Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm

    Authors: Yuanzhen Xie, Xinzhou Jin, Tao Xie, MingXiong Lin, Liang Chen, Chenyun Yu, Lei Cheng, ChengXiang Zhuo, Bo Hu, Zang Li

    Abstract: In-context learning of large-language models (LLMs) has achieved remarkable success in the field of natural language processing, while extensive case studies reveal that the single-step chain-of-thought prompting approach faces challenges such as attention diffusion and inadequate performance in complex tasks like text-to-SQL. To improve the contextual learning capabilities of LLMs in text-to-SQL,…

    Submitted 3 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.