Showing 1–50 of 646 results for author: Jia, J

Searching in archive cs.
  1. VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

    Authors: Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia

    Abstract: Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generation exhibit two limitations. Firstly, they require the division of inputs into content prompt (transcript) and description prompt (style and speaker), i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  2. arXiv:2408.15542  [pdf, other]

    cs.CV cs.AI cs.MM

    Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

    Authors: Jiajun Liu, Yibing Wang, Hanghang Ma, Xiaoping Wu, Xiaoqi Ma, Xiaoming Wei, Jianbin Jiao, Enhua Wu, Jie Hu

    Abstract: Rapid advancements have been made in extending Large Language Models (LLMs) to Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data remains a challenging endeavor, especially for long videos. Due to insufficient access to large-scale high-quality video data and the excessive compression of visual features, current methods exhibit limitations in effectively proce…

    Submitted 28 August, 2024; originally announced August 2024.

  3. SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description

    Authors: Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu

    Abstract: Speech-language multi-modal learning presents a significant challenge due to the fine nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate insightful interplay between speech audio and natural language. However, constructing such datasets presents a major trade-off between large-scale data…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  4. arXiv:2408.09585  [pdf, other]

    cs.LG cs.IR

    On the Necessity of World Knowledge for Mitigating Missing Labels in Extreme Classification

    Authors: Jatin Prakash, Anirudh Buvanesh, Bishal Santra, Deepak Saini, Sachin Yadav, Jian Jiao, Yashoteja Prabhu, Amit Sharma, Manik Varma

    Abstract: Extreme Classification (XC) aims to map a query to the most relevant documents from a very large document set. XC algorithms used in real-world applications learn this mapping from datasets curated from implicit feedback, such as user clicks. However, these datasets inevitably suffer from missing labels. In this work, we observe that systematic missing labels lead to missing knowledge, which is cr…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Preprint, 23 pages

  5. arXiv:2408.09097  [pdf, other]

    cs.CV cs.AI

    Depth-guided Texture Diffusion for Image Semantic Segmentation

    Authors: Wei Sun, Yuan Li, Qixiang Ye, Jianbin Jiao, Yanzhao Zhou

    Abstract: Depth information provides valuable insights into the 3D structure, especially the outline of objects, which can be utilized to improve the semantic segmentation tasks. However, a naive fusion of depth information can disrupt features and compromise accuracy due to the modality gap between the depth and the vision. In this work, we introduce a Depth-guided Texture Diffusion approach that effectively…

    Submitted 17 August, 2024; originally announced August 2024.

  6. arXiv:2408.08723  [pdf, other]

    cs.CV cs.AI

    Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS

    Authors: Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao

    Abstract: Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing wor…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.07504 by other authors

  7. arXiv:2408.07291  [pdf, other]

    cs.CR

    Evaluating Large Language Model based Personal Information Extraction and Countermeasures

    Authors: Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Automatically extracting personal information--such as name, phone number, and email address--from publicly available profiles at a large scale is a stepping stone to many other security attacks including spear phishing. Traditional methods--such as regular expression, keyword search, and entity detection--achieve limited success at such personal information extraction. In this work, we perform a syste…

    Submitted 14 August, 2024; originally announced August 2024.

  8. arXiv:2408.06070  [pdf, other]

    cs.CV

    ControlNeXt: Powerful and Efficient Control for Image and Video Generation

    Authors: Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia

    Abstract: Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require substantial additional computational resources, espe…

    Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: controllable generation

  9. arXiv:2408.05678  [pdf, other]

    cs.DC cs.AI cs.LG

    Efficient Federated Learning Using Dynamic Update and Adaptive Pruning with Momentum on Shared Server Data

    Authors: Ji Liu, Juncheng Jia, Hong Zhang, Yuhui Yun, Leye Wang, Yang Zhou, Huaiyu Dai, Dejing Dou

    Abstract: Despite achieving remarkable performance, Federated Learning (FL) encounters two important problems, i.e., low training efficiency and limited computational resources. In this paper, we propose a new FL framework, i.e., FedDUMAP, with three original contributions, to leverage the shared insensitive data on the server in addition to the distributed data in edge devices so as to efficiently train a…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 27 pages, to appear in TIST

  10. arXiv:2408.04273  [pdf, other]

    eess.IV cs.CV

    SG-JND: Semantic-Guided Just Noticeable Distortion Predictor For Image Compression

    Authors: Linhan Cao, Wei Sun, Xiongkuo Min, Jun Jia, Zicheng Zhang, Zijian Chen, Yucheng Zhu, Lizhou Liu, Qiubo Chen, Jing Chen, Guangtao Zhai

    Abstract: Just noticeable distortion (JND), representing the threshold of distortion in an image that is minimally perceptible to the human visual system (HVS), is crucial for image compression algorithms to achieve a trade-off between transmission bit rate and image quality. However, traditional JND prediction methods rely only on pixel-level or sub-band level features, lacking the ability to capture the i…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ICIP 2024

  11. arXiv:2408.03723  [pdf, other]

    cs.RO

    MS-Mapping: An Uncertainty-Aware Large-Scale Multi-Session LiDAR Mapping System

    Authors: Xiangcheng Hu, Jin Wu, Jianhao Jiao, Binqian Jiang, Wei Zhang, Wenshuo Wang, Ping Tan

    Abstract: Large-scale multi-session LiDAR mapping is essential for a wide range of applications, including surveying, autonomous driving, crowdsourced mapping, and multi-agent navigation. However, existing approaches often struggle with data redundancy, robustness, and accuracy in complex environments. To address these challenges, we present MS-Mapping, a novel multi-session LiDAR mapping system that emplo…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 18 pages, 22 figures

  12. arXiv:2408.02978  [pdf, other]

    cs.MM cs.AI cs.CV

    ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval

    Authors: Ruixiang Zhao, Jian Jia, Yan Li, Xuehan Bai, Quan Chen, Han Li, Peng Jiang, Xirong Li

    Abstract: E-commerce is increasingly multimedia-enriched, with products exhibited in a broad-domain manner as images, short videos, or live stream promotions. A unified and vectorized cross-domain production representation is essential. Due to large intra-product variance and high inter-product similarity in the broad-domain scenario, a visual-only representation is inadequate. While Automatic Speech Recogn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  13. arXiv:2408.00950  [pdf, other]

    cs.CV

    PrivateGaze: Preserving User Privacy in Black-box Mobile Gaze Tracking Services

    Authors: Lingyu Du, Jinyuan Jia, Xucong Zhang, Guohao Lan

    Abstract: Eye gaze contains rich information about human attention and cognitive processes. This capability makes the underlying technology, known as gaze tracking, a critical enabler for many ubiquitous applications and has triggered the development of easy-to-use gaze estimation services. Indeed, by utilizing the ubiquitous cameras on tablets and smartphones, users can readily access many gaze estimation…

    Submitted 1 August, 2024; originally announced August 2024.

  14. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  15. arXiv:2407.21408  [pdf, other]

    cs.CV

    Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

    Authors: Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

    Abstract: In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessi…

    Submitted 31 July, 2024; originally announced July 2024.

  16. arXiv:2407.19594  [pdf, other]

    cs.CL cs.AI

    Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

    Authors: Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar

    Abstract: Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et al., 2024) have shown that LLMs can improve by judging their own responses instead of relying on human labelers. However, existing methods have primarily focused on improving model responses rather tha…

    Submitted 29 July, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  17. arXiv:2407.13976  [pdf, other]

    cs.CV

    PlacidDreamer: Advancing Harmony in Text-to-3D Generation

    Authors: Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia

    Abstract: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations.…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

    ACM Class: I.4.0

  18. arXiv:2407.13752  [pdf, other]

    cs.CV

    LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

    Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia

    Abstract: Recent advances in text-to-image model customization have underscored the importance of integrating new concepts with a few examples. Yet, these progresses are largely confined to widely recognized subjects, which can be learned with relative ease through models' adequate shared prior knowledge. In contrast, logos, characterized by unique patterns and textual elements, are hard to establish shared…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  19. arXiv:2407.13229  [pdf, other]

    cs.RO eess.SY

    Disturbance Observer for Estimating Coupled Disturbances

    Authors: Jindou Jia, Yuhang Liu, Kexin Guo, Xiang Yu, Lihua Xie, Lei Guo

    Abstract: High-precision control for nonlinear systems is impeded by the low-fidelity dynamical model and external disturbance. In particular, the intricate coupling between internal uncertainty and external disturbance is usually difficult to model explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning phil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures

  20. arXiv:2407.11736  [pdf, other]

    cs.RO cs.CV

    GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection

    Authors: Jingwen Yu, Hanjing Ye, Jianhao Jiao, Ping Tan, Hong Zhang

    Abstract: Visual loop closure detection is an important module in visual simultaneous localization and mapping (SLAM), which associates current camera observation with previously visited places. Loop closures correct drifts in trajectory estimation to build a globally consistent map. However, a false loop closure can be fatal, so verification is required as an additional step to ensure robustness by rejecti…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 9 pages, 11 figures, Accepted by IROS(2024)

  21. arXiv:2407.10459  [pdf, other]

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: invalidated when private prompt is guessed, c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  22. arXiv:2407.09091  [pdf, other]

    cs.RO

    Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion

    Authors: Jinhao He, Huaiyang Huang, Shuyang Zhang, Jianhao Jiao, Chengju Liu, Ming Liu

    Abstract: Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ICRA 2024

  23. arXiv:2407.08935  [pdf, other]

    cs.CR

    Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses

    Authors: Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang

    Abstract: Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn graph data from diverse sources. FL for non-graph data has been shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model can predict the testing data containing the trigger as the attacker desires. However,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper is accepted to CCS2024

  24. arXiv:2407.05342  [pdf, other]

    cs.CV

    Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

    Authors: Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia

    Abstract: This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLM…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  25. arXiv:2407.04086  [pdf, other]

    cs.CR cs.CV cs.LG

    Certifiably Robust Image Watermark

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against rem…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  26. arXiv:2407.03593  [pdf, other]

    math.NA cs.LG

    Green Multigrid Network

    Authors: Ye Lin, Young Ju Lee, Jiwei Jia

    Abstract: GreenLearning networks (GL) directly learn Green's function in physical space, making them an interpretable model for capturing unknown solution operators of partial differential equations (PDEs). For many PDEs, the corresponding Green's function exhibits asymptotic smoothness. In this paper, we propose a framework named Green Multigrid networks (GreenMGNet), an operator learning algorithm designe…

    Submitted 3 July, 2024; originally announced July 2024.

  27. arXiv:2407.01537  [pdf, other]

    cs.RO cs.CV

    WaveShot: A Compact Portable Unmanned Surface Vessel for Dynamic Water Surface Videography and Media Production

    Authors: Shijian Ma, Shicong Ma, Jianhao Jiao

    Abstract: This paper presents WaveShot, an innovative portable unmanned surface vessel that aims to transform water surface videography by offering a highly maneuverable, cost-effective, and safe alternative to traditional filming methods. WaveShot is designed for the modern demands of film production, advertising, documentaries, and visual arts, equipped with professional-grade waterproof cameras and advan…

    Submitted 13 August, 2024; v1 submitted 12 March, 2024; originally announced July 2024.

  28. arXiv:2407.00752  [pdf, other]

    cs.CV cs.AI

    Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation

    Authors: Peng Huang, Xue Gao, Lihong Huang, Jing Jiao, Xiaokang Li, Yuanyuan Wang, Yi Guo

    Abstract: Text-to-image generation has important implications for generation of diverse and controllable images. Several attempts have been made to adapt Stable Diffusion (SD) to the medical domain. However, the large distribution difference between medical reports and natural texts, as well as high computational complexity in common stable diffusion limit the authenticity and feasibility of the generated m…

    Submitted 30 June, 2024; originally announced July 2024.

  29. arXiv:2406.18842  [pdf]

    cs.CY cs.AI cs.CL

    The global landscape of academic guidelines for generative AI and Large Language Models

    Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar

    Abstract: The integration of Generative Artificial Intelligence (GAI) and Large Language Models (LLMs) in academia has spurred a global discourse on their potential pedagogical benefits and ethical considerations. Positive reactions highlight some potential, such as collaborative creativity, increased access to education, and empowerment of trainers and trainees. However, negative reactions raise concerns a…

    Submitted 27 June, 2024; v1 submitted 26 May, 2024; originally announced June 2024.

  30. arXiv:2406.18841  [pdf]

    cs.CY cs.AI cs.CL

    Navigating LLM Ethics: Advancements, Challenges, and Future Directions

    Authors: Junfeng Jiao, Saleh Afroogh, Yiming Xu, Connor Phillips

    Abstract: This study addresses ethical issues surrounding Large Language Models (LLMs) within the field of artificial intelligence. It explores the common ethical challenges posed by both LLMs and other AI systems, such as privacy and fairness, as well as ethical challenges uniquely arising from LLMs. It highlights challenges such as hallucination, verifiable accountability, and decoding censorship complexi…

    Submitted 27 June, 2024; v1 submitted 14 May, 2024; originally announced June 2024.

  31. arXiv:2406.18629  [pdf, other]

    cs.LG cs.AI cs.CL

    Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    Authors: Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia

    Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benef…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code, data, and models are available at https://github.com/dvlab-research/Step-DPO

  32. arXiv:2406.17304  [pdf, other]

    cs.CL

    Leveraging LLMs for Dialogue Quality Measurement

    Authors: Jinghan Jia, Abi Komma, Timothy Leffel, Xujun Peng, Ajay Nagesh, Tamer Soliman, Aram Galstyan, Anoop Kumar

    Abstract: In task-oriented conversational AI evaluation, unsupervised methods poorly correlate with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks. This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and pro…

    Submitted 25 June, 2024; originally announced June 2024.

  33. arXiv:2406.16694  [pdf, other]

    cs.CL

    Task Oriented In-Domain Data Augmentation

    Authors: Xiao Liang, Xinyu Hu, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao

    Abstract: Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continually pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for contin…

    Submitted 24 June, 2024; originally announced June 2024.

  34. arXiv:2406.14169  [pdf, other]

    cs.IR cs.LG

    Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning

    Authors: Amit Sharma, Hua Li, Xue Li, Jian Jiao

    Abstract: Given an input query, a recommendation model is trained using user feedback data (e.g., click data) to output a ranked list of items. In real-world systems, besides accuracy, an important consideration for a new model is novelty of its top-k recommendations w.r.t. an existing deployed model. However, novelty of top-k items is a difficult goal to optimize a model for, since it involves a non-differ…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at KDD 2024

  35. arXiv:2406.14132  [pdf, other]

    cs.AI

    Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

    Authors: Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, HengXu He, Jia Jia

    Abstract: In the mobile internet era, the Online Food Ordering Service (OFOS) emerges as an integral component of inclusive finance owing to the convenience it brings to people. OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency. Despite significant progress, the marketing doma…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages

  36. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  37. arXiv:2406.11253  [pdf, other]

    cs.CV

    Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

    Authors: Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

    Abstract: In this paper, we introduce a novel path to general human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the scope of available 3D motion data in terms of both the size and the diversity. To address these limitations, we exploit extensive availability of 2D motion data…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11 figures, 17 tables

  38. arXiv:2406.10802  [pdf, other]

    cs.CL cs.AI

    KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

    Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

    Abstract: Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework…

    Submitted 16 June, 2024; originally announced June 2024.

  39. arXiv:2406.10504  [pdf, other]

    cs.AI cs.CL cs.LG

    Task Facet Learning: A Structured Approach to Prompt Optimization

    Authors: Gurusha Juneja, Nagarajan Natarajan, Hua Li, Jian Jiao, Amit Sharma

    Abstract: Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model (LLM). Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whethe…

    Submitted 15 June, 2024; originally announced June 2024.

  40. arXiv:2406.10125  [pdf, other]

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  41. arXiv:2406.09569  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

    Authors: Frank Seide, Morrie Doulaty, Yangyang Shi, Yashesh Gaur, Junteng Jia, Chunyang Wu

    Abstract: We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio without explicit end-pointing. Speech ReaLLM is a special case of the more general ReaLLM ("real-time LLM") approach, also introduced here for the…

    Submitted 13 June, 2024; originally announced June 2024.

  42. arXiv:2406.07698  [pdf, other]

    cs.LG

    Label Smoothing Improves Machine Unlearning

    Authors: Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

    Abstract: The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of…

    Submitted 11 June, 2024; originally announced June 2024.

  43. arXiv:2406.07528  [pdf, other]

    cs.LG

    QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

    Authors: Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia

    Abstract: The capacity of Large Language Models (LLMs) to comprehend and reason over long contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing long-distance dependencies within sequences to deeply understand semantics. To address this issue, we introduce Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences akin to human cognition.…

    Submitted 22 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  44. arXiv:2406.07348  [pdf, other]

    cs.LG cs.CL

    DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

    Authors: Zijian Hei, Weiling Liu, Wenjie Ou, Juyi Qiao, Junming Jiao, Guowen Song, Ting Tian, Yi Lin

    Abstract: Retrieval-Augmented Generation (RAG) has recently demonstrated the performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context by incorporating external knowledge bases to enhance the response accuracy. However, it would be inefficient to access LLMs multiple times for each query and unreliable to retrieve all the rele…

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  45. arXiv:2406.06739  [pdf, other]

    cs.CL cs.IR cs.LG

    Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval

    Authors: Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha

    Abstract: Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 tables, 2 figures

  46. arXiv:2406.06087  [pdf, other]

    cs.CV

    GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

    Authors: Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

    Abstract: Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features, thus ren…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  47. arXiv:2406.05972  [pdf, other]

    cs.AI cs.CY cs.HC cs.LG econ.TH

    Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

    Authors: Jingru Jia, Zehua Yuan, Junhao Pan, Paul McNamara, Deming Chen

    Abstract: When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Jingru Jia and Zehua Yuan contributed equally

  48. arXiv:2406.03757  [pdf, other]

    cs.RO cs.LG

    RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

    Authors: Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

    Abstract: The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive benchmark and an autonomous learning framework, RoboCoder, aimed at enhancing the generalization capabilities of robots in complex environments. Unlike tradition…

    Submitted 6 June, 2024; originally announced June 2024.

  49. arXiv:2406.03193  [pdf, other]

    cs.CR cs.LG

    Graph Neural Network Explanations are Fragile

    Authors: Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang

    Abstract: Explainable Graph Neural Network (GNN) has emerged recently to foster trust in using GNNs. Existing GNN explainers are developed from various perspectives to enhance the explanation performance. We take the first step to study GNN explainers under adversarial attack: we found that an adversary slightly perturbing the graph structure can ensure the GNN model makes correct predictions, but the GNN expla…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 17 pages, 64 figures

  50. arXiv:2406.02096  [pdf, other]

    cs.RO

    MS-Mapping: Multi-session LiDAR Mapping with Wasserstein-based Keyframe Selection

    Authors: Xiangcheng Hu, Jin Wu, Jianhao Jiao, Wei Zhang, Ping Tan

    Abstract: Large-scale multi-session LiDAR mapping is crucial for various applications but still faces significant challenges in data redundancy, memory consumption, and efficiency. This paper presents MS-Mapping, a novel multi-session LiDAR mapping system that incorporates an incremental mapping scheme to enable efficient map assembly in large-scale environments. To address the data redundancy and improve g…

    Submitted 16 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 3 pages, 2 figures, Accepted by the 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40)