
Showing 1–50 of 493 results for author: Jiang, S

Searching in archive cs.
  1. arXiv:2407.12448

    cs.LG

    Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

    Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

    Abstract: Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive. Existing methods replay offline data directly in the online phase, resulting in a significant challenge of data distribution shift and subsequently causing inefficiency in online fine-tuning. To address this issue, we introduce an inno…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12281

    cs.CR cs.AI

    Turning Generative Models Degenerate: The Power of Data Poisoning Attacks

    Authors: Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo

    Abstract: The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and classification tasks, they remain underexplored for natural language generation (NLG) tasks. To addre…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 18 pages, 11 figures

  3. arXiv:2407.12005

    cs.MM cs.CV

    VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

    Authors: Xiaoxuan Zhu, Zhouhong Gu, Sihang Jiang, Zhixu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Online courses have significantly lowered the barrier to accessing education, yet the varying content quality of these videos poses challenges. In this work, we focus on the task of automatically evaluating the quality of video course content. We have constructed a dataset with a substantial collection of video courses and teaching materials. We propose three evaluation principles and design a new…

    Submitted 15 June, 2024; originally announced July 2024.

  4. arXiv:2407.11298

    cs.RO

    ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

    Authors: Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt

    Abstract: Robotic grasping in cluttered environments remains a significant challenge due to occlusions and complex object arrangements. We have developed ThinkGrasp, a plug-and-play vision-language grasping system that makes use of GPT-4o's advanced contextual reasoning for heavy clutter environment grasping strategies. ThinkGrasp can effectively identify and generate grasp poses for target objects, even wh…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Project website: https://h-freax.github.io/thinkgrasp_page/

  5. arXiv:2407.08964

    cs.LG cs.RO

    Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control

    Authors: Sicong Jiang, Seongjin Choi, Lijun Sun

    Abstract: Cooperative Adaptive Cruise Control (CACC) plays a pivotal role in enhancing traffic efficiency and safety in Connected and Autonomous Vehicles (CAVs). Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC, leading to improved system performance and adaptability. Among RL approaches, Multi-Agent Reinforcement Learning (MARL) has shown remarkable p…

    Submitted 11 July, 2024; originally announced July 2024.

  6. OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

    Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan

    Abstract: Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. Aiming at addressing the inadequacies of current learned image compression (LIC) methods for SC, we propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction and a cascaded two-stage multi-scale re…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 figures, 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  7. arXiv:2407.08466

    eess.IV cs.CV

    Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

    Authors: Congrui Fu, Hui Yuan, Shiqi Jiang, Guanghui Zhang, Liquan Shen, Raouf Hamzaoui

    Abstract: By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance visual experiences and facilitate more efficient information dissemination. We propose a convolutional neural network (CNN) for space-time video super-resolution, namely GIRNet. To generate highly accurate features and thus improve performance, th…

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.06360

    cs.SE

    CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings

    Authors: Anthony Varkey, Siyuan Jiang, Weijing Huang

    Abstract: Pretrained language models for code token embeddings are used in code search, code clone detection, and other code-related tasks. Similarly, code function embeddings are useful in such tasks. However, there are no out-of-box models for function embeddings in the current literature. So, this paper proposes CodeCSE, a contrastive learning model that learns embeddings for functions and their descript…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.05717

    eess.SY cs.RO eess.SP

    A New Framework for Nonlinear Kalman Filters

    Authors: Shida Jiang, Junzhe Shi, Scott Moura

    Abstract: The Kalman filter (KF) is a state estimation algorithm that optimally combines system knowledge and measurements to minimize the mean squared error of the estimated states. While KF was initially designed for linear systems, numerous extensions of it, such as extended Kalman filter (EKF), unscented Kalman filter (UKF), cubature Kalman filter (CKF), etc., have been proposed for nonlinear systems. A…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 26 pages, 5 + 9 figures, 2 tables

  10. arXiv:2407.05082

    cs.LG cs.AI cs.CV stat.ML

    DMTG: One-Shot Differentiable Multi-Task Grouping

    Authors: Yuan Gao, Shuguo Jiang, Moran Li, Jin-Gang Yu, Gui-Song Xia

    Abstract: We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights simultaneously in one-shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train th…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

    Journal ref: International Conference on Machine Learning (ICML), 2024

  11. arXiv:2407.01551

    cs.CY cs.AI cs.CL

    Leveraging Prompts in LLMs to Overcome Imbalances in Complex Educational Text Data

    Authors: Jeanne McClure, Machi Shimmei, Noboru Matsuda, Shiyan Jiang

    Abstract: In this paper, we explore the potential of Large Language Models (LLMs) with assertions to mitigate imbalances in educational datasets. Traditional models often fall short in such contexts, particularly due to the complexity and nuanced nature of the data. This issue is especially prominent in the education sector, where cognitive engagement levels among students show significant variation in thei…

    Submitted 27 April, 2024; originally announced July 2024.

    Comments: 17 pages, 5 figures, 3 tables, 2 appendices

  12. arXiv:2406.19263

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.19249

    cs.LG

    NTFormer: A Composite Node Tokenized Graph Transformer for Node Classification

    Authors: Jinsong Chen, Siyu Jiang, Kun He

    Abstract: Recently, the emerging graph Transformers have made significant advancements for node classification on graphs. In most graph Transformers, a crucial step involves transforming the input graph into token sequences as the model input, enabling Transformer to effectively learn the node representations. However, we observe that existing methods only express partial graph information of nodes through…

    Submitted 27 June, 2024; originally announced June 2024.

  14. arXiv:2406.18227

    cs.CV cs.CL

    GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

    Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

    Abstract: There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024

  15. arXiv:2406.17484

    cs.CL

    MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

    Authors: Yusheng Liao, Shuyang Jiang, Yanfeng Wang, Yu Wang

    Abstract: Large language models (LLMs) have shown substantial progress in natural language understanding and generation, proving valuable especially in the medical field. Despite advancements, challenges persist due to the complexity and diversity inherent in medical tasks, which can be categorized as knowledge-intensive tasks and alignment-required tasks. Previous approaches either ignore the latter task o…

    Submitted 6 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 19 pages, 6 figures

  16. arXiv:2406.17225

    eess.IV cs.CV

    Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

    Authors: Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

    Abstract: Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu…

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.16518

    cs.CV

    Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Convolutional neural networks (CNNs) and Transformers have shown advanced accuracy in crack detection under certain conditions. Yet, the fixed local attention can compromise the generalisation of CNNs, and the quadratic complexity of the global self-attention restricts the practical deployment of Transformers. Given the emergence of the new-generation architecture of Mamba, this paper proposes a V…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures

  18. arXiv:2406.16505

    q-fin.CP cs.AI

    $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

    Authors: Feng Xu, Yan Yin, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Zongzhang Zhang

    Abstract: Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on gen…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  19. arXiv:2406.11882

    cs.AI cs.LG

    Applications of Explainable artificial intelligence in Earth system science

    Authors: Feini Huang, Shijie Jiang, Lu Li, Yongkun Zhang, Ye Zhang, Ruqing Zhang, Qingliang Li, Danxi Li, Wei Shangguan, Yongjiu Dai

    Abstract: In recent years, artificial intelligence (AI) rapidly accelerated its influence and is expected to promote the development of Earth system science (ESS) if properly harnessed. In application of AI to ESS, a significant hurdle lies in the interpretability conundrum, an inherent problem of black-box nature arising from the complexity of AI algorithms. To address this, explainable AI (XAI) offers a s…

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.10744

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  21. arXiv:2406.10261

    cs.CL cs.AI

    FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

    Authors: Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang

    Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery a…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 19 figures

  22. arXiv:2406.09798

    cs.RO cs.CV

    Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VL…

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024. The code is available at https://github.com/MrZihan/Sim2Real-VLN-3DFF

  23. arXiv:2406.07823

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  24. arXiv:2406.07025

    cs.LG cs.AI q-bio.QM stat.ML

    Entropy-Reinforced Planning with Large Language Models for Drug Discovery

    Authors: Xuefeng Liu, Chih-chan Tien, Peng Ding, Songhao Jiang, Rick L. Stevens

    Abstract: The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused tok…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Published in ICML 2024

  25. arXiv:2406.04876

    cs.CL

    HateDebias: On the Diversity and Variability of Hate Speech Debiasing

    Authors: Nankai Lin, Hongyan Wu, Zhengming Chen, Zijian Li, Lianxi Wang, Shengyi Jiang, Dong Zhou, Aimin Yang

    Abstract: Hate speech on social media is ubiquitous but urgently needs to be controlled. Without detecting and mitigating the biases brought by hate speech, different types of ethical problems arise. While a number of datasets have been proposed to address the problem of hate speech detection, these datasets seldom consider the diversity and variability of bias, making it far from real-world scenarios. To fill this gap, we p…

    Submitted 7 June, 2024; originally announced June 2024.

  26. arXiv:2405.20192

    cs.CL

    TAIA: Large Language Models are Out-of-Distribution Data Learners

    Authors: Shuyang Jiang, Yusheng Liao, Ya Zhang, Yu Wang, Yanfeng Wang

    Abstract: Fine-tuning on task-specific question-answer pairs is a predominant method for enhancing the performance of instruction-tuned large language models (LLMs) on downstream tasks. However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution. To improve the perfo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages

  27. arXiv:2405.17846

    cs.RO cs.AI

    Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

    Authors: Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang

    Abstract: Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent saf…

    Submitted 28 May, 2024; originally announced May 2024.

  28. arXiv:2405.16414

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  29. arXiv:2405.15278

    cs.CV

    MindShot: Brain Decoding Framework Using Only One Image

    Authors: Shuai Jiang, Zhu Meng, Delong Liu, Haiwen Li, Fei Su, Zhicheng Zhao

    Abstract: Brain decoding, which aims at reconstructing visual stimuli from brain signals, primarily utilizing functional magnetic resonance imaging (fMRI), has recently made positive progress. However, it is impeded by significant challenges such as the difficulty of acquiring fMRI-image pairs and the variability of individuals, etc. Most methods have to adopt the per-subject-per-model paradigm, greatly lim…

    Submitted 24 May, 2024; originally announced May 2024.

  30. arXiv:2405.12541

    cs.AI

    DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge

    Authors: Bufang Yang, Siyang Jiang, Lilin Xu, Kaiwei Liu, Hai Li, Guoliang Xing, Hongkai Chen, Xiaofan Jiang, Zhenyu Yan

    Abstract: Large language models (LLMs) have the potential to transform digital healthcare, as evidenced by recent advances in LLM-based virtual doctors. However, current approaches rely on patient's subjective descriptions of symptoms, causing increased misdiagnosis. Recognizing the value of daily data from smart devices, we introduce a novel LLM-based multi-turn consultation virtual doctor system, DrHouse,…

    Submitted 21 May, 2024; originally announced May 2024.

  31. arXiv:2405.11273

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Work in progress

  32. arXiv:2405.08816

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  33. arXiv:2405.07309

    cs.RO cs.AI cs.CV cs.LG

    DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

    Authors: Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu

    Abstract: Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward des…

    Submitted 12 May, 2024; originally announced May 2024.

  34. arXiv:2405.05590

    cs.CR cs.AR cs.LG

    TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against…

    Submitted 9 May, 2024; originally announced May 2024.

  35. arXiv:2405.04021

    cs.CR

    Robust and Reusable Fuzzy Extractors for Low-entropy Rate Randomness Sources

    Authors: Somnath Panja, Shaoquan Jiang, Reihaneh Safavi-Naini

    Abstract: Fuzzy extractors (FE) are cryptographic primitives that extract reliable cryptographic key from noisy real world random sources such as biometric sources. The FE generation algorithm takes a source sample, extracts a key and generates some helper data that will be used by the reproduction algorithm to recover the key. Reusability of FE guarantees that security holds when FE is used multiple times…

    Submitted 7 May, 2024; originally announced May 2024.

  36. arXiv:2404.16223

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  37. arXiv:2404.11824

    cs.CV

    TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

    Authors: Tianyi Liang, Jiangqi Liu, Sicheng Song, Shiqi Jiang, Yifei Huang, Changbo Wang, Chenhui Li

    Abstract: Recent advancements in Text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generate layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces a dynamic adaptation of the blank region for text-friendly image generation, emphasizing text…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 7 figures

  38. arXiv:2404.10237

    cs.CV cs.CL

    Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models

    Authors: Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, Zuozhu Liu

    Abstract: Recent advancements in general-purpose or domain-specific multimodal large language models (LLMs) have witnessed remarkable progress for medical decision-making. However, they are designated for specific classification or generative tasks, and require model training or finetuning on large-scale datasets with sizeable parameters and tremendous computing, hindering their clinical utility across dive…

    Submitted 26 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  39. arXiv:2404.09027

    cs.CL

    MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

    Authors: Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

    Abstract: Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field. Despite advancements, challenges persist due to the complexity and diversity inherent in medical tasks which often require multi-task learning capabilities. Previous approaches, although beneficial, fall short…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 3 figures

  40. arXiv:2404.06258

    cs.CV

    Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Vision-based crack detection faces deployment challenges due to the size of robust models and edge device limitations. These can be addressed with lightweight models trained with knowledge distillation (KD). However, state-of-the-art (SOTA) KD methods compromise anti-noise robustness. This paper develops Robust Feature Knowledge Distillation (RFKD), a framework to improve robustness while retainin…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 24 pages, 13 figures

  41. arXiv:2404.06078

    cs.IR

    End-to-end training of Multimodal Model and ranking Model

    Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

    Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existi…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  42. arXiv:2404.04514

    cs.CL

    Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models

    Authors: Songtao Jiang, Yan Zhang, Chenyi Zhou, Yeying Jin, Yang Feng, Jian Wu, Zuozhu Liu

    Abstract: Multimodal Large Language Models (MLLMs) such as GPT-4V and Gemini Pro face challenges in achieving human-level perception in Visual Question Answering (VQA), particularly in object-oriented perception tasks which demand fine-grained understanding of object identities, locations or attributes, as indicated by empirical findings. This is mainly due to their limited capability to effectively integra…

    Submitted 6 April, 2024; originally announced April 2024.

  43. arXiv:2404.01943  [pdf, other]

    cs.CV cs.RO

    Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. The code is available at https://github.com/MrZihan/HNR-VLN

  44. arXiv:2403.18339  [pdf, other]

    eess.IV cs.CV

    H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images

    Authors: Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang

    Abstract: Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effec…

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures

  45. arXiv:2403.16463  [pdf, other]

    cs.CL

    Few-shot Named Entity Recognition via Superposition Concept Discrimination

    Authors: Jiawei Chen, Hongyu Lin, Xianpei Han, Yaojie Lu, Shanshan Jiang, Bin Dong, Le Sun

    Abstract: Few-shot NER aims to identify entities of target types with only a limited number of illustrative instances. Unfortunately, few-shot NER is severely challenged by the intrinsic precise generalization problem, i.e., it is hard to accurately determine the desired target type due to the ambiguity stemming from information deficiency. In this paper, we propose Superposition Concept Discriminator (SuperC…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  46. arXiv:2403.16395  [pdf, other]

    cs.CV

    Multi-attention Associate Prediction Network for Visual Tracking

    Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu

    Abstract: Classification-regression prediction networks have realized impressive success in several modern deep trackers. However, there is an inherent difference between classification and regression tasks, so they have diverse, even opposite, demands for feature matching. Existing models always ignore the key issue and only employ a unified matching block in two task branches, decaying the decision quality.…

    Submitted 24 March, 2024; originally announced March 2024.

  47. arXiv:2403.15815  [pdf, other]

    cs.DC

    Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing

    Authors: Mingjin Zhang, Jiannong Cao, Yuvraj Sahni, Xiangchun Chen, Shan Jiang

    Abstract: Edge AI has been recently proposed to facilitate the training and deployment of Deep Neural Network (DNN) models in proximity to the sources of data. To enable the training of large models on resource-constrained edge devices and protect data privacy, parallel split learning is becoming a practical and popular approach. However, current parallel split learning neglects the resource heterogeneity of…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted by International Conference on Computing, Networking and Communications (ICNC 2024)

  48. arXiv:2403.14690  [pdf]

    cs.CY cs.AI cs.CL cs.LG

    Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

    Authors: Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

    Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatical…

    Submitted 14 March, 2024; originally announced March 2024.

  49. arXiv:2403.13846  [pdf, other]

    cs.LG cs.AI

    A Clustering Method with Graph Maximum Decoding Information

    Authors: Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, Jin Yan, Shan Jiang, Zhiming Ding

    Abstract: The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of rela…

    Submitted 18 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages, 9 figures, IJCNN 2024

  50. arXiv:2403.13002  [pdf]

    cs.HC cs.AI cs.CL

    AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models

    Authors: Shuo Jiang, Jianxi Luo

    Abstract: Researchers and innovators have made enormous efforts in developing ideation methods, such as morphological analysis and design-by-analogy, to aid engineering design ideation for problem solving and innovation. Among these, the Theory of Inventive Problem Solving (TRIZ) stands out as one of the most well-known approaches, widely applied for systematic innovation. However, the complexity of TRIZ re…

    Submitted 22 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Proceedings of the ASME 2024 International Design Engineering Technical Conferences and Computers and Information in Engineering Conferences

    ACM Class: I.2.7; I.2.1