Search | arXiv e-print repository

Inferring ghost cities on the globe in newly developed urban areas based on urban vitality with multi-source data

Authors: Yecheng Zhang, Tangqi Tu, Ying Long

Abstract: Due to rapid urbanization over the past 20 years, many newly developed areas have lagged in socio-economic maturity, creating an imbalance with older cities and leading to the rise of "ghost cities." However, due to the complexity of socio-economic factors, no global studies have measured this phenomenon. We propose a unified framework based on urban vitality theory and multi-source data, validated by various data sources. We derived 8841 natural cities globally with an area over 5 square kilometers and divided each into new urban areas (developed after 2005) and old urban areas (developed before 2005). Urban vitality was gauged using the density of road networks, points of interest (POIs), and population density with 1 km resolution across morphological, functional, and social dimensions. By comparing urban vitality in new and old urban areas, we quantify the ghost cities index (GCI) globally using the theory of urban vitality for the first time. The results reveal that the vitality of new urban areas is 7.69% that of old ones. The top 5% (442) of cities were designated as ghost cities, a finding mirrored by news media and other research. This study sheds light on strategies for sustainable global urbanization, crucial for the United Nations' Sustainable Development Goals.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 28 pages, 13 figures

arXiv:2408.13889 [pdf, other]

LLM with Relation Classifier for Document-Level Relation Extraction

Authors: Xingzuo Li, Kehai Chen, Yunfei Long, Min Zhang

Abstract: Large language models (LLMs) create a new paradigm for natural language processing. Despite their advancement, LLM-based methods still lag behind traditional approaches in document-level relation extraction (DocRE), a critical task for understanding complex entity relations. This paper investigates the causes of this performance gap, identifying the dispersion of attention by LLMs due to entity pairs without relations as a primary factor. We then introduce a novel classifier-LLM approach to DocRE. The proposed approach begins with a classifier specifically designed to select entity pair candidates exhibiting potential relations and thereby feeds them to LLM for the final relation extraction. This method ensures that during inference, the LLM's focus is directed primarily at entity pairs with relations. Experiments on DocRE benchmarks reveal that our method significantly outperforms recent LLM-based DocRE models and achieves competitive performance with several leading traditional DocRE models.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.13102 [pdf, other]

Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks

Authors: Zhenyu Liu, Haoran Duan, Huizhi Liang, Yang Long, Vaclav Snasel, Guiseppe Nicosia, Rajiv Ranjan, Varun Ojha

Abstract: Adversarial training is one of the most effective methods for enhancing model robustness. Recent approaches incorporate adversarial distillation in adversarial training architectures. However, we notice two scenarios of defense methods that limit their performance: (1) Previous methods primarily use static ground truth for adversarial training, but this often causes robust overfitting; (2) The loss functions are either Mean Squared Error or KL-divergence leading to a sub-optimal performance on clean accuracy. To solve those problems, we propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gradually and dynamically gain robustness from the guide model's decisions. Additionally, we found that a budgeted dimension of inner optimization for the target model may contribute to the trade-off between clean accuracy and robust accuracy. Therefore, we propose a novel inner optimization method to be incorporated into the adversarial training. This will enable the target model to adaptively search for adversarial examples based on dynamic labels from the guiding model, contributing to the robustness of the target model. Extensive experiments validate the superior performance of our approach.

Submitted 23 August, 2024; originally announced August 2024.

Journal ref: 31st International Conference on Neural Information Processing (ICONIP), 2024

arXiv:2408.11878 [pdf, other]

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, tables, and time-series data to embed comprehensive financial knowledge. FinLLaMA is then instruction fine-tuned with 573K financial instructions, resulting in FinLLaMA-instruct, which enhances task performance. Finally, we present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types. Extensive evaluations demonstrate FinLLaMA's superior performance over LLaMA3-8B, LLaMA3.1-8B, and BloombergGPT in both zero-shot and few-shot settings across 19 and 4 datasets, respectively. FinLLaMA-instruct outperforms GPT-4 and other Financial LLMs on 15 datasets. FinLLaVA excels in understanding tables and charts across 4 multimodal tasks. Additionally, FinLLaMA achieves impressive Sharpe Ratios in trading simulations, highlighting its robust financial application capabilities. We will continually maintain and improve our models and benchmarks to support ongoing innovation in academia and industry.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 33 pages, 13 figures

arXiv:2408.10561 [pdf, other]

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Authors: Qingyu Liu, Longfei Song, Dongxing Xu, Yanhua Long

Abstract: The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient, strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster the research of infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset specially designed for ICSD tasks. The ICSD comprises three types of subsets: a real strongly labeled subset with event-based labels annotated manually, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 11 pages, 6 figures

arXiv:2408.09368 [pdf, ps, other]

Unbreakable Decomposition in Close-to-Linear Time

Authors: Aditya Anand, Euiwoong Lee, Jason Li, Yaowei Long, Thatchaphol Saranurak

Abstract: Unbreakable decomposition, introduced by Cygan et al. (SICOMP'19) and Cygan et al. (TALG'20), has proven to be one of the most powerful tools for parameterized graph cut problems in recent years. Unfortunately, all known constructions require at least $\Omega_k\left(mn^2\right)$ time, given an undirected graph with $n$ vertices, $m$ edges, and cut-size parameter $k$. In this work, we show the first close-to-linear time parameterized algorithm that computes an unbreakable decomposition. More precisely, for any $0<\epsilon\leq 1$, our algorithm runs in time $2^{O(\frac{k}{\epsilon} \log \frac{k}{\epsilon})}m^{1 + \epsilon}$ and computes a $(O(k/\epsilon), k)$ unbreakable tree decomposition of $G$, where each bag has adhesion at most $O(k/\epsilon)$. This immediately opens up possibilities for obtaining close-to-linear time algorithms for numerous problems whose only known solution is based on unbreakable decomposition.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 37 pages

arXiv:2408.09278 [pdf, other]

Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology

Authors: Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

Abstract: Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent functional structural analysis and disease diagnosis. Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. However, due to the patient's privacy of medical data and scarce clinical cases, constructing pathological datasets from clinical sources is relatively difficult and expensive. Moreover, using external natural image datasets introduces noise during the domain generalization process. Cross-species homologous data, such as mouse kidney data, which exhibits high structural and feature similarity to human kidneys, has the potential to enhance model performance on human datasets. In this study, we incorporated the collected private Periodic Acid-Schiff (PAS) stained mouse kidney dataset into the human kidney dataset for joint training. The results showed that after introducing cross-species homologous data, the semantic segmentation models based on CNN and Transformer architectures achieved an average increase of 1.77% and 1.24% in mIoU, and 1.76% and 0.89% in Dice score for the human renal cortex and medulla datasets, respectively. This approach is also capable of enhancing the model's generalization ability. This indicates that cross-species homologous data, as a low-noise trainable data source, can help improve model performance under conditions of limited clinical samples. Code is available at https://github.com/hrlblab/layer_segmentation.

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.08484 [pdf, other]

doi 10.1145/3637528.3671704

An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem

Authors: Huaiyuan Liu, Xianzhang Liu, Donghua Yang, Hongzhi Wang, Yingchi Long, Mengtong Ji, Dongjing Miao, Zhiyu Liang

Abstract: The Maximum Minimal Cut Problem (MMCP), an NP-hard combinatorial optimization (CO) problem, has not received much attention due to the demanding and challenging bi-connectivity constraint. Moreover, as a CO problem, it is also a daunting task for machine learning, especially without labeled instances. To deal with these problems, this work proposes an unsupervised learning framework combined with heuristics for MMCP that can provide valid and high-quality solutions. As far as we know, this is the first work that explores machine learning and heuristics to solve MMCP. The unsupervised solver is inspired by a relaxation-plus-rounding approach, the relaxed solution is parameterized by graph neural networks, and the cost and penalty of MMCP are explicitly written out, which can train the model end-to-end. A crucial observation is that each solution corresponds to at least one spanning tree. Based on this finding, a heuristic solver that implements tree transformations by adding vertices is utilized to repair and improve the solution quality of the unsupervised solver. Alternatively, the graph is simplified while guaranteeing solution consistency, which reduces the running time. We conduct extensive experiments to evaluate our framework and give a specific application. The results demonstrate the superiority of our method against two techniques designed.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.05891 [pdf, other]

CMAB: A First National-Scale Multi-Attribute Building Dataset in China Derived from Open Source Data and GeoAI

Authors: Yecheng Zhang, Huimin Zhao, Ying Long

Abstract: Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper introduces a geospatial artificial intelligence (GeoAI) framework for large-scale building modeling, presenting the first national-scale Multi-Attribute Building dataset (CMAB), covering 3,667 spatial cities, 29 million buildings, and 21.3 billion square meters of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 337.7 billion cubic meters of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating features such as morphology, location, and function. Using multi-source data, including billions of high-resolution Google Earth images and 60 million street view images (SVIs), we generated rooftop, height, function, age, and quality attributes for each building. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning.

Submitted 21 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

Comments: 43 pages, 20 figures

ACM Class: I.4.9

arXiv:2408.00855 [pdf, other]

doi 10.1145/3678518

HAIGEN: Towards Human-AI Collaboration for Facilitating Creativity and Style Generation in Fashion Design

Authors: Jianan Jiang, Di Wu, Hanhui Deng, Yidan Long, Wenyi Tang, Xiang Li, Can Liu, Zhanpeng Jin, Wenlei Zhang, Tangquan Qi

Abstract: The process of fashion design usually involves sketching, refining, and coloring, with designers drawing inspiration from various images to fuel their creative endeavors. However, conventional image search methods often yield irrelevant results, impeding the design process. Moreover, creating and coloring sketches can be time-consuming and demanding, acting as a bottleneck in the design workflow. In this work, we introduce HAIGEN (Human-AI Collaboration for GENeration), an efficient fashion design system for Human-AI collaboration developed to aid designers. Specifically, HAIGEN consists of four modules. T2IM, located in the cloud, generates reference inspiration images directly from text prompts. With three other modules situated locally, the I2SM batch generates the image material library into a certain designer-style sketch material library. The SRM recommends similar sketches in the generated library to designers for further refinement, and the STM colors the refined sketch according to the styles of inspiration images. Through our system, any designer can perform local personalized fine-tuning and leverage the powerful generation capabilities of large models in the cloud, streamlining the entire design development process. Given that our approach integrates both cloud and local model deployment schemes, it effectively safeguards design privacy by avoiding the need to upload personalized data from local designers. We validated the effectiveness of each module through extensive qualitative and quantitative experiments. User surveys also confirmed that HAIGEN offers significant advantages in design efficiency, positioning it as a new generation of aid-tool for designers.

Submitted 11 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (ACM IMWUT/UbiComp 2024)

arXiv:2407.15862 [pdf]

Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis

Authors: Qiuhong Wei, Ying Cui, Mengwei Ding, Yanqin Wang, Lingling Xiang, Zhengxiong Yao, Ceran Chen, Ying Long, Zhezhen Jin, Ximing Xu

Abstract: Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings, remains underexplored. In this cross-sectional study, 250 patient consultation questions were randomly selected from a public online medical forum, with 10 questions from each of 25 pediatric departments, spanning from December 1, 2022, to October 30, 2023. Two lightweight open-source LLMs, ChatGLM3-6B and Vicuna-7B, along with a larger-scale model, Vicuna-13B, and the widely-used proprietary ChatGPT-3.5, independently answered these questions in Chinese between November 1, 2023, and November 7, 2023. To assess reproducibility, each inquiry was replicated once. We found that ChatGLM3-6B demonstrated higher accuracy and completeness than Vicuna-13B and Vicuna-7B (P < .001), but all were outperformed by ChatGPT-3.5. ChatGPT-3.5 received the highest ratings in accuracy (65.2%) compared to ChatGLM3-6B (41.2%), Vicuna-13B (11.2%), and Vicuna-7B (4.4%). Similarly, in completeness, ChatGPT-3.5 led (78.4%), followed by ChatGLM3-6B (76.0%), Vicuna-13B (34.8%), and Vicuna-7B (22.0%) in highest ratings. ChatGLM3-6B matched ChatGPT-3.5 in readability, both outperforming Vicuna models (P < .001). In terms of empathy, ChatGPT-3.5 outperformed the lightweight LLMs (P < .001). In safety, all models performed comparably well (P > .05), with over 98.4% of responses being rated as safe. Repetition of inquiries confirmed these findings. In conclusion, lightweight LLMs demonstrate promising application in pediatric healthcare. However, the observed gap between lightweight and large-scale proprietary LLMs underscores the need for continued development efforts.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 27 pages in total with 17 pages of main manuscript and 10 pages of supplementary materials; 4 figures in the main manuscript and 2 figures in supplementary material

MSC Class: 68M20 (Primary) 62G10 (Secondary)

arXiv:2407.11906 [pdf, other]

SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

Authors: Hao Ding, Tuxun Lu, Yuqian Zhang, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Mathias Unberath

Abstract: Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, significantly impairing the model's performance. This vulnerability is especially problematic in surgical settings where predictions might be used to inform high-stakes decisions. To better understand model behavior under non-adversarial corruptions, prior work has explored introducing artificial corruptions, like Gaussian noise or contrast perturbation to test set images, to assess model robustness. However, these corruptions are either not photo-realistic or model/task agnostic. Thus, these investigations provide limited insights into model deterioration under realistic surgical corruptions. To address this limitation, we introduce the SegSTRONG-C challenge that aims to promote the development of algorithms robust to unforeseen but plausible image corruptions of surgery, like smoke, bleeding, and low brightness. We collect and release corruption-free mock endoscopic video sequences for the challenge participants to train their algorithms and benchmark them on video sequences with photo-realistic non-adversarial corruptions for a binary robot tool segmentation task. This new benchmark will allow us to carefully study neural network robustness to non-adversarial corruptions of surgery, thus constituting an important first step towards more robust models for surgical computer vision. In this paper, we describe the data collection and annotation protocol, baseline evaluations of established segmentation models, and data augmentation-based techniques to enhance model robustness.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2406.16967 [pdf, other]

Remaining useful life prediction of rolling bearings based on refined composite multi-scale attention entropy and dispersion entropy

Authors: Yunchong Long, Qinkang Pang, Guangjie Zhu, Junxian Cheng, Xiangshun Li

Abstract: Remaining useful life (RUL) prediction based on vibration signals is crucial for ensuring the safe operation and effective health management of rotating machinery. Existing studies often extract health indicators (HI) from time domain and frequency domain features to analyze complex vibration signals, but these features may not accurately capture the degradation process. In this study, we propose a degradation feature extraction method called Fusion of Multi-Modal Multi-Scale Entropy (FMME), which utilizes multi-modal Refined Composite Multi-scale Attention Entropy (RCMATE) and Fluctuation Dispersion Entropy (RCMFDE), to solve the problem that the existing degradation features cannot accurately reflect the degradation process. Firstly, the Empirical Mode Decomposition (EMD) is employed to decompose the dual-channel vibration signals of bearings into multiple modals. The main modals are then selected for further analysis. The subsequent step involves the extraction of RCMATE and RCMFDE from each modal, followed by wavelet denoising. Next, a novel metric is proposed to evaluate the quality of degradation features. The attention entropy and dispersion entropy of the optimal scales under different modals are fused using Laplacian Eigenmap (LE) to obtain the health indicators. Finally, RUL prediction is performed through the similarity of health indicators between fault samples and bearings to be predicted. Experimental results demonstrate that the proposed method yields favorable outcomes across diverse operating conditions.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.14962 [pdf, other]

Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning

Authors: Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

Abstract: Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addressed this issue by focusing on disentangling strategy or utilizing object-based conditional probabilities to constrain the selection space of attributes. Unfortunately, few studies have explored the problem from the perspective of modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we take a step further and devise a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem. Besides, the latest studies highlight the weakness of the perception of hard compositions even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment target compositional training data. We performed detailed quantitative analysis and retrieval experiments on well-established datasets, such as UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of our proposed method, and the state-of-the-art (SOTA) performance demonstrates the superiority of our approach. The code is available at https://github.com/lisuyi/PBadv_czsl.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.04882 [pdf, other]

InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

Authors: Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong

Abstract: Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all constrained to one specific type of navigation instruction. In this work, we propose InstructNav, a generic instruction navigation system. InstructNav makes the first endeavor to handle various instruction navigation tasks without any navigation training or pre-built maps. To reach this goal, we introduce Dynamic Chain-of-Navigation (DCoN) to unify the planning process for different types of navigation instructions. Furthermore, we propose Multi-sourced Value Maps to model key elements in instruction navigation so that linguistic DCoN planning can be converted into robot actionable trajectories. With InstructNav, we complete the R2R-CE task in a zero-shot way for the first time and outperform many task-training methods. Besides, InstructNav also surpasses the previous SOTA method by 10.48% on the zero-shot Habitat ObjNav and by 86.34% on demand-driven navigation DDN. Real robot experiments on diverse indoor scenes further demonstrate our method's robustness in coping with the environment and instruction variations.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Submitted to CoRL 2024

arXiv:2406.00321 [pdf, other]

Non-Abelian lattice gauge fields in the photonic synthetic frequency dimension

Authors: Dali Cheng, Kai Wang, Charles Roques-Carmes, Eran Lustig, Olivia Y. Long, Heming Wang, Shanhui Fan

Abstract: Non-Abelian gauge fields provide a conceptual framework for the description of particles having spins. The theoretical importance of non-Abelian gauge fields motivates their experimental synthesis and explorations. Here, we demonstrate non-Abelian lattice gauge fields for photons. In the study of gauge fields, lattice models are essential for the understanding of their implications in extended systems. We utilize the platform of synthetic frequency dimensions, which enables the study of lattice physics in a scalable and programmable way. We observe Dirac cones at time-reversal-invariant momenta as well as the direction reversal of eigenstate trajectories associated with such Dirac cones. Both of them are unique signatures of non-Abelian gauge fields in our lattice system. Our results highlight the implications of non-Abelian gauge field in the study of topological physics and suggest opportunities for the control of photon spins and pseudospins.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20714 [pdf]

Large low-field magnetocaloric response in a ferromagnetic gadolinium orthophosphate

Authors: Ziyu W. Yang, Jie Zhang, Maocai Pi, Xubin Ye, Chenxu Kang, Xiaoliang Weng, Wei Tang, Hongzhi Cui, Yu-Jia Zeng, Youwen Long

Abstract: Bulk magnetic and thermodynamic measurements, along with mean-field calculations, were conducted on the ferromagnetic K₃Gd₅(PO₄)₆ powders. No magnetic ordering was observed until 2 K, while the application of an external field B > 1 T resulted in the splitting of the Gd³⁺ ground state multiplet and induced a non-cooperative Schottky effect. The average nearest-neighbor exchange strength |J1/kB| is determined to be 0.017 K, which leads to a remarkably large low field magnetic entropy change ΔSm = 36.2 J kg⁻¹ K⁻¹ under applied field change B = 2 T at temperature T = 2 K, as well as a maximum adiabatic temperature change Tad = 10.9 K. We contend that ferromagnetic gadolinium orthophosphates serve as a promising reservoir for exploring advanced magnetic refrigerants applicable under low magnetic fields.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.18757 [pdf, other]

Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation

Authors: Jiawei Fu, Yonghao Long, Kai Chen, Wang Wei, Qi Dou

Abstract: Surgical robot task automation has been a promising research topic for improving surgical efficiency and quality. Learning-based methods have been recognized as an interesting paradigm and have been increasingly investigated. However, existing approaches encounter difficulties in long-horizon goal-conditioned tasks due to the intricate compositional structure, which requires decision-making for a sequence of sub-steps and understanding of inherent dynamics of goal-reaching tasks. In this paper, we propose a new learning-based framework by leveraging the strong reasoning capability of the GPT-based architecture to automate surgical robotic tasks. The key to our approach is developing a goal-conditioned decision transformer to achieve sequential representations with goal-aware future indicators in order to enhance temporal reasoning. Moreover, to exploit a general understanding of the dynamics inherent in manipulations, and thus make the model's reasoning ability task-agnostic, we also design a cross-task pretraining paradigm that uses multiple training objectives associated with data from diverse tasks. We have conducted extensive experiments on 10 tasks using the surgical robot learning simulator SurRoL. The results show that our new approach achieves promising performance and task versatility compared to existing methods. The learned trajectories can be deployed on the da Vinci Research Kit (dVRK) for validating its practicality in real surgical robot settings. Our project website is at: https://med-air.github.io/SurRoL.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.15962 [pdf]

doi 10.1016/j.ins.2024.120393

Wearable-based behaviour interpolation for semi-supervised human activity recognition

Authors: Haoran Duan, Shidong Wang, Varun Ojha, Shizheng Wang, Yawen Huang, Yang Long, Rajiv Ranjan, Yefeng Zheng

Abstract: While traditional feature engineering for Human Activity Recognition (HAR) involves a trial-and-error process, deep learning has emerged as a preferred method for high-level representations of sensor-based human activities. However, most deep learning-based HAR requires a large amount of labelled data, and extracting HAR features from unlabelled data for effective deep learning training remains challenging. We, therefore, introduce a deep semi-supervised HAR approach, MixHAR, which concurrently uses labelled and unlabelled activities. Our MixHAR employs a linear interpolation mechanism to blend labelled and unlabelled activities while addressing both inter- and intra-activity variability. A unique challenge identified is the activity-intrusion problem during mixing, which we mitigate with a mixing calibration mechanism in the feature embedding space. Additionally, we rigorously explored and evaluated the five conventional/popular deep semi-supervised technologies on HAR, acting as the benchmark of deep semi-supervised HAR. Our results demonstrate that MixHAR significantly improves performance, underscoring the potential of deep semi-supervised techniques in HAR.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15914 [pdf, other]

ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

Authors: Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan

Abstract: Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge, LucidDreamer replaces the Denoising Diffusion Probabilistic Model (DDPM) in SDS with the Denoising Diffusion Implicit Model (DDIM) to construct Interval Score Matching (ISM). However, ISM inevitably inherits inconsistencies from DDIM, causing reconstruction errors during the DDIM inversion process. This results in poor performance in the detailed generation of 3D objects and loss of content. To alleviate these problems, we propose a novel method named Exact Score Matching (ESM). Specifically, ESM leverages auxiliary variables to mathematically guarantee exact recovery in the DDIM reverse process. Furthermore, to effectively capture the dynamic changes of the original and auxiliary variables, the LoRA of a pre-trained diffusion model implements these exact paths. Extensive experiments demonstrate the effectiveness of ESM in text-to-3D generation, particularly highlighting its superiority in detailed generation.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.11252 [pdf, other]

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversion process of DDIM to generate two paths from the same starting point for calculation. Since both paths start from the same starting point, TSM can reduce the accumulated error compared to ISM, thus alleviating the problem of pseudo ground truth inconsistency. TSM enhances the stability and consistency of the model's generated paths during the distillation process. We demonstrate this experimentally and further show that ISM is a special case of TSM. Furthermore, to optimize the current multi-stage optimization process from high-resolution text to 3D generation, we adopt Stable Diffusion XL for guidance. In response to the issues of abnormal replication and splitting caused by unstable gradients during the 3D Gaussian splatting process when using Stable Diffusion XL, we propose a pixel-by-pixel gradient clipping method. Extensive experiments show that our model significantly surpasses the state-of-the-art models in terms of visual quality and performance. Code: https://github.com/xingy038/Dreamer-XL.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.08748 [pdf, other]

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT

Submitted 14 May, 2024; originally announced May 2024.

Comments: Project Page: https://dit.hunyuan.tencent.com/

arXiv:2405.04879 [pdf, other]

doi 10.1103/PhysRevLett.132.236401

Non-Abelian Braiding of Topological Edge Bands

Authors: Yang Long, Zihao Wang, Chen Zhang, Haoran Xue, Yuxin Zhao, Baile Zhang

Abstract: Braiding is a geometric concept that manifests itself in a variety of scientific contexts from biology to physics, and has been employed to classify bulk band topology in topological materials. Topological edge states can also form braiding structures, as demonstrated recently in a type of topological insulators known as Möbius insulators, whose topological edge states form two braided bands exhibiting a Möbius twist. While the formation of Möbius twist is inspiring, it belongs to the simple Abelian braid group $\mathbb{B}_2$. The most fascinating features about topological braids rely on the non-Abelianness in the higher-order braid group $\mathbb{B}_N$ ($N \geq 3$), which necessitates multiple edge bands, but so far it has not been discussed. Here, based on the gauge enriched symmetry, we develop a scheme to realize non-Abelian braiding of multiple topological edge bands. We propose tight-binding models of topological insulators that are able to generate topological edge states forming non-Abelian braiding structures. Experimental demonstrations are conducted in two acoustic crystals, which carry three and four braided acoustic edge bands, respectively. The observed braiding structure can correspond to the topological winding in the complex eigenvalue space of projective translation operator, akin to the previously established point-gap winding topology in the bulk of the Hatano-Nelson model. Our work also constitutes the realization of non-Abelian braiding topology on an actual crystal platform, but not based on the "virtual" synthetic dimensions.

Submitted 8 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. Lett. 132, 236401 (2024)

arXiv:2405.04652 [pdf, ps, other]

AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models

Authors: Yinru Long, Zilin Ma, Yiyang Mei, Zhaoyuan Su

Abstract: The LGBTQ+ community faces disproportionate mental health challenges, including higher rates of depression, anxiety, and suicidal ideation. Research has shown that LGBTQ+ people have been using large language model-based chatbots, such as ChatGPT, for their mental health needs. Despite the potential for immediate support and anonymity these chatbots offer, concerns regarding their capacity to provide empathetic, accurate, and affirming responses remain. In response to these challenges, we propose a framework for evaluating the affirmativeness of LLMs based on principles of affirmative therapy, emphasizing the need for attitudes, knowledge, and actions that support and validate LGBTQ+ experiences. We propose a combination of qualitative and quantitative analyses, hoping to establish benchmarks for "Affirmative AI," ensuring that LLM-based chatbots can provide safe, supportive, and effective mental health support to LGBTQ+ individuals. We benchmark LLM affirmativeness not as a mental health solution for LGBTQ+ individuals or to claim it resolves their mental health issues, as we highlight the need to consider complex discrimination in the LGBTQ+ community when designing technological aids. Our goal is to evaluate LLMs for LGBTQ+ mental health support since many in the community already use them, aiming to identify potential harms of using general-purpose LLMs in this context.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00956 [pdf, other]

SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians

Authors: Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou

Abstract: Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scenes involve a labor-intensive process where designers hand-craft tissue models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our collected in-house and public surgical video datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently (taking only a few minutes to reconstruct the surgical scene) and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.

Submitted 6 August, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.19449 [pdf, other]

AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS by the nonorthogonal multiple access (NOMA) transmissions. We formulate a multi-stage stochastic optimization problem to minimize the long-term time-averaged age-of-information (AoI) by jointly optimizing the GUs' access control, the UAVs' beamforming, and trajectory planning strategies. To solve this problem, we first model the dynamics of the GUs' AoI statuses by virtual queueing systems, and then propose the AoI-aware sensing scheduling and trajectory optimization (AoI-STO) algorithm. This allows us to transform the multi-stage AoI minimization problem into a series of per-slot control problems by using the Lyapunov optimization framework. In each time slot, the GUs' access control, the UAVs' beamforming, and mobility control strategies are updated by using the block coordinate descent (BCD) method according to the instant GUs' AoI statuses. Simulation results reveal that the proposed AoI-STO algorithm can reduce the overall AoI by more than 50%. The GUs' scheduling fairness is also improved greatly by adapting the GUs' access control compared with typical baseline schemes.

Submitted 30 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by IEEE TVT

arXiv:2404.15339 [pdf, other]

Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation

Authors: Yuehao Wang, Bingchen Gong, Yonghao Long, Siu Hin Fan, Qi Dou

Abstract: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate shapes and textures. To address this gap, we present a data-driven framework that leverages emerging neural radiance field technology to enable high-quality surgical reconstruction and explore its application for surgical simulations. We first focus on developing a fast NeRF-based surgical scene 3D reconstruction approach that achieves state-of-the-art performance. This method can significantly outperform traditional 3D reconstruction methods, which have failed to capture large deformations and produce fine-grained shapes and textures. We then propose an automated creation pipeline of interactive surgical simulation environments through a closed mesh extraction algorithm. Our experiments have validated the superior performance and efficiency of our proposed approach in surgical scene 3D reconstruction. We further utilize our reconstructed soft tissues to conduct FEM and MPM simulations, showcasing the practical application of our method in data-driven surgical simulations.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery

arXiv:2404.12291 [pdf]

Augmenting emotion features in irony detection with Large language modeling

Authors: Yucheng Lin, Yuhan Xia, Yunfei Long

Abstract: This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, our methodology augments the detection process by integrating subtle emotional cues, augmented through LLMs, into three benchmark pre-trained NLP models - BERT, T5, and GPT-2 - which are widely recognized as foundational in irony detection. We assessed our method using the SemEval-2018 Task 3 dataset and observed substantial enhancements in irony detection capabilities.

Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 tables, 2 figures. Accepted by the 25th Chinese Lexical Semantics Workshop

arXiv:2403.15905 [pdf, other]

Towards Low-Energy Adaptive Personalization for Resource-Constrained Devices

Authors: Yushan Huang, Josh Millar, Yuxuan Long, Yuchen Zhao, Hamed Haddadi

Abstract: The personalization of machine learning (ML) models to address data drift is a significant challenge in the context of Internet of Things (IoT) applications. Presently, most approaches focus on fine-tuning either the full base model or its last few layers to adapt to new data, while often neglecting energy costs. However, various types of data drift exist, and fine-tuning the full base model or the last few layers may not result in optimal performance in certain scenarios. We propose Target Block Fine-Tuning (TBFT), a low-energy adaptive personalization framework designed for resource-constrained devices. We categorize data drift and personalization into three types: input-level, feature-level, and output-level. For each type, we fine-tune different blocks of the model to achieve optimal performance with reduced energy costs. Specifically, input-, feature-, and output-level correspond to fine-tuning the front, middle, and rear blocks of the model. We evaluate TBFT on a ResNet model, three datasets, three different training sizes, and a Raspberry Pi. Compared with the Block Avg, where each block is fine-tuned individually and their performance improvements are averaged, TBFT exhibits an improvement in model accuracy by an average of 15.30% whilst saving 41.57% energy consumption on average compared with full fine-tuning.

Submitted 29 March, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted to The 4th Workshop on Machine Learning and Systems (EuroMLSys '24)

arXiv:2403.15574 [pdf, other]

SensoryT5: Infusing Sensorimotor Norms into T5 for Enhanced Fine-grained Emotion Classification

Authors: Yuhan Xia, Qingqing Zhao, Yunfei Long, Ge Xu, Jia Wang

Abstract: In traditional research approaches, sensory perception and emotion classification have been considered separate domains. Yet, the significant influence of sensory experiences on emotional responses is undeniable. The natural language processing (NLP) community has often missed the opportunity to merge sensory knowledge with emotion classification. To address this gap, we propose SensoryT5, a neuro-cognitive approach that integrates sensory information into the T5 (Text-to-Text Transfer Transformer) model, designed specifically for fine-grained emotion classification. This methodology incorporates sensory cues into the T5's attention mechanism, enabling a harmonious balance between contextual understanding and sensory awareness. The resulting model amplifies the richness of emotional representations. In rigorous tests across various detailed emotion classification datasets, SensoryT5 showcases improved performance, surpassing both the foundational T5 model and current state-of-the-art works. Notably, SensoryT5's success signifies a pivotal change in the NLP domain, highlighting the potential influence of neuro-cognitive data in refining machine learning models' emotional sensitivity.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by CogALex 2024 conference

arXiv:2403.11894 [pdf, other]

doi 10.1016/j.csbj.2024.05.004

From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?

Authors: Guangming Huang, Yingya Li, Shoaib Jameel, Yunfei Long, Giorgos Papanastasiou

Abstract: Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretable Artificial Intelligence" (XIAI) is introduced to distinguish XAI from IAI. Different models are further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms are the most prevalent emerging IAI technique. The use of IAI is growing, distinguishing it from XAI. The major challenges identified are that most XIAI does not explore "global" modelling processes, the lack of best practices, and the lack of systematic evaluation and benchmarks. One important opportunity is to use attention mechanisms to enhance multi-modal XIAI for personalized medicine. Additionally, combining DL with causal logic holds promise. Our discussion encourages the integration of XIAI in Large Language Models (LLMs) and domain-specific smaller models. In conclusion, XIAI adoption in healthcare requires dedicated in-house expertise. Collaboration with domain experts, end-users, and policymakers can lead to ready-to-use XIAI methods across NLP and medical tasks. While challenges exist, XIAI techniques offer a valuable foundation for interpretable NLP algorithms in healthcare.

Submitted 9 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: This paper has been accepted by Computational and Structural Biotechnology Journal

arXiv:2403.09363 [pdf, other]

Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both model entities. The teacher model serves as a sentinel on behalf of the data owner, replacing real data, to guide the student model at the AI service provider's end during training. Considering the disparity of knowledge space between the teacher and student, we introduce two variants of the teacher model: the omniscient and the quasi-omniscient teachers. Under these teachers' guidance, the student model seeks to match the teacher model's performance and explores domains that the teacher has not covered. To trade off between privacy and performance, we further introduce two distinct security-level training protocols: white-box and black-box, enhancing the paradigm's adaptability. Despite the inherent challenges of real data absence in the SG-ZSL paradigm, it consistently outperforms in ZSL and GZSL tasks, notably in the white-box protocol. Our comprehensive evaluation further attests to its robustness and efficiency across various setups, including the stringent black-box training protocol.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08857 [pdf, other]

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language Models (MLLMs) with T2I models to bring the user's natural language instructions into reality. Hence, the output modality of MLLMs is extended, and the multi-turn generation quality of T2I models is enhanced thanks to the strong multi-modal comprehension ability of MLLMs. However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper. Therefore, we propose DialogGen, an effective pipeline to align off-the-shelf MLLMs and T2I models to build a Multi-modal Interactive Dialogue System (MIDS) for multi-turn Text-to-Image generation. It is composed of drawing prompt alignment, careful training data curation, and error correction. Moreover, as the field of MIDS flourishes, comprehensive benchmarks are urgently needed to evaluate MIDS fairly in terms of output modality correctness and multi-modal output coherence. To address this issue, we introduce the Multi-modal Dialogue Benchmark (DialogBen), a comprehensive bilingual benchmark designed to assess the ability of MLLMs to generate accurate and coherent multi-modal content that supports image editing. It contains two evaluation metrics to measure the model's ability to switch modalities and the coherence of the output images. Our extensive experiments on DialogBen and user study demonstrate the effectiveness of DialogGen compared with other State-of-the-Art models.

Submitted 3 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Project page: https://hunyuan-dialoggen.github.io/

arXiv:2403.05770 [pdf, other]

doi 10.1109/TPAMI.2023.3273594

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

Authors: Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

Abstract: Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are typically trained in disturbance-free environments and may easily fail in real-world scenarios, since they are unaware of how to deal with various possible disturbances, such as sudden obstacles or human interruptions, which are widespread and often cause an unexpected route deviation. In this paper, we present a model-agnostic training paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER), to enhance the generalization ability of existing VLN agents by requiring them to learn deviation-robust navigation. Specifically, a simple yet effective path perturbation scheme is introduced to implement the route deviation, under which the agent is still required to navigate successfully following the original instruction. Since directly forcing the agent to learn perturbed trajectories may lead to inefficient training, a progressively perturbed trajectory augmentation strategy is designed, where the agent can self-adaptively learn to navigate under perturbation as its navigation performance improves for each specific trajectory. To encourage the agent to capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings and perturbation-based counterparts. Extensive experiments on R2R show that PROPER can benefit multiple VLN baselines in perturbation-free scenarios. We further collect the perturbed path data to construct an introspection subset based on R2R, called Path-Perturbed R2R (PP-R2R). The results on PP-R2R show the unsatisfactory robustness of popular VLN agents and the capability of PROPER to improve navigation robustness.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted by TPAMI 2023

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI,2023)

arXiv:2402.19350 [pdf, other]

Prompting Explicit and Implicit Knowledge for Multi-hop Question Answering Based on Human Reading Process

Authors: Guangming Huang, Yunfei Long, Cunjin Luo, Jiaxing Shen, Xia Sun

Abstract: Pre-trained language models (PLMs) leverage chains-of-thought (CoT) to simulate human reasoning and inference processes, achieving proficient performance in multi-hop QA. However, a gap persists between PLMs' reasoning abilities and those of humans when tackling complex problems. Psychological studies suggest a vital connection between explicit information in passages and human prior knowledge during reading. Nevertheless, current research has given insufficient attention to linking input passages and PLMs' pre-training-based knowledge from the perspective of human cognition studies. In this study, we introduce a Prompting Explicit and Implicit knowledge (PEI) framework, which uses prompts to connect explicit and implicit knowledge, aligning with the human reading process for multi-hop QA. We consider the input passages as explicit knowledge, employing them to elicit implicit knowledge through unified prompt reasoning. Furthermore, our model incorporates type-specific reasoning via prompts, a form of implicit knowledge. Experimental results show that PEI performs comparably to the state-of-the-art on HotpotQA. Ablation studies confirm the efficacy of our model in bridging and integrating explicit and implicit knowledge.

Submitted 27 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: This paper has been accepted at COLING 2024

arXiv:2402.18541 [pdf, ps, other]

Dynamic Deterministic Constant-Approximate Distance Oracles with $n^ε$ Worst-Case Update Time

Authors: Bernhard Haeupler, Yaowei Long, Thatchaphol Saranurak

Abstract: We present a new distance oracle in the fully dynamic setting: given a weighted undirected graph $G=(V,E)$ with $n$ vertices undergoing both edge insertions and deletions, and an arbitrary parameter $ε$ where $ε\in[1/\log^{c} n,1]$ and $c>0$ is a small constant, we can deterministically maintain a data structure with $n^ε$ worst-case update time that, given any pair of vertices $(u,v)$, returns a $2^{{\rm poly}(1/ε)}$-approximate distance between $u$ and $v$ in ${\rm poly}(1/ε)\log\log n$ query time. Our algorithm significantly advances the state-of-the-art in two aspects, both for fully dynamic algorithms and even decremental algorithms. First, no existing algorithm with worst-case update time guarantees an $o(n)$-approximation while also achieving an $n^{2-Ω(1)}$ update and $n^{o(1)}$ query time, while our algorithm offers a constant $O_ε(1)$-approximation with $n^ε$ update time and $O_ε(\log \log n)$ query time. Second, even if amortized update time is allowed, it is the first deterministic constant-approximation algorithm with $n^{1-Ω(1)}$ update and query time. The best result in this direction is the recent deterministic distance oracle by Chuzhoy and Zhang [STOC 2023] which achieves an approximation of $(\log\log n)^{2^{O(1/ε^{3})}}$ with amortized update time of $n^ε$ and query time of $2^{{\rm poly}(1/ε)}\log n\log\log n$. We obtain the result by dynamizing tools related to length-constrained expanders [Haeupler-Räcke-Ghaffari, STOC 2022; Haeupler-Hershkowitz-Tan, 2023; Haeupler-Huebotter-Ghaffari, 2022]. Our technique completely bypasses the 40-year-old Even-Shiloach tree, which has remained the most pervasive tool in the area but is inherently amortized.

Submitted 10 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 137 pages

arXiv:2402.16722 [pdf]

All-optical polarization scrambler based on polarization beam splitting with amplified fiber ring

Authors: Yuanjie Yu, Shiyun Dai, Qiang Wu, Yu Long, Ai Liu, Peng Cai, Ligang Huang, Lei Gao, Tao Zhu

Abstract: Optical-fiber-based polarization scramblers can reduce the impact of polarization-sensitive performance in various optical fiber systems. Here, we propose a simple and efficient polarization scrambler based on an all-optical Mach-Zehnder structure that combines a polarization beam splitter and an amplified fiber ring. To fully decohere one of the polarization-split beams, a fiber ring together with an amplifier is incorporated. The ratio of the two orthogonal beams can be controlled by varying the amplification factor, and we observe different evolution trajectories of the output states of polarization on the Poincaré sphere. When the amplification factor exceeds a certain threshold, the scrambler system exhibits chaotic behavior. A commercial single-wavelength laser with a linewidth of 3 MHz is utilized to characterize the scrambling performance. We find that when the sampling rate is 1.6 MSa/s, a scrambling speed of up to 2000 krad/s can be obtained with the average degree of polarization below 0.1. We also exploit these chaotic polarization fluctuations to generate random binary numbers, indicating that the proposed technique is a good candidate for random bit generation.

Submitted 26 February, 2024; originally announced February 2024.
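
The polarization scrambler abstract above reports an average degree of polarization (DOP) below 0.1. As a minimal sketch of how such a figure is commonly computed, the snippet below uses the standard textbook definition: average the Stokes parameters over time, then divide the length of the averaged (S1, S2, S3) vector by the averaged S0. The function name and the sample data are hypothetical and are not taken from the paper, whose measurement procedure is not described in the abstract.

```python
import math


def degree_of_polarization(stokes_samples):
    """Time-averaged DOP from a sequence of (S0, S1, S2, S3) samples.

    Standard definition: sqrt(<S1>^2 + <S2>^2 + <S3>^2) / <S0>.
    """
    n = len(stokes_samples)
    if n == 0:
        raise ValueError("need at least one Stokes sample")
    # Average each Stokes component over all samples.
    mean = [sum(s[i] for s in stokes_samples) / n for i in range(4)]
    if mean[0] <= 0:
        raise ValueError("mean total intensity S0 must be positive")
    return math.sqrt(mean[1] ** 2 + mean[2] ** 2 + mean[3] ** 2) / mean[0]


# Hypothetical data: a fixed state of polarization (no scrambling).
fixed = [(1.0, 1.0, 0.0, 0.0)] * 4

# Hypothetical data: states spread evenly over the six cardinal points
# of the Poincare sphere, an idealized fully scrambled output.
scrambled = [
    (1.0, 1.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0),
    (1.0, 0.0, 1.0, 0.0), (1.0, 0.0, -1.0, 0.0),
    (1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0),
]

dop_fixed = degree_of_polarization(fixed)
dop_scrambled = degree_of_polarization(scrambled)
```

A fixed state gives a DOP of 1, while states that cover the sphere evenly average out toward 0, which is the regime a scrambler aims for.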

arXiv:2402.15078 [pdf, other]

LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models

Authors: Zhijie Liu, Yutian Tang, Meiyun Li, Xin Jin, Yunfei Long, Liang Feng Zhang, Xiapu Luo

Abstract: XML configurations are integral to the Android development framework, particularly in the realm of UI display. However, these configurations can introduce compatibility issues (bugs), resulting in divergent visual outcomes and system crashes across various Android API versions (levels). In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration compatibility bugs. Our findings highlight certain limitations of LLMs in effectively identifying and resolving these bugs, while also revealing their potential in addressing complex, hard-to-repair issues that traditional tools struggle with. Leveraging these insights, we introduce the LLM-CompDroid framework, which combines the strengths of LLMs and traditional tools for bug resolution. Our experimental results demonstrate a significant enhancement in bug resolution performance by LLM-CompDroid, with LLM-CompDroid-GPT-3.5 and LLM-CompDroid-GPT-4 surpassing the state-of-the-art tool, ConfFix, by at least 9.8% and 10.4% in both Correct and Correct@k metrics, respectively. This innovative approach holds promise for advancing the reliability and robustness of Android applications, making a valuable contribution to the field of software development.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.10353 [pdf, other]

Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models

Authors: Kang He, Yinghan Long, Kaushik Roy

Abstract: Prompt learning is susceptible to intrinsic bias present in pre-trained language models (LMs), resulting in sub-optimal performance of prompt-based zero/few-shot learning. In this work, we propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained LMs. Different from prior efforts that address intrinsic bias primarily for social fairness and often involve excessive computational cost, our objective is to explore enhancing LMs' performance in downstream zero/few-shot learning while emphasizing the efficiency of intrinsic bias calibration. Specifically, we leverage a diverse set of auto-selected null-meaning inputs generated from GPT-4 to prompt pre-trained LMs for intrinsic bias probing. Utilizing the bias-reflected probability distribution, we formulate a distribution disparity loss for bias calibration, where we exclusively update bias parameters ($0.1\%$ of total parameters) of LMs towards equal probability distribution. Experimental results show that the calibration promotes an equitable starting point for LMs while preserving language modeling abilities. Across a wide range of datasets, including sentiment analysis and topic classification, our method significantly improves zero/few-shot learning performance of LMs for both in-context learning and prompt-based fine-tuning (on average $9\%$ and $2\%$, respectively).

Submitted 15 February, 2024; originally announced February 2024.
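
The bias-calibration abstract above mentions a distribution disparity loss that pushes the model's label distribution on null-meaning inputs toward an equal probability distribution. The abstract does not give the loss formula, so the following is only an illustrative sketch under one plausible assumption: the disparity is measured as the KL divergence from the predicted label distribution to the uniform distribution. The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of label logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disparity_from_uniform(logits):
    """KL(p || uniform) for p = softmax(logits).

    Assumed stand-in for the paper's loss: zero when every label is
    equally likely, larger the more the model favors some labels.
    """
    p = softmax(logits)
    u = 1.0 / len(p)
    # Terms with p_i == 0 contribute nothing (limit of x*log(x) is 0).
    return sum(pi * math.log(pi / u) for pi in p if pi > 0.0)


# Hypothetical label logits produced for a null-meaning input.
unbiased_loss = disparity_from_uniform([0.0, 0.0, 0.0])
biased_loss = disparity_from_uniform([2.0, 0.0, 0.0])
```

In a training loop this quantity would be minimized with respect to the bias parameters only, as the abstract describes; the sketch covers just the scalar objective.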

arXiv:2402.09748 [pdf, other]

Model Compression and Efficient Inference for Large Language Models: A Survey

Authors: Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

Abstract: Transformer-based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, similar to smaller models, compression and acceleration algorithms for large language models can still be categorized into quantization, pruning, distillation, compact architecture design, and dynamic networks. However, large language models have two prominent characteristics compared to smaller models: (1) Most compression algorithms require finetuning or even retraining the model after compression. The most notable aspect of large models is the very high cost associated with model finetuning or training. Therefore, many algorithms for large models, such as quantization and pruning, start to explore tuning-free algorithms. (2) Large models emphasize versatility and generalization rather than performance on a single task. Hence, many algorithms, such as knowledge distillation, focus on how to preserve their versatility and generalization after compression. Since these two characteristics were not very pronounced in early large models, we further distinguish large language models into medium models and ``real'' large models. We also provide an introduction to some mature frameworks for efficient inference of large models, which can support basic compression or acceleration algorithms, greatly facilitating model deployment for users.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 47 pages, review 380 papers. The work is ongoing

arXiv:2402.09260 [pdf, other]

doi 10.1145/3613904.3642482

Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support

Authors: Zilin Ma, Yiyang Mei, Yinru Long, Zhaoyuan Su, Krzysztof Z. Gajos

Abstract: LGBTQ+ individuals are increasingly turning to chatbots powered by large language models (LLMs) to meet their mental health needs. However, little research has explored whether these chatbots can adequately and safely provide tailored support for this demographic. We interviewed 18 LGBTQ+ and 13 non-LGBTQ+ participants about their experiences with LLM-based chatbots for mental health needs. LGBTQ+ participants relied on these chatbots for mental health support, likely due to an absence of support in real life. Notably, while LLMs offer prompt support, they frequently fall short in grasping the nuances of LGBTQ-specific challenges. Although fine-tuning LLMs to address LGBTQ+ needs can be a step in the right direction, it isn't the panacea. The deeper issue is entrenched in societal discrimination. Consequently, we call on future researchers and designers to look beyond mere technical refinements and advocate for holistic strategies that confront and counteract the societal biases burdening the LGBTQ+ community.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.09150 [pdf, ps, other]

Better Decremental and Fully Dynamic Sensitivity Oracles for Subgraph Connectivity

Authors: Yaowei Long, Yunfan Wang

Abstract: We study the \emph{sensitivity oracles problem for subgraph connectivity} in the \emph{decremental} and \emph{fully dynamic} settings. In the fully dynamic setting, we preprocess an $n$-vertex $m$-edge undirected graph $G$ in which $n_{\rm off}$ vertices are initially deactivated and the others are activated. Then we receive a single update $D\subseteq V(G)$ of size $|D| = d \leq d_{\star}$, representing vertices whose states will be switched. Finally, we get a sequence of queries, each of which asks about the connectivity of two given vertices $u$ and $v$ in the activated subgraph. The decremental setting is the special case in which there is no deactivated vertex initially, and it is also known as the \emph{vertex-failure connectivity oracles} problem. We present a better deterministic vertex-failure connectivity oracle with $\widehat{O}(d_{\star}m)$ preprocessing time, $\widetilde{O}(m)$ space, $\widetilde{O}(d^{2})$ update time and $O(d)$ query time, which improves the update time of the previous almost-optimal oracle [Long-Saranurak, FOCS 2022] from $\widehat{O}(d^{2})$ to $\widetilde{O}(d^{2})$. We also present a better deterministic fully dynamic sensitivity oracle for subgraph connectivity with $\widehat{O}(\min\{m(n_{\rm off} + d_{\star}),n^ω\})$ preprocessing time, $\widetilde{O}(\min\{m(n_{\rm off} + d_{\star}),n^{2}\})$ space, $\widetilde{O}(d^{2})$ update time and $O(d)$ query time, which significantly improves the update time of the state of the art [Hu-Kosinas-Polak, 2023] from $\widetilde{O}(d^{4})$ to $\widetilde{O}(d^{2})$. Furthermore, our solution is even almost-optimal assuming popular fine-grained complexity conjectures.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 30 pages

arXiv:2402.02380 [pdf]

Evaluating Large Language Models in Analysing Classroom Dialogue

Authors: Yun Long, Haifeng Luo, Yu Zhang

Abstract: This study explores the application of Large Language Models (LLMs), specifically GPT-4, in the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLMs to streamline and enhance the analysis process. The study involves datasets from a middle school, encompassing classroom dialogues across mathematics and Chinese classes. These dialogues were manually coded by educational experts and then analyzed using a customised GPT-4 model. This study focuses on comparing manual annotations with the outputs of GPT-4 to evaluate its efficacy in analyzing educational dialogues. Time efficiency, inter-coder agreement, and inter-coder reliability between human coders and GPT-4 are evaluated. Results indicate substantial time savings with GPT-4, and a high degree of consistency in coding between the model and human coders, with some discrepancies in specific codes. These findings highlight the strong potential of LLMs in teaching evaluation and facilitation.

Submitted 22 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.
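
The classroom-dialogue abstract above evaluates inter-coder agreement between human coders and GPT-4 but does not name the statistic used. One widely used choice for two coders is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is offered on that assumption only; the function name, the code labels, and the sample annotations are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal rate, summed over codes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for four dialogue turns.
human = ["question", "question", "answer", "answer"]
model_same = ["question", "question", "answer", "answer"]
model_chance = ["question", "answer", "question", "answer"]

kappa_same = cohens_kappa(human, model_same)
kappa_chance = cohens_kappa(human, model_chance)
```

Identical codings give a kappa of 1, while agreement no better than chance gives 0, which makes the statistic easier to interpret than raw percentage agreement when some codes are much more frequent than others.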

arXiv:2402.01950 [pdf, other]

ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

Authors: Xingyu Miao, Yang Bai, Haoran Duan, Fan Wan, Yawen Huang, Yang Long, Yefeng Zheng

Abstract: Most of the existing works on arbitrary 3D NeRF style transfer require retraining on each single style condition. This work aims to achieve zero-shot controlled stylization in 3D scenes utilizing text or visual input as conditioning factors. We introduce ConRF, a novel method of zero-shot stylization. Specifically, due to the ambiguity of CLIP features, we employ a conversion process that maps the CLIP feature space to the style space of a pre-trained VGG network and then refine the CLIP multi-modal knowledge into a style transfer neural radiation field. Additionally, we use a 3D volumetric representation to perform local style transfer. By combining these operations, ConRF offers the capability to utilize either text or images as references, resulting in the generation of sequences with novel views enhanced by global or local stylization. Our experiment demonstrates that ConRF outperforms other existing methods for 3D scene and single-text stylization in terms of visual quality.

Submitted 6 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01181 [pdf, other]

Efficient Physically-based Simulation of Soft Bodies in Embodied Environment for Surgical Robot

Authors: Zhenya Yang, Yonghao Long, Kai Chen, Wang Wei, Qi Dou

Abstract: Surgical robot simulation platforms play a crucial role in enhancing training efficiency and advancing research on robot learning. Much effort has been made by scholars on developing open-source surgical robot simulators to facilitate research. We previously developed SurRoL, an open-source, da Vinci Research Kit (dVRK) compatible and interactive embodied environment for robot learning. Despite these advancements, the simulation of soft bodies remains a major challenge within the open-source platforms available for surgical robotics. To this end, we develop an interactive physically based soft body simulation framework and integrate it into SurRoL. Specifically, we utilize a high-performance adaptation of the Material Point Method (MPM) along with the Neo-Hookean model to represent the deformable tissue. Lagrangian particles are used to track the motion and deformation of the soft body throughout the simulation, and Eulerian grids are leveraged to discretize space and facilitate the calculation of forces, velocities, and other physical quantities. We also employ an efficient collision detection and handling strategy to simulate the interaction between the soft body and the rigid tool of the surgical robot. By employing the Taichi programming language, our implementation harnesses parallel computing to boost simulation speed. Experimental results show that our platform is able to simulate soft bodies efficiently with strong physical interpretability and plausible visual effects. These new features in SurRoL enable the efficient simulation of surgical tasks involving soft tissue manipulation and pave the path for further investigation of surgical robot learning. The code will be released in a new branch of the SurRoL GitHub repo.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 8 pages

arXiv:2401.17968 [pdf, other]

Unsupervised Learning of Topological Non-Abelian Braiding in Non-Hermitian Bands

Authors: Yang Long, Haoran Xue, Baile Zhang

Abstract: The topological classification of energy bands has laid the groundwork for the discovery of various topological phases of matter in recent decades. While this classification has traditionally focused on real-energy bands, recent studies have revealed the intriguing topology of complex-energy, or non-Hermitian, bands. For example, the spectral winding of complex-energy bands can form unique topological structures like braids, holding promise for advancing quantum computing. However, discussions of complex-energy braids have been largely limited to the Abelian braid group $\mathbb{B}_2$ for its relative simplicity, while identifying topological non-Abelian braiding is still difficult since it has no universal topological invariant for characterization. Here, we present a machine learning algorithm for the unsupervised identification of non-Abelian braiding of multiple complex-energy bands. The consistency with Artin's well-known topological equivalence conditions in braiding is demonstrated. Inspired by the results from unsupervised learning, we also introduce a winding matrix as a topological invariant for characterizing the braiding topology and unveiling the bulk-edge correspondence of non-Abelian braided non-Hermitian bands. Finally, we extend our approach to identify non-Abelian braiding topology in 2D/3D exceptional semimetals and successfully address the unknotting problem in an unsupervised manner.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15383 [pdf, ps, other]

Connectedness of the Gromov boundary of fine curve graphs

Authors: Yusen Long, Dong Tan

Abstract: In this paper, we study the topological properties of the Gromov boundary of the fine curve graph of an orientable finite-type surface of genus at least 2. This graph, consisting of topological curves, has much richer dynamics than the classical curve graph. Using the techniques introduced by Wright [Wri23], we show that this boundary is (path) connected and that the spheres in the non-separating fine curve graph are connected.

Submitted 28 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 16 pages. New version specifies the topology of curves, corrects some minor errors and typos. Comments are welcome!

MSC Class: 57K20; 53C23

arXiv:2401.13259 [pdf, ps, other]

Three closed characteristics on non-degenerate star-shaped hypersurfaces in $\mathbf{R}^{6}$

Authors: Huagui Duan, Hui Liu, Yiming Long, Zihao Qi, Wei Wang

Abstract: In this paper, we prove that for every non-degenerate $C^3$ compact star-shaped hypersurface $\Sigma$ in $\mathbf{R}^{6}$ which carries no prime closed characteristic of Maslov-type index $0$ or no prime closed characteristic of Maslov-type index $-1$, there exist at least three prime closed characteristics on $\Sigma$.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 30 pages. arXiv admin note: text overlap with arXiv:2205.07082, arXiv:1510.08648, arXiv:2205.14789

arXiv:2401.04861 [pdf, other]

doi 10.1016/j.patcog.2024.110729

CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Yang Long, Yefeng Zheng

Abstract: The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiation fields. However, these methods have limitations when it comes to accurately modeling the motion of complex objects, which can lead to inaccurate and blurry renderings of details. To address this limitation, we propose a novel approach that builds upon a recent generalization NeRF, which aggregates nearby views onto new viewpoints. However, such methods are typically only effective for static scenes. To overcome this challenge, we introduce a module that operates in both the time and frequency domains to aggregate the features of object motion. This allows us to learn the relationship between frames and generate higher-quality images. Our experiments demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets. Specifically, our approach outperforms existing methods in terms of both the accuracy and visual quality of the synthesized views. Our code is available at https://github.com/xingy038/CTNeRF.

Submitted 26 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by Pattern Recognition

arXiv:2401.03623 [pdf]

A Video Coding Method Based on Neural Network for CLIC2024

Authors: Zhengang Li, Jingchi Zhang, Yonghua Wang, Xing Zeng, Zhen Zhang, Yunlin Long, Menghu Jia, Ning Wang

Abstract: This paper presents a video coding scheme that combines traditional optimization methods with deep learning methods based on the Enhanced Compression Model (ECM). The traditional optimization methods adaptively adjust the quantization parameter (QP). The key frame QP offset is set according to the video content characteristics, and the coding tree unit (CTU) level QP of all frames is also adjusted according to the spatial-temporal perception information. Block importance mapping (BIM) technology is also introduced, which adjusts the QP according to the block importance. The deep learning methods include a convolutional neural network-based loop filter (CNNLF), which is turned on or off based on rate-distortion optimization at the CTU and frame level. In addition, intra-prediction using neural networks (NN-intra) is proposed to further improve compression quality, where 8 neural networks are used for predicting blocks of different sizes. The experimental results show that, compared with ECM-3.0, the proposed traditional methods and the addition of the deep learning methods improve the PSNR by 0.54 dB and 1 dB, respectively, at 0.05 Mbps, and by 0.38 dB and 0.71 dB, respectively, at 0.5 Mbps, which demonstrates the superiority of our method.

Submitted 7 January, 2024; originally announced January 2024.

Showing 1–50 of 493 results for author: Long, Y