Search | arXiv e-print repository

A Score-Based Density Formula, with Applications in Diffusion Generative Models

Abstract: Score-based generative models (SGMs) have revolutionized the field of generative modeling, achieving unprecedented success in generating realistic and diverse content. Despite empirical advances, the theoretical basis for why optimizing the evidence lower bound (ELBO) on the log-likelihood is effective for training diffusion generative models, such as DDPMs, remains largely unexplored. In this paper, we address this question by establishing a density formula for a continuous-time diffusion process, which can be viewed as the continuous-time limit of the forward process in an SGM. This formula reveals the connection between the target density and the score function associated with each step of the forward process. Building on this, we demonstrate that the minimizer of the optimization objective for training DDPMs nearly coincides with that of the true objective, providing a theoretical foundation for optimizing DDPMs using the ELBO. Furthermore, we offer new insights into the role of score-matching regularization in training GANs, the use of ELBO in diffusion classifiers, and the recently proposed diffusion loss.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16288 [pdf, other]

OpenFGL: A Comprehensive Benchmarks for Federated Graph Learning

Authors: Xunkai Li, Yinlin Zhu, Boyang Pang, Guochen Yan, Yeyu Yan, Zening Li, Zhengyu Wu, Wentao Zhang, Rong-Hua Li, Guoren Wang

Abstract: Federated graph learning (FGL) has emerged as a promising distributed training paradigm for graph neural networks across multiple local systems without direct data sharing. This approach is particularly beneficial in privacy-sensitive scenarios and offers a new perspective on addressing scalability challenges in large-scale graph learning. Despite the proliferation of FGL, the diverse motivations from practical applications, spanning various research backgrounds and experimental settings, pose a significant challenge to fair evaluation. To fill this gap, we propose OpenFGL, a unified benchmark designed for the primary FGL scenarios: Graph-FL and Subgraph-FL. Specifically, OpenFGL includes 38 graph datasets from 16 application domains, 8 federated data simulation strategies that emphasize graph properties, and 5 graph-based downstream tasks. Additionally, it offers 18 recently proposed SOTA FGL algorithms through a user-friendly API, enabling a thorough comparison and comprehensive evaluation of their effectiveness, robustness, and efficiency. Empirical results demonstrate the ability of FGL while also revealing its potential limitations, offering valuable insights for future exploration in this thriving field.

Submitted 29 August, 2024; originally announced August 2024.

Comments: Under Review

arXiv:2408.15493 [pdf, ps, other]

Investigating the $p$-$Ω$ Interaction and Correlation Functions

Authors: Ye Yan, Youchang Yang, Qi Huang, Hongxia Huang, Jialun Ping

Abstract: Motivated by the experimental measurements, we investigate the $p$-$Ω$ correlation functions and interactions. By solving the inverse scattering problem, we derive the $p$-$Ω$ potentials from a quark model. The effects of Coulomb interaction and spin-averaging are discussed. According to our results, the depletion of the $p$-$Ω$ correlation functions, attributed to the $J^P = 2^+$ bound state not observed in the ALICE Collaboration's measurements [Nature 588, 232 (2020)], can be explained by the contribution of the attractive $J^P = 1^+$ component in spin-averaging. Additionally, there is a subtle sub-unity part of the correlation function, which can also be seen in the experimental data, supporting the existence of the $p$-$Ω$ bound state. So far, we have completed the consistent description of the $p$-$Ω$ system from the perspective of the quark model in terms of energy spectrum, scattering phase shift, and correlation function. The existence of the $p$-$Ω$ bound state has been confirmed from these three aspects. In the Appendix, we study the relationship between correlation functions and interaction potentials by using simplified square potential models and find a periodic-like variation.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 9 pages, 6 figures

arXiv:2408.14917 [pdf, other]

PMSN: A Parallel Multi-compartment Spiking Neuron for Multi-scale Temporal Processing

Authors: Xinyi Chen, Jibin Wu, Chenxiang Ma, Yinsong Yan, Yujie Wu, Kay Chen Tan

Abstract: Spiking Neural Networks (SNNs) hold great potential to realize brain-inspired, energy-efficient computational systems. However, current SNNs still fall short in terms of multi-scale temporal processing compared to their biological counterparts. This limitation has resulted in poor performance in many pattern recognition tasks with information that varies across different timescales. To address this issue, we put forward a novel spiking neuron model called Parallel Multi-compartment Spiking Neuron (PMSN). The PMSN emulates biological neurons by incorporating multiple interacting substructures and allows for flexible adjustment of the substructure counts to effectively represent temporal information across diverse timescales. Additionally, to address the computational burden associated with the increased complexity of the proposed model, we introduce two parallelization techniques that decouple the temporal dependencies of neuronal updates, enabling parallelized training across different time steps. Our experimental results on a wide range of pattern recognition tasks demonstrate the superiority of PMSN. It outperforms other state-of-the-art spiking neuron models in terms of its temporal processing capacity, training speed, and computation cost. Specifically, compared with the commonly used Leaky Integrate-and-Fire neuron, PMSN offers a simulation acceleration of over 10× and a 30% improvement in accuracy on the Sequential CIFAR10 dataset, while maintaining comparable computational cost.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14520 [pdf, other]

Towards Graph Prompt Learning: A Survey and Beyond

Authors: Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou

Abstract: Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community.

Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

Comments: 19 pages, 2 figures

arXiv:2408.14506 [pdf, other]

Distilling Long-tailed Datasets

Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with those of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.13430 [pdf, other]

Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning?

Authors: Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su

Abstract: We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings. Our analysis demonstrates that the ranking-calibrated scores outperform raw scores in estimating the ground truth "expected review scores" in both squared and absolute error metrics. Moreover, we propose several cautious, low-risk approaches to using the Isotonic Mechanism and author-provided rankings in peer review processes, including assisting senior area chairs' oversight of area chairs' recommendations, supporting the selection of paper awards, and guiding the recruitment of emergency reviewers. We conclude the paper by addressing the study's limitations and proposing future research directions.

Submitted 23 August, 2024; originally announced August 2024.

Comments: See more details about the experiment at https://openrank.cc/

arXiv:2408.12710 [pdf, other]

CasualGaze: Towards Modeling and Recognizing Casual Gaze Behavior for Efficient Gaze-based Object Selection

Authors: Yingtian Shi, Yukang Yan, Zisu Li, Chen Liang, Yuntao Wang, Chun Yu, Yuanchun Shi

Abstract: We present CasualGaze, a novel eye-gaze-based target selection technique to support natural and casual eye-gaze input. Unlike existing solutions that require users to actively keep the eye-gaze center on the target, CasualGaze allows users to simply glance at the target object to complete the selection. To understand casual gaze behavior, we studied the spatial distribution of casual gaze for different layouts and user behavior in a simulated real-world environment. Results revealed the impacts of object parameters, the speed and randomness features of casual gaze, and special gaze behavior patterns in "blurred areas". Based on the results, we devised CasualGaze algorithms, employing a bivariate Gaussian distribution model along with temporal compensation and voting algorithms for robust target prediction. A usability evaluation study showed significant improvements in recognition and selection speed for CasualGaze compared with two baseline techniques. Subjective ratings and comments further supported the preference for CasualGaze regarding efficiency, accuracy, and stability.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12352 [pdf, other]

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Authors: Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

Abstract: General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections. To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline to obtain spatial and quantitative information of garment components from corresponding images and captions. Subsequently, to exploit component relationships within the garment images, we construct retrieval subsets for each garment by retrieval augmentation based on component-level similarity ranking and conduct contrastive learning to enhance the model perception of components from positive and negative samples. To further enhance the alignment of components across semantic, spatial, and quantitative granularities, we propose the utilization of multi-level correction losses that leverage detailed component information. The experimental findings demonstrate that GarmentAligner achieves superior fidelity and fine-grained semantic alignment when compared to existing competitors.

Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV 2024

arXiv:2408.11660 [pdf, other]

Anteumbler: Non-Invasive Antenna Orientation Error Measurement for WiFi APs

Authors: Dawei Yan, Panlong Yang, Fei Shang, Nikolaos M. Freris, Yubo Yan

Abstract: The performance of WiFi-based localization systems is affected by the spatial accuracy of WiFi APs. Compared with the imprecision of AP location and antenna separation, the imprecision of an AP's or antenna's orientation is more important in real scenarios, including AP rotation and irregular antenna tilt. In this paper, we propose Anteumbler, which non-invasively, accurately and efficiently measures the orientation of each antenna in physical space. Based on the fact that the received power is maximized when a Tx-Rx antenna pair is perfectly aligned, we construct a spatial angle model that can obtain the antennas' orientations without prior knowledge. However, the sampling points for traversing the spatial angle need to cover the entire space. We use the orthogonality of antenna directivity and polarization and adopt an iterative algorithm to reduce the sampling points by hundreds of times, which greatly improves the efficiency. To achieve the required antenna orientation accuracy, we eliminate the influence of propagation distance using a dual plane intersection model and filter out ambient noise. Our real-world experiments with six antenna types, two antenna layouts and two antenna separations show that Anteumbler achieves median errors below 6 degrees for both elevation and azimuth angles, and is robust to NLoS and dynamic environments. Last but not least, for the reverse localization system, we deploy Anteumbler over LocAP and reduce the antenna separation error by 10 mm, while for the user localization system, we deploy Anteumbler over SpotFi and reduce the user localization error by more than 1 m.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11366 [pdf, other]

GeoReasoner: Reasoning On Geospatially Grounded Context For Natural Language Understanding

Authors: Yibo Yan, Joey Lee

Abstract: In human reading and communication, individuals tend to engage in geospatial reasoning, which involves recognizing geographic entities and making informed inferences about their interrelationships. To mimic this cognitive process, current methods either utilize conventional natural language understanding toolkits, or directly apply models pretrained on geo-related natural language corpora. However, these methods face two significant challenges: i) they do not generalize well to unseen geospatial scenarios, and ii) they overlook the importance of integrating geospatial context from geographical databases with linguistic information from the Internet. To handle these challenges, we propose GeoReasoner, a language model capable of reasoning on geospatially grounded natural language. Specifically, it first leverages Large Language Models (LLMs) to generate a comprehensive location description based on linguistic and geospatial information. It also encodes direction and distance information into spatial embeddings by treating them as pseudo-sentences. Consequently, the model is trained on both anchor-level and neighbor-level inputs to learn geo-entity representation. Extensive experimental results demonstrate GeoReasoner's superiority in three tasks: toponym recognition, toponym linking, and geo-entity typing, compared to the state-of-the-art baselines.

Submitted 21 August, 2024; originally announced August 2024.

Comments: Accepted by International Conference on Information and Knowledge Management 2024

arXiv:2408.09452 [pdf, other]

Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning

Authors: Yuchen Yan, Hanjie Zhao, Senbin Zhu, Hongde Liu, Zhihong Zhang, Yuxiang Jia

Abstract: Quotations in literary works, especially novels, are important to create characters, reflect character relationships, and drive plot development. Current research on quotation extraction in novels primarily focuses on quotation attribution, i.e., identifying the speaker of the quotation. However, the addressee of the quotation is also important to construct the relationship between the speaker and the addressee. To tackle the problem of dataset scarcity, we annotate the first Chinese quotation corpus with elements including speaker, addressee, speaking mode and linguistic cue. We propose prompt learning-based methods for speaker and addressee identification based on fine-tuned pre-trained models. Experiments on both Chinese and English datasets show the effectiveness of the proposed methods, which outperform methods based on zero-shot and few-shot large language models.

Submitted 18 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by NLPCC 2024

arXiv:2408.09429 [pdf, other]

Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

Authors: Kening Zheng, Junkai Chen, Yibo Yan, Xin Zou, Xuming Hu

Abstract: Hallucination issues persistently plague current multimodal large language models (MLLMs). Existing research primarily focuses on object-level or attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that necessitate advanced reasoning abilities from MLLMs. Besides, recent benchmarks regarding relation hallucinations lack in-depth evaluation and effective mitigation. Moreover, their datasets are typically derived from a systematic annotation process, which could introduce inherent biases due to the predefined process. To handle the aforementioned challenges, we introduce Reefknot, a comprehensive benchmark specifically targeting relation hallucinations, consisting of over 20,000 samples derived from real-world scenarios. Specifically, we first provide a systematic definition of relation hallucinations, integrating perspectives from perceptive and cognitive domains. Furthermore, we construct the relation-based corpus utilizing the representative scene graph dataset Visual Genome (VG), from which semantic triplets follow real-world distributions. Our comparative evaluation across three distinct tasks revealed a substantial shortcoming in the capabilities of current MLLMs to mitigate relation hallucinations. Finally, we advance a novel confidence-based mitigation strategy tailored to tackle the relation hallucinations problem. Across three datasets, including Reefknot, we observed an average reduction of 9.75% in the hallucination rate. We believe our paper offers valuable insights into achieving trustworthy multimodal intelligence. Our dataset and code will be released upon paper acceptance.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.09320 [pdf, other]

doi 10.1145/3654777.3676424

Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality

Authors: Hyunsung Cho, Alexander Wang, Divya Kartik, Emily Liying Xie, Yukang Yan, David Lindlbauer

Abstract: Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed, and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate in localizing sound cues, especially with multiple sources due to limitations in human auditory perception such as angular discrimination error and front-back confusion. This decreases the efficiency of XR interfaces because users misidentify from which XR element a sound is coming. To address this, we propose Auptimize, a novel computational approach for placing XR sound sources, which mitigates such localization errors by utilizing the ventriloquist effect. Auptimize disentangles the sound source locations from the visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize decreases spatial audio-based source identification errors compared to playing sound cues at the paired visual-sound locations. We demonstrate the applicability of Auptimize for diverse spatial audio-based interactive XR scenarios.

Submitted 17 August, 2024; originally announced August 2024.

Comments: UIST 2024

ACM Class: H.5.1; H.5.2; H.5.5

arXiv:2408.07522 [pdf, other]

Optimising MFCC parameters for the automatic detection of respiratory diseases

Authors: Yuyang Yan, Sami O. Simons, Loes van Bemmel, Lauren Reinders, Frits M. E. Franssen, Visara Urovi

Abstract: Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) are widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of key parameters, namely the number of coefficients, frame length, and hop length between frames, on respiratory condition examination. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrucken Voice Disorders (SVD) database, and a TACTICAS dataset. The Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that the accuracy of MFCC decreases as hop length increases, and the optimal number of coefficients is observed to be approximately 30. The performance of MFCC varies with frame length across the datasets: for the COVID-19 datasets (Cambridge COVID-19 Sound database and Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves with increasing frame length (from 50 ms to 500 ms). Furthermore, we investigate the optimized combination of these parameters and observe substantial enhancements in accuracy. Compared to the worst combination, the SVM model achieves an accuracy of 81.1%, 80.6%, and 71.7%, with improvements of 19.6%, 16.10%, and 14.90% for the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset respectively.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.07098 [pdf, other]

QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition

Authors: Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan

Abstract: In multi-agent cooperative tasks, the presence of heterogeneous agents is common. Compared to cooperation among homogeneous agents, collaboration among heterogeneous agents requires considering the best-suited sub-tasks for each agent. However, the operation of multi-agent systems often involves a large amount of complex interaction information, making it more challenging to learn heterogeneous strategies. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups or leverage prior domain knowledge to learn strategies for different roles. In contrast, agents should learn deeper role features without relying on additional information. Therefore, we propose QTypeMix, which divides the value decomposition process into homogeneous and heterogeneous stages. QTypeMix learns to extract type features from local historical observations through the TE loss. In addition, we introduce advanced network structures containing attention mechanisms and hypernets to enhance the representation capability and achieve the value decomposition process. The results of testing the proposed method on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance in tasks of varying difficulty.

Submitted 12 August, 2024; originally announced August 2024.

Comments: 16 pages, 8 figures

ACM Class: I.2.6; I.2.11

arXiv:2408.07089 [pdf, other]

doi 10.1145/3627673.3679122

InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu

Abstract: Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned models showed significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. Additionally, these models exhibited high robustness on the GSM8K+ and MATH+ benchmarks, which are enhanced versions of the test sets with simple number variations. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems. The data is available at https://huggingface.co/datasets/flagopen/InfinityMATH.

Submitted 9 August, 2024; originally announced August 2024.

Comments: Accepted by CIKM 2024

ACM Class: I.2.7

arXiv:2408.05687 [pdf, other]

Investigating the competition between the deconfinement and chiral phase transitions in light of the multimessenger observations of neutron stars

Authors: Wen-Li Yuan, Bikai Gao, Yan Yan, Bolin Li, Renxin Xu

Abstract: We extend the parity doublet model for hadronic matter and study the possible presence of quark matter inside the cores of neutron stars with the Nambu-Jona-Lasinio (NJL) model. Considering the uncertainties of the QCD phase diagram and the location of the critical endpoint, we aim to explore the competition between the chiral phase transition and the deconfinement phase transition systematically, regulated by the vacuum pressure $-B$ in the NJL model. Employing a Maxwell construction, a sharp first-order deconfinement phase transition is implemented combining the parity doublet model for the hadronic phase and the NJL model for the high-energy quark phase. The position of the chiral phase transition is obtained from the NJL model self-consistently. We find stable neutron stars with a quark core within a specific parameter space that satisfies current astronomical observations. The observations suggest a relatively large chiral invariant mass $m_0=600$ MeV in the parity doublet model and a larger split between the chiral and deconfinement phase transitions while assuming the first-order deconfinement phase transition. The maximum mass of the hybrid star that we obtain is $\sim 2.2 M_{\odot}$.

Submitted 12 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

Comments: 10 pages, 7 figures

arXiv:2408.05112 [pdf, other]

Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

Authors: Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Rui Li, Wenchi Cheng, Zhu Han

Abstract: Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios. This system leverages deep generative models to establish a new paradigm in SC. Specifically, at the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction and compression. At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details. Additionally, we present a Multi-User Generative Semantic Communication (MU-GSC) system utilizing an asynchronous processing model. This model effectively manages multiple user requests and optimally utilizes system resources for parallel processing. Simulation results on public datasets demonstrate that our generative AI semantic communication systems achieve superior transmission efficiency and enhanced communication content quality across various channel conditions. Compared to CNN-based DeepJSCC, our methods improve the Peak Signal-to-Noise Ratio (PSNR) by 17.75% in Additive White Gaussian Noise (AWGN) channels and by 20.86% in Rayleigh channels.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2408.05006 [pdf, other]

Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

Authors: Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

Abstract: Debugging is a vital aspect of software development, yet the debugging capabilities of Large Language Models (LLMs) remain largely unexplored. This paper first introduces DEBUGEVAL, a comprehensive benchmark designed to evaluate the debugging capabilities of LLMs. DEBUGEVAL collects data from existing high-quality datasets and designs four different tasks to evaluate the debugging effectiveness, including BUG Localization, BUG Identification, Code Review, and Code Repair. Additionally, to enhance the code debugging ability of LLMs, this paper proposes a CoMmunicative Agent BaSed DaTa REfinement FRamework (MASTER), which generates the refined code debugging data for supervised finetuning. Specifically, MASTER employs the Code Quizzer to generate refined data according to the defined tasks of DEBUGEVAL. Then the Code Learner acts as a critic and reserves the generated problems that it cannot solve. Finally, the Code Teacher provides a detailed Chain-of-Thought based solution to deal with the generated problem. We collect the synthesized data and finetune the Code Learner to enhance the debugging ability and conduct the NeuDebugger model. Our experiments evaluate various LLMs and NeuDebugger in the zero-shot setting on DEBUGEVAL. Experimental results demonstrate that 7B-scale LLMs, even code-oriented ones, have weaker debugging capabilities. In contrast, larger models (over 70B) show convincing debugging ability. Our further analyses illustrate that MASTER is an effective method to enhance the code debugging ability by synthesizing data for Supervised Fine-Tuning (SFT) LLMs.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04130 [pdf, ps, other]

Predicting X(2370) glueball-like particle production in pp collisions at the LHC energy with PACIAE model

Authors: Jian Cao, Jin-Peng Zhang, Jia-Hao Shi, Zhi-Ying Qin, Wen-Chao Zhang, Hua Zheng, An-Ke Lei, Zhi-Lei She, Dai-Mei Zhou, Yu-Liang Yan, Ben-Hao Sa

Abstract: Inspired by the newest BESIII observation of X(2370) glueball-like particle production in $e^+e^-$ collisions, we search for its production in proton-proton (pp) collisions at $\sqrt{s}=$ 13 TeV with a parton and hadron cascade model PACIAE. In this model, the final partonic state (FPS) and final hadronic state (FHS) are consecutively simulated and recorded. The X(2370) glueball- or tetraquark-state is then, respectively, recombined by two gluons or four quarks $ss\bar{s}\bar{s}$ in the FPS using the quantum statistical mechanics inspired dynamically constrained phase-space coalescence (DCPC) model. The X(2370) molecular-state is recombined by the baryon-antibaryon of $Λ$-$\barΛ$ or $Σ$-$\barΣ$, or by three mesons of $π^+π^{-}η'$, $K^+K^-η'$, or $K_S^0K_S^0η'$ in the FHS using the DCPC model. Significant discrepancies in the transverse momentum ($p_{\rm T}$) and rapidity ($y$) distributions among the X(2370) glueball-, tetraquark-, and molecular-state are observed. Thus both the $p_{\rm T}$ and $y$ distributions could serve as valuable criteria for identifying different states of the X(2370). We strongly suggest the experimental measurement of the X(2370) glueball-like particle production in pp collisions at the LHC energies.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 6 pages, 5 figures

arXiv:2408.01895 [pdf, other]

doi 10.1145/3654777.3676415

Computational Trichromacy Reconstruction: Empowering the Color-Vision Deficient to Recognize Colors Using Augmented Reality

Authors: Yuhao Zhu, Ethan Chen, Colin Hascup, Yukang Yan, Gaurav Charma

Abstract: We propose an assistive technology that helps individuals with Color Vision Deficiencies (CVD) to recognize/name colors. A dichromat's color perception is a reduced two-dimensional (2D) subset of a normal trichromat's three-dimensional (3D) color perception, leading to confusion when visual stimuli that appear identical to the dichromat are referred to by different color names. Using our proposed system, CVD individuals can interactively induce distinct perceptual changes to originally confusing colors via a computational color space transformation. By combining their original 2D percepts for colors with the discriminative changes, a three-dimensional color space is reconstructed, where the dichromat can learn to resolve color name confusions and accurately recognize colors. Our system is implemented as an Augmented Reality (AR) interface on smartphones, where users interactively control the rotation through swipe gestures and observe the induced color shifts in the camera view or in a displayed image. Through psychophysical experiments and a longitudinal user study, we demonstrate that such rotational color shifts have discriminative power (initially confusing colors become distinct under rotation) and exhibit structured perceptual shifts dichromats can learn with modest training. The AR App is also evaluated in two real-world scenarios (building with lego blocks and interpreting artistic works); users all report positive experience in using the App to recognize object colors that they otherwise could not.

Submitted 3 August, 2024; originally announced August 2024.

arXiv:2408.01431 [pdf]

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundational Models

Authors: Simha Sankar Baradwaj, Destiny Gilliland, Jack Rincon, Henning Hermjakob, Yu Yan, Irsyad Adam, Gwyneth Lemaster, Dean Wang, Karol Watson, Alex Bui, Wei Wang, Peipei Ping

Abstract: Foundational Models (FMs) are gaining increasing attention in the biomedical AI ecosystem due to their ability to represent and contextualize multimodal biomedical data. These capabilities make FMs a valuable tool for a variety of tasks, including biomedical reasoning, hypothesis generation, and interpreting complex imaging data. In this review paper, we address the unique challenges associated with establishing an ethical and trustworthy biomedical AI ecosystem, with a particular focus on the development of FMs and their downstream applications. We explore strategies that can be implemented throughout the biomedical AI pipeline to effectively tackle these challenges, ensuring that these FMs are translated responsibly into clinical and translational settings. Additionally, we emphasize the importance of key stewardship and co-design principles that not only ensure robust regulation but also guarantee that the interests of all stakeholders, especially those involved in or affected by these clinical and translational applications are adequately represented. We aim to empower the biomedical AI community to harness these models responsibly and effectively. As we navigate this exciting frontier, our collective commitment to ethical stewardship, co-design, and responsible translation will be instrumental in ensuring that the evolution of FMs truly enhances patient care and medical decision making, ultimately leading to a more equitable and trustworthy biomedical AI ecosystem.

Submitted 13 August, 2024; v1 submitted 18 July, 2024; originally announced August 2024.

Comments: 3 figures, 3 tables

arXiv:2408.01262 [pdf, other]

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

Authors: Kunlun Zhu, Yifan Luo, Dingling Xu, Ruobing Wang, Shi Yu, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Abstract: Retrieval-Augmented Generation (RAG) systems have demonstrated their advantages in alleviating the hallucination of Large Language Models (LLMs). Existing RAG benchmarks mainly focus on evaluating whether LLMs can correctly answer general knowledge questions. However, they are unable to evaluate the effectiveness of the RAG system in dealing with the data from different vertical domains. This paper introduces RAGEval, a framework for automatically generating evaluation datasets to evaluate the knowledge usage ability of different LLMs in different scenarios. Specifically, RAGEval summarizes a schema from seed documents, applies the configurations to generate diverse documents, and constructs question-answering pairs according to both articles and configurations. We propose three novel metrics, Completeness, Hallucination, and Irrelevance, to carefully evaluate the responses generated by LLMs. By benchmarking RAG models in vertical domains, RAGEval has the ability to better evaluate the knowledge usage ability of LLMs, which avoids the confusion in existing QA datasets regarding the source of knowledge used in answering questions: whether it comes from parameterized memory or retrieval. The code and dataset will be released.

Submitted 26 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: add github repo

arXiv:2408.00971 [pdf, other]

Two distinct types of echoes in compact objects

Authors: Shui-Fa Shen, Kai Lin, Tao Zhu, Yu-Peng Yan, Cheng-Gang Shao, Wei-Liang Qian

Abstract: In the black hole perturbation theory framework, two different physical pictures for echoes in compact objects have been proposed. The first mechanism interprets echoes as repeated reflections of gravitational waves within a potential well, where the echo period is defined by twice the distance related to the spatial displacement operator that separates two local maxima of the effective potential. The second mechanism associates echoes with a discontinuity in the effective potential, potentially associated with specific accretion processes, without necessarily introducing a second local maximum in the effective potential. This discontinuity leads to echo signals that are typically attenuated over time more quickly, with their period dictated by the characteristics of the transfer amplitudes. In both scenarios, the echoes correspond to a new category of quasinormal modes with minor real parts, with their period connected to the spacing between successive modes in the frequency domain. This work elaborates on a unified framework in compact stars that encompasses both echo mechanisms. It suggests that these two types of echoes derive from different physical origins and can be independently triggered. The occurrence and interplay between these two types of echoes are demonstrated through numerical simulations.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 17 pages and 6 figures

arXiv:2408.00469 [pdf]

doi 10.1038/s41467-024-50833-9

Evidence of electron interaction with an unidentified bosonic mode in superconductor CsCa$_2$Fe$_4$As$_4$F$_2$

Authors: Peng Li, Sen Liao, Zhicheng Wang, Huaxun Li, Shiwu Su, Jiakang Zhang, Ziyuan Chen, Zhicheng Jiang, Zhengtai Liu, Lexian Yang, Linwei Huai, Junfeng He, Shengtao Cui, Zhe Sun, Yajun Yan, Guanghan Cao, Dawei Shen, Juan Jiang, Donglai Feng

Abstract: The kink structure in band dispersion usually refers to a certain electron-boson interaction, which is crucial in understanding the pairing in unconventional superconductors. Here we report evidence of a kink structure in the Fe-based superconductor CsCa$_2$Fe$_4$As$_4$F$_2$ using angle-resolved photoemission spectroscopy. The kink shows an orbital selective and momentum dependent behavior, which is located at 15 meV below Fermi level along the Gamma-M direction at the band with dxz orbital character and vanishes when approaching the Gamma-X direction, correlated with a slight decrease of the superconducting gap. Most importantly, this kink structure disappears when the superconducting gap closes, indicating that the corresponding bosonic mode (9 meV) is closely related to superconductivity. However, the origin of this mode remains unidentified, since it cannot be related to phonons or the spin resonance mode (15 meV) observed by inelastic neutron scattering. The behavior of this mode is rather unique and challenges our present understanding of the superconducting pairing mechanism of the bilayer FeAs-based superconductors.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 14 pages, 4 figures

Journal ref: Nature Communications 15,2024,6433

arXiv:2408.00247 [pdf, other]

Simple but Efficient: A Multi-Scenario Nearline Retrieval Framework for Recommendation on Taobao

Authors: Yingcai Ma, Ziyang Wang, Yuliang Yan, Jian Wu, Yuning Jiang, Longbin Li, Wen Chen, Jianhang Huang

Abstract: In recommendation systems, the matching stage is becoming increasingly critical, serving as the upper limit for the entire recommendation process. Recently, some studies have started to explore the use of multi-scenario information for recommendations, such as model-based and data-based approaches. However, the matching stage faces significant challenges due to the need for ultra-large-scale retrieval and meeting low latency requirements. As a result, the methods applied at this stage (collaborative filtering and two-tower models) are often designed to be lightweight, hindering the full utilization of extensive information. On the other hand, the ranking stage features the most sophisticated models with the strongest scoring capabilities, but due to the limited screen size of mobile devices, most of the ranked results may not gain exposure or be displayed. In this paper, we introduce an innovative multi-scenario nearline retrieval framework. It operates by harnessing ranking logs from various scenarios through Flink, allowing us to incorporate finely ranked results from other scenarios into our matching stage in near real-time. Besides, we propose a streaming scoring module, which selects a crucial subset from the candidate pool. Implemented on the "Guess You Like" (homepage of the Taobao APP), China's premier e-commerce platform, our method has shown substantial improvements, most notably a 5% uptick in product transactions. Furthermore, the proposed approach is not only model-free but also highly efficient, suggesting it can be quickly implemented in diverse scenarios and demonstrate promising performance.

Submitted 5 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

arXiv:2407.21507 [pdf, other]

FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication

Authors: Yuna Yan, Xin Zhang, Lixin Li, Wensheng Lin, Rui Li, Wenchi Cheng, Zhu Han

Abstract: In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC). Firstly, we demonstrate that the adoption of a Swin Transformer for joint source-channel coding (JSCC) effectively extracts semantic information in the communication system. Next, the FL framework is introduced to collaboratively learn a global model by aggregating local model parameters, rather than directly sharing clients' data. This approach enhances user privacy protection and reduces the workload on the server or mobile edge. Simulation evaluations indicate that our method outperforms the typical JSCC algorithm and traditional separate-based communication algorithms. Particularly after integrating local semantics, the global aggregation model has further increased the Peak Signal-to-Noise Ratio (PSNR) by more than 2dB, thoroughly proving the effectiveness of our algorithm.

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.20499 [pdf, other]

Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement

Authors: Yakun Wang, Daixin Wang, Hongrui Liu, Binbin Hu, Yingcui Yan, Qiyang Zhang, Zhiqiang Zhang

Abstract: Link prediction, as a fundamental task for graph neural networks (GNNs), has boasted significant progress in varied domains. Its success is typically influenced by the expressive power of node representation, but recent developments reveal the inferior performance of low-degree nodes owing to their sparse neighbor connections, known as the degree-based long-tailed problem. Will the degree-based long-tailed distribution similarly constrain the efficacy of GNNs on link prediction? Unexpectedly, our study reveals that only a mild correlation exists between node degree and predictive accuracy, and more importantly, the number of common neighbors between node pairs exhibits a strong correlation with accuracy. Considering that node pairs with fewer common neighbors, i.e., tail node pairs, make up a substantial fraction of the dataset but achieve worse performance, we propose that link prediction also faces the long-tailed problem. Therefore, link prediction of GNNs is greatly hindered by the tail node pairs. Given this weakness of link prediction, a natural question is how we can eliminate the negative effects of the skewed long-tailed distribution on common neighbors so as to improve the performance of link prediction. Towards this end, we introduce our long-tailed framework (LTLP), which is designed to enhance the performance of tail node pairs on link prediction by increasing common neighbors. Two key modules in LTLP respectively supplement high-quality edges for tail node pairs and enforce representational alignment between head and tail node pairs within the same category, thereby improving the performance of tail node pairs.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.20272 [pdf, other]

An Efficient Inference Framework for Early-exit Large Language Models

Authors: Ruijie Miao, Yihan Yan, Xinshuo Yao, Tong Yang

Abstract: Building efficient inference frameworks has gained increasing interest in the research community. Early-exit models, a variant of LLMs, improve the inference efficiency of LLMs by skipping the remaining layers and directly generating output tokens when they are confident enough. However, no existing LLM inference framework takes early-exit models into consideration. This is non-trivial as prior art on LLM inference cannot be directly applied to early-exit models. In this work, we solve two key challenges in building an efficient inference framework for early-exit models: (1) batch inference at iteration-level granularity; and (2) KV cache management. For the former, we propose to process the batch until all sequences surpass the early-exit confidence threshold. For the latter, we propose to fill the KV cache of the remaining layers before the iteration terminates. Our evaluation shows that, compared with the original vLLM operating at full layers, our solution achieves up to 1.25x speed up.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.19499 [pdf, other]

Optimization for expectation value estimation with shallow quantum circuits

Authors: Bujiao Wu, Yuxuan Yan, Fuchuan Wei, Zhenhuan Liu

Abstract: Estimating linear properties of quantum states, such as fidelities, molecular energies, and correlation functions, is a fundamental task in quantum information science. The classical shadow has emerged as a prevalent tool due to its efficiency in estimating many independent observables simultaneously. However, it does not utilize the information of the target observable and the constraints of quantum devices, making it inefficient in many practical scenarios where the focus is on estimating a select few observables. To address this inefficiency, we propose a framework that optimizes sample complexity for estimating the expectation value of any observable using a shallow parameterized quantum circuit. Within this framework, we introduce a greedy algorithm that decomposes the target observable into a linear combination of multiple observables, each of which can be diagonalized with the shallow circuit. Using this decomposition, we then apply an importance sampling algorithm to estimate the expectation value of the target observable. We numerically demonstrate the performance of our algorithm by estimating the ground energy of a sparse Hamiltonian and the inner product of two pure states, highlighting the advantages compared to some conventional methods. Additionally, we derive the fundamental lower bound for the sample complexity required to estimate a target observable using a given shallow quantum circuit, thereby enhancing our understanding of the capabilities of shallow circuits in quantum learning tasks.

Submitted 28 July, 2024; originally announced July 2024.

Comments: 14 pages, 4 figures

arXiv:2407.15452 [pdf, other]

doi 10.1145/3627673.3680021

GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs

Authors: Vipul Gupta, Xin Chen, Ruoyun Huang, Fanlong Meng, Jianjun Chen, Yujun Yan

Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for supervised machine learning over graph-structured data, while sampling-based node representation learning is widely utilized in unsupervised learning. However, scalability remains a major challenge in both supervised and unsupervised learning for large graphs (e.g., those with over 1 billion nodes). The scalability bottleneck largely stems from the mini-batch sampling phase in GNNs and the random walk sampling phase in unsupervised methods. These processes often require storing features or embeddings in memory. In the context of distributed training, they require frequent, inefficient random access to data stored across different workers. Such repeated inter-worker communication for each mini-batch leads to high communication overhead and computational inefficiency. We propose GraphScale, a unified framework for both supervised and unsupervised learning to store and process large graph data distributedly. The key insight in our design is the separation of workers who store data and those who perform the training. This separation allows us to decouple computing and storage in graph training, thus effectively building a pipeline where data fetching and data computation can overlap asynchronously. Our experiments show that GraphScale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings.
We evaluate GraphScale both on public and proprietary graph datasets and observe a reduction of at least 40% in end-to-end training times compared to popular distributed frameworks, without any loss in performance. While most existing methods don't support billion-node graphs for training node embeddings, GraphScale is currently deployed in production at TikTok enabling efficient learning over such large graphs.

Submitted 22 July, 2024; originally announced July 2024.

Comments: Published in the Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024), 8 Pages, 12 Figures

Journal ref: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024), October 21-25, 2024, Boise, ID, USA

arXiv:2407.15345 [pdf, other]

Stability of Quantum Systems beyond Canonical Typicality

Authors: Yu Su, Zi-Fan Zhu, Yao Wang, Rui-Xue Xu, YiJing Yan

Abstract: Involvement of the environment is indispensable for establishing the statistical distribution of system. We analyze the statistical distribution of a quantum system coupled strongly with a heat bath. This distribution is determined by tracing over the bath's degrees of freedom for the equilibrium system-plus-bath composite. The stability of system distribution is largely affected by the system-bath interaction strength. We propose that the quantum system exhibits a stable distribution only when its system response function in the frequency domain satisfies $\tilde{\chi}(\omega = 0+) > 0$. We show our results by investigating the non-interacting bosonic impurity system from both the thermodynamic and dynamic perspectives. Our study refines the theoretical framework of canonical statistics, offering insights into thermodynamic phenomena in small-scale systems.

Submitted 21 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures

arXiv:2407.14769 [pdf, other]

A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling

Authors: Yang Ouyang, Chenyang Zhang, He Wang, Tianle Ma, Chang Jiang, Yuheng Yan, Zuoqin Yan, Xiaojuan Ma, Chuhan Shi, Quan Li

Abstract: In healthcare, AI techniques are widely used for tasks like risk assessment and anomaly detection. Despite AI's potential as a valuable assistant, its role in complex medical data analysis often oversimplifies human-AI collaboration dynamics. To address this, we collaborated with a local hospital, engaging six physicians and one data scientist in a formative study. From this collaboration, we propose a framework integrating two-phase interactive visualization systems: one for Human-Led, AI-Assisted Retrospective Analysis and another for AI-Mediated, Human-Reviewed Iterative Modeling. This framework aims to enhance understanding and discussion around effective human-AI collaboration in healthcare.

Submitted 20 July, 2024; originally announced July 2024.

Comments: To appear at the IEEE VIS Conference 2024

arXiv:2407.13598 [pdf, other]

KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration

Authors: Youfu Yan, Yu Hou, Yongkang Xiao, Rui Zhang, Qianwen Wang

Abstract: The increasing reliance on Large Language Models (LLMs) for health information seeking can pose severe risks due to the potential for misinformation and the complexity of these topics. This paper introduces KNOWNET, a visualization system that integrates LLMs with Knowledge Graphs (KG) to provide enhanced accuracy and structured exploration. Specifically, for enhanced accuracy, KNOWNET extracts triples (e.g., entities and their relations) from LLM outputs and maps them into the validated information and supported evidence in external KGs. For structured exploration, KNOWNET provides next-step recommendations based on the neighborhood of the currently explored entities in KGs, aiming to guide a comprehensive understanding without overlooking critical aspects. To enable reasoning with both the structured data in KGs and the unstructured outputs from LLMs, KNOWNET conceptualizes the understanding of a subject as the gradual construction of graph visualization. A progressive graph visualization is introduced to monitor past inquiries, and bridge the current query with the exploration history and next-step recommendations. We demonstrate the effectiveness of our system via use cases and expert interviews.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 9 pages, 9 figures, accepted by IEEE VIS 2024

arXiv:2407.12996 [pdf, other]

Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and out-of-distribution (OOD) data. We discover a trade-off between sharpness and diversity: minimizing the sharpness in the loss landscape tends to diminish the diversity of individual members within the ensemble, adversely affecting the ensemble's improvement. The trade-off is justified through our theoretical analysis and verified empirically through extensive experiments. To address the issue of reduced diversity, we introduce SharpBalance, a novel training approach that balances sharpness and diversity within ensembles. Theoretically, we show that our training strategy achieves a better sharpness-diversity trade-off. Empirically, we conducted comprehensive evaluations in various data sets (CIFAR-10, CIFAR-100, TinyImageNet) and showed that SharpBalance not only effectively improves the sharpness-diversity trade-off, but also significantly improves ensemble performance in ID and OOD scenarios.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12888 [pdf]

Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

Authors: Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

Abstract: The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may lead to hallucinatory responses, making Retrieval Augmented Generation (RAG) crucial for achieving accurate information. In this protocol, we present RUGGED (Retrieval Under Graph-Guided Explainable disease Distinction), a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation, identifying validated paths forward. Relevant biomedical information from publications and knowledge bases are reviewed, integrated, and extracted via text-mining association analysis and explainable graph prediction models on disease nodes, forecasting potential links among drugs and diseases. These analyses, along with biomedical texts, are integrated into a framework that facilitates user-directed mechanism elucidation as well as hypothesis exploration through RAG-enabled LLMs. A clinical use-case demonstrates RUGGED's ability to evaluate and recommend therapeutics for Arrhythmogenic Cardiomyopathy (ACM) and Dilated Cardiomyopathy (DCM), analyzing prescribed drugs for molecular interactions and unexplored uses. The platform minimizes LLM hallucinations, offers actionable insights, and improves the investigation of novel therapeutics.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12735 [pdf, other]

EchoSight: Advancing Visual-Language Models with Wiki Knowledge

Authors: Yibin Yan, Weidi Xie

Abstract: Knowledge-based Visual Question Answering (KVQA) tasks require answering questions about images using extensive background knowledge. Despite significant advancements, generative models often struggle with these tasks due to the limited integration of external knowledge. In this paper, we introduce EchoSight, a novel multimodal Retrieval-Augmented Generation (RAG) framework that enables large language models (LLMs) to answer visual questions requiring fine-grained encyclopedic knowledge. To strive for high-performing retrieval, EchoSight first searches wiki articles using visual-only information; these candidate articles are then reranked according to their relevance to the combined text-image query. This approach significantly improves the integration of multimodal knowledge, leading to enhanced retrieval outcomes and more accurate VQA responses. Our experimental results on the Encyclopedic VQA and InfoSeek datasets demonstrate that EchoSight establishes new state-of-the-art results in knowledge-based VQA, achieving an accuracy of 41.8% on Encyclopedic VQA and 31.3% on InfoSeek.

Submitted 17 July, 2024; originally announced July 2024.

Comments: Technical Report; Project Page: https://go2heart.github.io/echosight

arXiv:2407.12393 [pdf, other]

PersLLM: A Personified Training Approach for Large Language Models

Authors: Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

Abstract: Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems. However, the absence of distinct personalities, such as displaying ingratiating behaviors, inconsistent opinions, and uniform response patterns, diminishes LLMs' utility in practical applications. Addressing this, the development of personality traits in LLMs emerges as a crucial area of research to unlock their latent potential. Existing methods to personify LLMs generally involve strategies like employing stylized training data for instruction tuning or using prompt engineering to simulate different personalities. These methods only capture superficial linguistic styles instead of the core of personalities and are therefore not stable. In this study, we propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development, into a comprehensive training methodology. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality. Single-agent evaluation validates our method's superiority, as it produces responses more aligned with reference personalities compared to other approaches.
Case studies for multi-agent communication highlight its benefits in enhancing opinion consistency within individual agents and fostering collaborative creativity among multiple agents in dialogue contexts, potentially benefiting human simulation and multi-agent cooperation. Additionally, human-agent interaction evaluations indicate that our personified models significantly enhance interactive experiences, underscoring the practical implications of our research.

Submitted 8 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 10 pages for main text, 5 figures

arXiv:2407.12385 [pdf, other]

RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model

Authors: YaChen Yan, Liubo Li

Abstract: In large-scale ranking systems, cascading architectures have been widely adopted to achieve a balance between efficiency and effectiveness. The pre-ranking module plays a vital role in selecting a subset of candidates for the subsequent ranking module. It is crucial for the pre-ranking model to maintain a balance between efficiency and accuracy to adhere to online latency constraints. In this paper, we propose a novel neural network architecture called RankTower, which is designed to efficiently capture user-item interactions while following the user-item decoupling paradigm to ensure online inference efficiency. The proposed approach employs a hybrid training objective that learns from samples obtained from the full stage of the cascade ranking system, optimizing different objectives for varying sample spaces. This strategy aims to enhance the pre-ranking model's ranking capability and improve alignment with the existing cascade ranking system. Experimental results conducted on public datasets demonstrate that RankTower significantly outperforms state-of-the-art pre-ranking models.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12371 [pdf, other]

HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

Authors: Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08M 3D HOI frames. We also annotate HIMO with detailed textual descriptions and temporal segments, benchmarking two novel tasks of HOI synthesis conditioned on either the whole text prompt or the segmented text prompts as fine-grained timeline control. To address these novel tasks, we propose a dual-branch conditional diffusion model with a mutual interaction module for HOI synthesis. Besides, an auto-regressive generation pipeline is also designed to obtain smooth transitions between HOI segments. Experimental results demonstrate the generalization ability to unseen object geometries and temporal compositions.

Submitted 17 July, 2024; originally announced July 2024.

Comments: Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

arXiv:2407.12228 [pdf, other]

Variational approach to light-matter interaction: Bridging quantum and semiclassical limits

Authors: Yiying Yan, Zhiguo Lü, JunYan Luo

Abstract: We present a time-dependent variational approach with the multiple Davydov $D_2$ trial state to simulate the dynamics of light-matter systems when the field is in a coherent state with an arbitrary finite mean photon number. The variational approach captures not only the system dynamics but also the field dynamics and is applicable to a variety of quantum models of light-matter interaction such as the Jaynes-Cummings model, Rabi model, and Dicke model, and is feasible to tackle the multimode quantized fields. By comparison of the variational and semiclassical dynamics of both the system and field, we illustrate that the variational dynamics from the quantum models agrees with those from the corresponding semiclassical models as long as the mean number of photons is sufficiently large. Moreover, we illustrate that in the crossover between the quantum and semiclassical limits, the quantum corrections lead to the collapse of the oscillations in dynamics, which is absent in the semiclassical models. The variational approach provides a unified treatment of light-matter interaction from the quantum to the semiclassical limit.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 14 pages, 8 figures

arXiv:2407.09935 [pdf, other]

LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation

Authors: Jiacheng Li, Chang Chen, Fenglong Song, Youliang Yan, Zhiwei Xiong

Abstract: Image resampling is a basic technique that is widely employed in daily applications, such as camera photo editing. Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors. Still, these methods are not the perfect substitute for interpolation, due to the drawbacks in efficiency and versatility. In this work, we propose a novel method of Learning Resampling Function (termed LeRF), which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption of interpolation. Specifically, LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the hyper-parameters that determine the shapes of these resampling functions with a neural network. Based on the formulation of LeRF, we develop a family of models, including both efficiency-orientated and performance-orientated ones. To achieve interpolation-level efficiency, we adopt look-up tables (LUTs) to accelerate the inference of the learned neural network. Furthermore, we design a directional ensemble strategy and edge-sensitive indexing patterns to better capture local structures. On the other hand, to obtain DNN-level performance, we propose an extension of LeRF to enable it in cooperation with pre-trained upsampling models for cascaded resampling.
Extensive experiments show that the efficiency-orientated version of LeRF runs as fast as interpolation, generalizes well to arbitrary transformations, and outperforms interpolation significantly, e.g., up to 3dB PSNR gain over Bicubic for x2 upsampling on Manga109. Besides, the performance-orientated version of LeRF reaches comparable performance with existing DNNs at much higher efficiency, e.g., less than 25% running time on a desktop GPU.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Code: https://github.com/ddlee-cn/LeRF-PyTorch

arXiv:2407.08916 [pdf]

Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering

Authors: Yubing Yan, Camille Moreau, Zhuoyue Wang, Wenhan Fan, Chengqian Fu

Abstract: This study develops a robust movie recommendation system using various machine learning techniques, including Non-Negative Matrix Factorization (NMF), Truncated Singular Value Decomposition (SVD), and K-Means clustering. The primary objective is to enhance user experience by providing personalized movie recommendations. The research encompasses data preprocessing, model training, and evaluation, highlighting the efficacy of the employed methods. Results indicate that the proposed system achieves high accuracy and relevance in recommendations, making significant contributions to the field of recommendation systems.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 4th International Symposium on Computer Technology and Information Science, IEEE

arXiv:2407.08610 [pdf, other]

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Authors: Yanfu Yan, Nathan Cooper, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

Abstract: Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS, that adapts the scene-learning capabilities of vision transformers to capture subtle visual and textual patterns that manifest on app UI screens - which is key to differentiating between similar screens for accurate duplicate report detection. JANUS also makes use of a video alignment technique capable of adaptive weighting of video frames to account for typical bug manifestation patterns. In a comprehensive evaluation on a benchmark containing 7,290 duplicate detection tasks derived from 270 video-based bug reports from 90 Android app bugs, the best configuration of our approach achieves an overall mRR/mAP of 89.8%/84.7%, and for the large majority of duplicate detection tasks, outperforms prior work by around 9% to a statistically significant degree. Finally, we qualitatively illustrate how the scene-learning capabilities provided by JANUS benefit its performance.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages, accepted to 46th International Conference on Software Engineering (ICSE 2024)

arXiv:2407.08468 [pdf, other]

Matching-Based Policy Learning

Authors: Xuqiao Li, Ying Yan

Abstract: Treatment heterogeneity is ubiquitous in many areas, motivating practitioners to search for the optimal policy that maximizes the expected outcome based on individualized characteristics. However, most existing policy learning methods rely on weighting-based approaches, which may suffer from high instability in observational studies. To enhance the robustness of the estimated policy, we propose a matching-based estimator of the policy improvement upon a randomized baseline. After correcting the conditional bias, we learn the optimal policy by maximizing the estimate over a policy class. We derive a non-asymptotic high probability bound for the regret of the learned policy and show that the convergence rate is almost $1/\sqrt{n}$. The competitive finite sample performance of the proposed method is demonstrated in extensive simulation studies and a real data application.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07661 [pdf, ps, other]

Confirming the glueball-like particle $X(2370)$ productions in $e^+e^-$ collisions at BESIII energy with PACIAE model

Authors: Zhi-Lei She, An-Ke Lei, Wen-Chao Zhang, Yu-Liang Yan, Dai-Mei Zhou, Hua Zheng, Ben-Hao Sa

Abstract: The parton and hadron cascade model PACIAE is employed to confirm the BESIII newest observation of glueball-like particle $\rm X(2370)$ production in $e^+e^-$ collisions at $\sqrt{s}=4.95\,\mathrm{GeV}$. We coalesce the $\rm X(2370)$ glueball state with two gluons in the simulated partonic final state by the Dynamically Constrained Phase-space Coalescence (DCPC) model. Alternative configuration of $\rm X(2370)$ molecular state is recombined in the simulated hadronic final state with $\pi^{+},\pi^{-},K^{+},K^{-},K_{S}^{0},K_{S}^{0}$ and $\eta'$ by DCPC model. The resulted particle transverse momentum spectrum and rapidity distribution, etc. show a significant discrepancy between the two states. They are not only serving as criteria to distinguish the $\rm X(2370)$ glueball state or molecular state, but also confirming the BESIII observation of glueball-like particle $\rm X(2370)$ productions in $e^+e^-$ collisions.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 4 pages, 2 figures

arXiv:2407.07268 [pdf, other]

Dataset Quantization with Active Learning based Adaptive Sampling

Authors: Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

Abstract: Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, training on such datasets elevates costs and computational demands. To address this, various techniques like coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sample distributions across different classes, our research demonstrates that maintaining performance is feasible even with uneven distributions. We find that for certain classes, the variation in sample quantity has a minimal impact on performance. Inspired by this observation, an intuitive idea is to reduce the number of samples for stable classes and increase the number of samples for sensitive classes to achieve a better performance with the same sampling ratio. Then the question arises: how can we adaptively select samples from a dataset to achieve optimal performance? In this paper, we propose a novel active learning based adaptive sampling strategy, Dataset Quantization with Active Learning based Adaptive Sampling (DQAS), to optimize the sample selection. In addition, we introduce a novel pipeline for dataset quantization, utilizing feature space from the final stage of dataset quantization to generate more precise dataset bins. Our comprehensive evaluations on multiple datasets show that our approach outperforms the state-of-the-art dataset compression methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.06067 [pdf, other]

Faraday laser pumped cesium beam clock

Authors: Hangbo Shi, Xiaomin Qin, Haijun Chen, Yufei Yan, Ziqi Lu, Zhiyang Wang, Zijie Liu, Xiaolei Guan, Qiang Wei, Tiantian Shi, Jingbiao Chen

Abstract: We realize a high-performance compact optically pumped cesium beam clock using Faraday laser simultaneously as pumping and detection lasers. The Faraday laser, which is frequency stabilized by modulation transfer spectroscopy (MTS) technique, has narrow linewidth and superior frequency stability. Measured by optical heterodyne method between two identical systems, the linewidth of the Faraday laser is 2.5 kHz after MTS locking, and the fractional frequency stability of the Faraday laser is optimized to $1.8\times{10}^{-12}/\sqrt{\tau}$. Based on this high-performance Faraday laser, the cesium beam clock realizes a signal-to-noise ratio (SNR) in 1 Hz bandwidth of $39600$ when the cesium oven temperature is 130°C. Frequency-compared with Hydrogen maser, the fractional frequency stability of the Faraday laser pumped cesium beam clock can reach $1.3\times{10}^{-12}/\sqrt{\tau}$ and drops to $1.4\times{10}^{-14}$ at 10000 s when the cesium oven temperature is 110°C. This Faraday laser pumped cesium beam clock demonstrates its excellent performance, and its great potential in the fields of timekeeping, navigation, and communication. Meanwhile, the Faraday laser, as a high-performance optical frequency standard, can also contribute to the development of other applications in quantum metrology, precision measurement and atomic physics.

Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05771 [pdf, other]

Multi-times Monte Carlo Rendering for Inter-reflection Reconstruction

Authors: Tengjie Zhu, Zhuo Chen, Jingnan Gao, Yichao Yan, Xiaokang Yang

Abstract: Inverse rendering methods have achieved remarkable performance in reconstructing high-fidelity 3D objects with disentangled geometries, materials, and environmental light. However, they still face huge challenges in reflective surface reconstruction. Although recent methods model the light trace to learn specularity, the ignorance of indirect illumination makes it hard to handle inter-reflections among multiple smooth objects. In this work, we propose Ref-MC2 that introduces the multi-time Monte Carlo sampling which comprehensively computes the environmental illumination and meanwhile considers the reflective light from object surfaces. To address the computation challenge as the times of Monte Carlo sampling grow, we propose a specularity-adaptive sampling strategy, significantly reducing the computational complexity. Besides the computational resource, higher geometry accuracy is also required because geometric errors accumulate multiple times. Therefore, we further introduce a reflection-aware surface model to initialize the geometry and refine it during inverse rendering. We construct a challenging dataset containing scenes with multiple objects and inter-reflections. Experiments show that our method outperforms other inverse rendering methods on various object groups. We also show downstream applications, e.g., relighting and material editing, to illustrate the disentanglement ability of our method.

Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 10 pages, 6 figures, submitted to NeurIPS 2024

Showing 1–50 of 1,717 results for author: Yan, Y