Search | arXiv e-print repository

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid the huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) models and instruction datasets serves as a good starting point. However, existing methods on model and data selection focus on the performance of general-purpose capabilities while neglecting the knowledge gap exposed in domain-specific deployment. In the present study, we propose to bridge this gap by introducing a few human-annotated samples (i.e., K-shot) for advancing task expertise of LLMs with open knowledge. Specifically, we develop an efficient and scalable pipeline to cost-efficiently produce task experts where K-shot data intervene in selecting the most promising expert candidates and the task-relevant instructions. A mixture-of-experts (MoE) system is built to make the best use of individual-yet-complementary knowledge between multiple experts. We unveil the two keys to the success of an MoE system, 1) the abidance by K-shot, and 2) the insistence on diversity. For the former, we ensure that models that truly possess problem-solving abilities on K-shot are selected rather than those blind guessers. Besides, during data selection, instructions that share task-relevant contexts with K-shot are prioritized.
For the latter, we highlight the diversity of constituting experts and that of the fine-tuning instructions throughout the model and data selection process. Extensive experimental results confirm the superiority of our approach over existing methods on utilization of open knowledge across various tasks. Codes and models will be released later.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 28 pages, 12 tables, 10 figures

arXiv:2408.12911 [pdf, other]

Ground state of the S = 1/2 Heisenberg spin chain with random ferro- and antiferromagnetic couplings

Authors: Sibei Li, Hui Shao, Anders W. Sandvik

Abstract: We study the Heisenberg $S=1/2$ chain with random ferro- and antiferromagnetic couplings, using quantum Monte Carlo simulations at ultra-low temperatures, converging to the ground state. Finite-size scaling of correlation functions and excitation gaps demonstrate an exotic critical state in qualitative agreement with previous strong-disorder renormalization group calculations, but with scaling exponents depending on the coupling distribution. We find dual scaling regimes of the transverse correlations versus the distance, with an $L$ independent form $C(r)=r^{-μ}$ for $r \ll L$ and $C(r,L)=L^{-η}f(r/L)$ for $r/L > 0$, where $μ> η$ and the scaling function is delivered by our analysis. These results are at variance with previous spin-wave and density-matrix renormalization group calculations, thus highlighting the power of unbiased quantum Monte Carlo simulations.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 8 pages, 10 figures

arXiv:2408.11332 [pdf]

High-quality imaging of large areas through path-difference ptychography

Authors: Jizhe Cui, Yi Zheng, Kang Sun, Wenfeng Yang, Haozhi Sha, Rong Yu

Abstract: Tilting planar samples for multi-zone-axes observation is a routine procedure in electron microscopy. However, this process invariably introduces optical path differences in the electron beam across different sample positions, significantly compromising image quality, particularly over large fields of view. To address this challenge, we developed path difference ptychography (PDP), a method capable of decoupling path differences from the four-dimensional data during reconstruction. This enables the acquisition of high-quality, large-scale images, facilitating a more comprehensive understanding and analysis of materials microstructure. Moreover, PDP has the potential to promote the widespread application of ptychographic tomography in the analysis of planar samples.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.09404 [pdf, other]

Comparison between the Structures of Word Co-occurrence and Word Similarity Networks for Ill-formed and Well-formed Texts in Taiwan Mandarin

Authors: Po-Hsuan Huang, Hsuan-Lei Shao

Abstract: The study of word co-occurrence networks has attracted the attention of researchers due to their potential significance as well as applications. Understanding the structure of word co-occurrence networks is therefore important to fully realize their significance and usages. In past studies, word co-occurrence networks built on well-formed texts have been found to possess certain characteristics, including being small-world, following a two-regime power law distribution, and being generally disassortative. On the flip side, past studies have found that word co-occurrence networks built from ill-formed texts such as microblog posts may behave differently from those built from well-formed documents. While both kinds of word co-occurrence networks are small-world and disassortative, word co-occurrence networks built from ill-formed texts are scale-free and follow the power law distribution instead of the two-regime power law distribution. However, since past studies on the behavior of word co-occurrence networks built from ill-formed texts only investigated English, the universality of such characteristics remains to be seen among different languages. In addition, it is yet to be investigated whether there could be possible similitude/differences between word co-occurrence networks and other potentially comparable networks.
This study therefore investigates and compares the structure of word co-occurrence networks and word similarity networks based on Taiwan Mandarin ill-formed internet forum posts, compares them with those built with well-formed judicial judgments, and seeks to find out whether the three aforementioned properties (scale-free, small-world, and disassortative) for ill-formed and well-formed texts are universal among different languages and between word co-occurrence and word similarity networks.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 4 pages, 1 figure, 5 tables

ACM Class: H.3.3; I.2.7

arXiv:2408.06749 [pdf, other]

Multiscale Excitations in the Diluted Two-dimensional S = 1/2 Heisenberg Antiferromagnet

Authors: Liuyun Dao, Hui Shao, Anders W. Sandvik

Abstract: We study the excitation spectrum of the $S=1/2$ Heisenberg model on the randomly diluted square lattice by analytic continuation of QMC data. At dilution fractions $p=1/16$ and $p=1/8$, the dynamic structure factor $S({\bf q},ω)$ exhibits a damped magnon peak with anomalous dispersion near ${\bf q}=(0,0)$ and $(π,π)$, a non-dispersive low-energy localization peak, and a second dispersive peak between these two features. A magnon with anomalous dispersion, close to our result, was predicted in spin wave and $T$-matrix theory [A. Chernyshev et al., PRB 65, 104407 (2002)], above the localization energy. However, no intermediate dispersive mode was predicted. Analyzing spectral functions in real space for individual vacancy realizations by energy tomography, we find that these excitations are concentrated on a small subset of the spins adjacent to vacancies. We argue that the low-energy excitations are those of a sparse random network of effective moments at a fraction of the vacancies. There is a shift in magnon spectral weight distribution, from the spins away from vacancies at high energy to those adjacent to vacancies at lower energy. We also analyze the Anderson quantum rotor excitation at $ω\propto N^{-1}$ (with $N=L^2$ the system size), which in the clean system is visible in $S({\bf q},ω)$ only at ${\bf q}=(π,π)$ but spreads through the Brillouin zone when $p>0$.
Weight close to ${\bf q}=(0,0)$ and $(π,π)$ is explained by local sublattice imbalance within a dimer-monomer model but there is also structure arising from correlated singlet fluctuations, which we demonstrate by enhancing said fluctuations with four-spin couplings. All spectral features found here should be observable by inelastic neutron scattering experiments on layered quantum antiferromagnets doped with nonmagnetic impurities.

Submitted 13 August, 2024; originally announced August 2024.

Comments: 35 pages, 37 figures

arXiv:2408.04382 [pdf, other]

Judgment2vec: Apply Graph Analytics to Searching and Recommendation of Similar Judgments

Authors: Hsuan-Lei Shao

Abstract: In court practice, legal professionals rely on their training to provide opinions that resolve cases, one of the most crucial aspects being the ability to identify similar judgments from previous courts efficiently. However, finding a similar case is challenging and often depends on experience, legal domain knowledge, and extensive labor hours, making veteran lawyers or judges indispensable. This research aims to automate the analysis of judgment text similarity. We utilized a judgment dataset labeled as the "golden standard" by experts, which includes human-verified features that can be converted into an "expert similarity score." We then constructed a knowledge graph based on "case-article" relationships, ranking each case using natural language processing to derive a "Node2vec similarity score." By evaluating these two similarity scores, we identified their discrepancies and relationships. The results can significantly reduce the labor hours required for legal searches and recommendations, with potential applications extending to various fields of information retrieval.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 5 pages, 7 figures, 2 tables

MSC Class: 68T30 (Primary); 68T50 (Secondary) ACM Class: I.2.7; I.2.4

arXiv:2408.02085 [pdf, other]

Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training an LLM on all existing instructions may not be optimal and practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and deep learning. However, under the context of instruction tuning, there still exists a gap in knowledge on what kind of data evaluation metrics can be employed and how they can be integrated into the selection mechanism. To bridge this gap, we present a comprehensive review on existing literature of data assessment and selection especially for instruction tuning of LLMs. We systematically categorize all applicable methods into quality-based, diversity-based, and importance-based ones where a unified, fine-grained taxonomy is structured. For each category, representative methods are elaborated to describe the landscape of relevant research. In addition, comparison between latest methods is conducted on their officially reported results to provide in-depth discussions on their limitations. Finally, we summarize the open challenges and propose the promising avenues for future studies. All related contents are available at https://github.com/yuleiqin/fantastic-data-engineering.

Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

Comments: review, survey, 28 pages, 2 figures, 4 tables

arXiv:2408.01596 [pdf, other]

Trustworthy Machine Learning under Social and Adversarial Data Sources

Authors: Han Shao

Abstract: Machine learning has witnessed remarkable breakthroughs in recent years. As machine learning permeates various aspects of daily life, individuals and organizations increasingly interact with these systems, exhibiting a wide range of social and adversarial behaviors. These behaviors may have a notable impact on the behavior and performance of machine learning systems. Specifically, during these interactions, data may be generated by strategic individuals, collected by self-interested data collectors, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, the machine learning systems' outputs might degrade, such as the susceptibility of deep neural networks to adversarial examples (Shafahi et al., 2018; Szegedy et al., 2013) and the diminished performance of classic algorithms in the presence of strategic individuals (Ahmadi et al., 2021). Addressing these challenges is imperative for the success of machine learning in societal settings.

Submitted 2 August, 2024; originally announced August 2024.

Comments: PhD thesis

arXiv:2407.16714 [pdf, other]

Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation

Authors: Tao Meng, Fuchen Zhang, Yuntao Shou, Hongen Shao, Wei Ai, Keqin Li

Abstract: Since Multimodal Emotion Recognition in Conversation (MERC) can be applied to public opinion monitoring, intelligent dialogue robots, and other fields, it has received extensive research attention in recent years. Unlike traditional unimodal emotion recognition, MERC can fuse complementary semantic information between multiple modalities (e.g., text, audio, and vision) to improve emotion recognition. However, previous work ignored the inter-modal alignment process and the intra-modal noise information before multimodal fusion and instead fused multimodal features directly, which hinders the model's representation learning. In this study, we have developed a novel approach called Masked Graph Learning with Recursive Alignment (MGLRA) to tackle this problem, which uses a recurrent iterative module with memory to align multimodal features, and then uses the masked GCN for multimodal feature fusion. First, we employ LSTM to capture contextual information and use a graph attention-filtering mechanism to eliminate noise effectively within the modality. Second, we build a recurrent iteration module with a memory function, which can use communication between different modalities to eliminate the gap between modalities and achieve the preliminary alignment of features between modalities.
Then, a cross-modal multi-head attention mechanism is introduced to achieve feature alignment between modalities and construct a masked GCN for multimodal feature fusion, which can perform random mask reconstruction on the nodes in the graph to obtain better node feature representation. Finally, we utilize a multilayer perceptron (MLP) for emotion recognition. Extensive experiments on two benchmark datasets (i.e., IEMOCAP and MELD) demonstrate that MGLRA outperforms state-of-the-art methods.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 15 pages, 9 figures

arXiv:2407.16351 [pdf, other]

Datasets of Visualization for Machine Learning

Authors: Can Liu, Ruike Jiang, Shaocong Tan, Jiacheng Yu, Chaofan Yang, Hanning Shao, Xiaoru Yuan

Abstract: Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a what-why-how model for visualization datasets, considering the content of the dataset (what), the supported tasks (why), and the dataset construction process (how). This model provides a clear understanding of the diversity and complexity of visualization datasets. Additionally, we highlight the challenges faced by existing visualization datasets, including the lack of standardization in data types and formats and the limited availability of large-scale datasets. To address these challenges, we suggest future research directions.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 15 pages

arXiv:2407.13610 [pdf, other]

Dimuon and ditau production in photon-photon collisions at next-to-leading order in QED

Authors: Hua-Sheng Shao, David d'Enterria

Abstract: Next-to-leading-order (NLO) quantum electrodynamics (QED) corrections to the production of muon and tau pairs in photon-photon collisions, $γγ\toμ^{+}μ^{-},τ^{+}τ^{-}$, are calculated in the equivalent photon approximation. We mostly consider $γγ$ processes in ultraperipheral collisions of hadrons at the LHC, but the $γγ\toτ^{+}τ^{-}$ process in $\mathrm{e}^+\mathrm{e}^-$ collisions at LEP is also discussed. The NLO terms are found to modify the total cross sections by up to 5%, increasing the tails of the dilepton acoplanarity and transverse momentum distributions, and depleting by up to 15% the yields at high masses, with respect to the leading-order predictions including the very small virtuality of the colliding photons. At the LHC, the calculations obtained with the charge form factor for protons and lead ions including the NLO QED corrections improve the data-theory agreement for all measured differential distributions, and prove an indispensable ingredient for the extraction of precision quantities in photon-photon processes, such as the anomalous magnetic moment of the tau lepton.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 19 pages, 33 plots

arXiv:2407.12070 [pdf, other]

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Authors: Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang

Abstract: Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model size, existing approaches suffer from algorithm-hardware mismatches with limited co-design exploration, leading to suboptimal performance on edge devices. Hence, we propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization. First, we propose BMT, a novel hardware-friendly binarized Transformer with optimized quantization methods and components, and we further enhance its model accuracy by leveraging the weighted ternary weight splitting training technique. Second, we develop a streaming processor mixed binarized Transformer accelerator, namely BAT, which is equipped with specialized units and scheduling pipelines for efficient inference of binarized Transformers. Finally, we co-optimize the algorithm and hardware through a design space exploration approach to achieve a global trade-off between accuracy, latency, and robustness for real-world deployments. Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators, enabling efficient end-to-end edge deployment.

Submitted 16 July, 2024; originally announced July 2024.

Comments: This paper is accepted by ICCAD 2024

arXiv:2406.17931 [pdf, other]

doi 10.1145/3637528.3672020

CAT: Interpretable Concept-based Taylor Additive Models

Authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Hongjue Zhao, Chenxiang Luo, Eric Zavesky, Huaxiu Yao, Huajie Shao

Abstract: As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with relevant names and their ground-truth values. In response, we propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process. CAT does not require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to categorize input features into broad groups, which can be easily accomplished through a quick metadata review. Specifically, CAT first embeds each group of input features into a one-dimensional high-level concept representation, and then feeds the concept representations into a new white-box Taylor Neural Network (TaylorNet).
The TaylorNet aims to learn the non-linear relationship between the inputs and outputs using polynomials. Evaluation results across multiple benchmarks demonstrate that CAT can outperform or compete with the baselines while reducing the need for extensive model parameters. Importantly, it can explain model predictions through high-level concepts that humans can understand.

Submitted 30 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.14869 [pdf, other]

Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, Jingran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms have emerged. In particular, deep learning (DL) has shown great benefits in automatically extracting complex and subtle features from raw data with high classification accuracy. However, DL algorithms face the computational cost problem as the difficulty of the RFF task and the size of the DNN have increased dramatically. To address the above challenge, this paper proposes a novel cost-effective early-exit neural network consisting of a complex-valued neural network (CVNN) backbone with multiple random forest branches, called hybrid CVNN-RF. Unlike conventional studies that use a single fixed DL model to process all RF samples, our hybrid CVNN-RF considers differences in the recognition difficulty of RF samples and introduces an early-exit mechanism to dynamically process the samples. When processing "easy" samples that can be well classified with high confidence, the hybrid CVNN-RF can end early at the random forest branch to reduce computational cost. Conversely, subsequent network layers will be activated to ensure accuracy.
To further improve the early-exit rate, an automated multi-dimensional early-exit strategy is proposed to achieve scheduling control from multiple dimensions within the network depth and classification category. Finally, our experiments on the public ADS-B dataset show that the proposed algorithm can reduce the computational cost by 83% while improving the accuracy by 1.6% under a classification task with 100 categories.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted by IEEE Internet of Things Journal

arXiv:2406.13867 [pdf, other]

Error-Correcting Graph Codes

Authors: Swastik Kopparty, Aditya Potukuchi, Harry Sha

Abstract: In this paper, we define, study, and construct Error-Correcting Graph Codes. An error-correcting graph code of distance $δ$ is a family $C$ of graphs, on a common vertex set of size $n$, such that if we start with any graph in $C$, we would have to modify the neighborhoods of at least $δn$ vertices in order to reach some other graph in $C$. This is a natural graph generalization of the standard Hamming distance error-correcting codes for binary strings. We show: 1. Combinatorial results determining the optimal rate vs distance tradeoff nonconstructively. 2. A connection to rank-metric codes, enabling some simple and some involved constructions achieving certain positive rates and distances. 3. Graph code analogues of Reed-Solomon codes and code concatenation, leading to positive distance codes for all rates and positive rate codes for all distances. 4. Graph code analogues of dual-BCH codes, yielding large codes with distance $δ= 1-o(1)$. This gives an explicit "graph code of Ramsey graphs". Several recent works, starting with the paper of Alon, Gujgiczer, Körner, Milojević, and Simonyi, have studied more general graph codes, where the symmetric difference between any two graphs in the code is required to have a desired property. Error-correcting graph codes are a particularly interesting instantiation of this concept.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 27 pages, 3 figures, 1 table

ACM Class: G.2.1; E.4

arXiv:2405.18164 [pdf]

Imaging, counting, and positioning single interstitial atoms in solids

Authors: Jizhe Cui, Haozhi Sha, Liangze Mao, Kang Sun, Wenfeng Yang, Rong Yu

Abstract: Interstitial atoms are ubiquitous in solids and they are widely incorporated into materials to tune their lattice structure, electronic transport, and mechanical properties. Because the distribution of interstitial atoms in matrix materials is usually disordered and most of them are light atoms with weak scattering ability, it remains a challenge to directly image single interstitial atoms and measure their geometrical positions. In this work, direct imaging and measurement of single interstitial atoms have been realized with adaptive-propagator ptychography. The measurement of their three-dimensional coordinates enables quantitative analysis of the pair distribution function of the interstitial atoms and reveals the anisotropic occupation of oxygen in the interstitial sites in titanium. The current work paves the way for the determination of interstitial atoms in materials, and for the correlation between the atomic-scale behavior of interstitial atoms and the physical properties of materials.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 20 pages and 8 figures; Jizhe Cui and Haozhi Sha contributed equally to this work. Rong Yu, corresponding author: [email protected]

arXiv:2405.17529 [pdf, other]

Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails

Authors: Haichao Sha, Yang Cao, Yong Liu, Yuncheng Wu, Ruixuan Liu, Hong Chen

Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that the gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient have infinite variance, which may lead to excessive clipping loss to the gradients with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping (DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds for body and tail gradients to reduce the clipping loss. Under the non-convex condition, DC-DPSGD reduces the empirical gradient norm from $\mathbb{O}\left(\log^{\max(0,θ-1)}(T/δ)\log^{2θ}(\sqrt{T})\right)$ to $\mathbb{O}\left(\log(\sqrt{T})\right)$ with heavy-tailed index $θ\geq 1/2$, iterations $T$, and arbitrary probability $δ$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72% in terms of accuracy.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17233 [pdf, other]

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

Authors: Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

Abstract: Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original floating-point precision in exchange for improved model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code is available at https://github.com/fayuge/CLAQ.

Submitted 2 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16889 [pdf]

Extraction of In-Phase and Quadrature Components by Time-Encoding Sampling

Authors: Y. H. Shao, S. Y. Chen, H. Z. Yang, F. Xi, H. Hong, Z. Liu

Abstract: Time encoding machine (TEM) is a biologically-inspired scheme to perform signal sampling using timing. In this paper, we study its application to the sampling of bandpass signals. We propose an integrate-and-fire TEM scheme by which the in-phase (I) and quadrature (Q) components are extracted through reconstruction. We design the TEM according to the signal bandwidth and amplitude instead of upper-edge frequency and amplitude as in the case of bandlimited/lowpass signals. We show that the I and Q components can be perfectly reconstructed from the TEM measurements if the minimum firing rate is equal to the Landau rate of the signal. For the reconstruction of I and Q components, we develop an alternating projection onto convex sets (POCS) algorithm in which two POCS algorithms are alternately iterated. For the algorithm analysis, we define a solution space of vector-valued signals and prove that the proposed reconstruction algorithm converges to the correct unique solution in the noiseless case. The proposed TEM can operate regardless of the center frequencies of the bandpass signals. This is quite different from traditional bandpass sampling, where the center frequency should be carefully allocated for the Landau rate and its variations have a negative effect on the sampling performance.
In addition, the proposed TEM achieves certain reconstructed signal-to-noise-plus-distortion ratios for small firing rates in thermal noise, which is unavoidably present and will be aliased to the Nyquist band in traditional sampling such that high sampling rates are required. We demonstrate the reconstruction performance and substantiate our claims via simulation experiments.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 30 pages, 8 figures

arXiv:2405.14292 [pdf, other]

A New Method in Facial Registration in Clinics Based on Structure Light Images

Authors: Pengfei Li, Ziyue Ma, Hong Wang, Juan Deng, Yan Wang, Zhenyu Xu, Feng Yan, Wenjun Tu, Hong Sha

Abstract: Background and Objective: In neurosurgery, fusing clinical images with depth images, which can enrich the available information and detail, is beneficial to surgery. We found that the registration of face depth images frequently failed with existing methods. To enrich traditional imaging methods with depth information, a method for registering depth images with traditional clinical images was investigated. Methods: We used the dlib library, a C++ library that can be used for face recognition, to recognize the key points on faces from the structured-light camera and the CT image. The two key point clouds were registered for coarse registration by the ICP method. Fine registration was finished after coarse registration by the ICP method. Results: RMSE after coarse and fine registration is as low as 0.995913 mm. Compared with traditional methods, it also takes less time. Conclusions: The new method successfully registered the facial depth image from structured-light images and CT with a low error, which would be promising and efficient in clinical applications of neurosurgery.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06607 [pdf, other]

SO(5) multicriticality in two-dimensional quantum magnets

Authors: Jun Takahashi, Hui Shao, Bowen Zhao, Wenan Guo, Anders W. Sandvik

Abstract: We resolve the nature of the quantum phase transition between a Néel antiferromagnet and a valence-bond solid in two-dimensional spin-1/2 magnets. We study a class of $J$-$Q$ models, in which Heisenberg exchange $J$ competes with interactions $Q_n$ formed by products of $n$ singlet projectors on adjacent parallel lattice links. QMC simulations provide unambiguous evidence for first-order transitions, with the discontinuities increasing with $n$. For $n=2$ and $n=3$ models, the first-order signatures are very weak. On intermediate length scales, we extract well-defined scaling dimensions (critical exponents) that are common to the models with small $n$, indicating proximity to a quantum critical point. By combining two $Q$ terms, the transition can be tuned from weak to more strongly first-order. The two coexisting orders on the first-order line scale with a large exponent $β\approx 0.85$. This exponent and others are close to bounds for an SO($5$) symmetric CFT with a relevant SO($5$) singlet. We characterize the emergent SO($5$) symmetry by the scaling dimensions of its leading irrelevant perturbations. The large $β$ value and a large correlation length exponent, $ν\approx 1.4$, partially explain why the transition remains near-critical even quite far away from the critical point and in many different models without fine-tuning.
In addition, we find that few-spin lattice operators are dominated by the SO($5$) violating field (the traceless symmetric tensor), and interactions involving many spins are required to observe strong effects of the relevant SO($5$) singlet. The exponent that had previously been identified with the divergent correlation length when crossing between the two phases does not have a corresponding CFT operator. We explain this emergent pseudocritical scale by a mechanism relying on a dangerously irrelevant SO($5$) perturbation.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 57 pages, 36 figures

arXiv:2405.03882 [pdf, other]

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Authors: Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

Abstract: Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model sizes and intensive computations hinder ViTs' deployment on embedded devices, calling for effective model compression methods, such as quantization. Unfortunately, due to the existence of hardware-unfriendly and quantization-sensitive non-linear operations, particularly Softmax, it is non-trivial to completely quantize all operations in ViTs, yielding either significant accuracy drops or non-negligible hardware costs. In response to challenges associated with standard ViTs, we focus our attention towards the quantization and acceleration for efficient ViTs, which not only eliminate the troublesome Softmax but also integrate linear attention with low computational complexity, and propose Trio-ViT accordingly. Specifically, at the algorithm level, we develop a tailored post-training quantization engine taking the unique activation distributions of Softmax-free efficient ViTs into full consideration, aiming to boost quantization accuracy. Furthermore, at the hardware level, we build an accelerator dedicated to the specific Convolution-Transformer hybrid architecture of efficient ViTs, thereby enhancing hardware efficiency. Extensive experimental results consistently prove the effectiveness of our Trio-ViT framework.
Particularly, we can gain up to $7.2\times$ and $14.6\times$ FPS under comparable accuracy over state-of-the-art ViT accelerators, as well as $5.9\times$ and $2.0\times$ DSP efficiency. Codes will be released publicly upon acceptance.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.13046 [pdf, other]

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Authors: Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

Abstract: As the key component in multimodal large language models (MLLMs), the ability of the visual encoder greatly affects MLLM's understanding on diverse image content. Although some large-scale pretrained vision encoders such as vision encoders in CLIP and DINOv2 have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understanding, e.g., the CLIP vision encoder leads to outstanding results on general image understanding but poor performance on document or chart content. To alleviate the bias of CLIP vision encoder, we first delve into the inherent behavior of different pre-trained vision encoders and then propose the MoVA, a powerful and novel MLLM, adaptively routing and fusing task-specific vision experts with a coarse-to-fine mechanism. In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. This benefits from the powerful model function understanding ability of the large language model (LLM) equipped with expert-routing low-rank adaptation (LoRA). In the fine-grained stage, we elaborately conduct the mixture-of-vision-expert adapter (MoV-Adapter) to extract and fuse task-specific knowledge from various experts. This coarse-to-fine paradigm effectively leverages representations from experts based on multimodal context and model expertise, further enhancing the generalization ability.
We conduct extensive experiments to evaluate the effectiveness of the proposed approach. Without any bells and whistles, MoVA can achieve significant performance gains over current state-of-the-art methods in a wide range of challenging multimodal benchmarks. Codes and models will be available at https://github.com/TempleX98/MoVA.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12867 [pdf, other]

FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving

Authors: Xingtai Gui, Tengteng Huang, Haonan Shao, Haotian Yao, Chi Zhang

Abstract: The future instance prediction from a Bird's Eye View (BEV) perspective is a vital component in autonomous driving, which involves future instance segmentation and instance motion prediction. Existing methods usually rely on a redundant and complex pipeline which requires multiple auxiliary outputs and post-processing procedures. Moreover, estimated errors on each of the auxiliary predictions will lead to degradation of the prediction performance. In this paper, we propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR), which views the task as BEV instance segmentation and prediction for future frames. We propose to adopt instance queries representing specific traffic participants to directly estimate the corresponding future occupied masks, and thus get rid of complex post-processing procedures. Besides, we devise a flow-aware BEV predictor for future BEV feature prediction composed of a flow-aware deformable attention that takes backward flow guiding the offset sampling. A novel future instance matching strategy is also proposed to further improve the temporal coherence. Extensive experiments demonstrate the superiority of FipTR and its effectiveness under different temporal BEV encoders. The code is available at https://github.com/TabGuigui/FipTR.

Submitted 24 July, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.08145 [pdf]

Polar vortex hidden in twisted bilayers of paraelectric SrTiO3

Authors: Haozhi Sha, Yixuan Zhang, Yunpeng Ma, Wei Li, Wenfeng Yang, Jizhe Cui, Qian Li, Houbing Huang, Rong Yu

Abstract: Polar topologies, such as vortices and skyrmions, have attracted significant interest due to their unique physical properties and promising applications in high-density memory devices. Currently, most polar vortices are observed in heterostructures containing ferroelectric materials and constrained by substrates. In this study, we unravel arrays of polar vortices formed in twisted freestanding bilayers composed of SrTiO3, a quantum-paraelectric material. Depth-resolved structures of the bilayers are measured with deep-sub-angstrom resolution and one picometer accuracy using multislice ptychography, enabling identification of the three-dimensional variations of polarization topology. Our findings reveal the evolution of the polar vortices in the twisted overlapping layers, demonstrating a reversal of the rotation sense along the depth direction. Twisted freestanding bilayers provide a unique platform for exploration and modulation of novel polar topologies.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.02571 [pdf]

Wenzhou TE: a first-principles calculated thermoelectric materials database

Authors: Ying Fang, Hezhu Shao

Abstract: Since the implementation of the Materials Genome Project by the Obama administration in the United States, the development of various computational materials databases has fundamentally expanded the choices of industries such as materials and energy. In the field of thermoelectric materials, the thermoelectric figure of merit ZT quantifies the performance of the material. From the viewpoint of calculations for vast numbers of materials, the ZT values are not easily obtained due to their computational complexity. Here, we show how to build a database of thermoelectric materials based on first-principles calculations for the electronic and heat transport of materials. Firstly, the initial structures are classified according to the values of bandgap and other basic properties using the K-means clustering algorithm from machine learning, and high-throughput first-principles calculations are carried out for narrow-bandgap semiconductors which exhibit potential for thermoelectric applications. The present framework of calculations mainly includes a deformation potential module, an electrical transport performance module, and a mechanical and thermodynamic properties module. We have also set up a search webpage for the calculated database of thermoelectric materials, for searching and viewing the related physical properties of materials. Our work may inspire the construction of more computational databases of first-principles thermoelectric materials and accelerate research progress in the field of thermoelectrics.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 13 pages, 5 figures

Journal ref: https://www.mdpi.com/1996-1944/17/10/2200

arXiv:2404.01448 [pdf]

Prior Frequency Guided Diffusion Model for Limited Angle (LA)-CBCT Reconstruction

Authors: Jiacheng Xie, Hua-Chieh Shao, Yunxiang Li, You Zhang

Abstract: Cone-beam computed tomography (CBCT) is widely used in image-guided radiotherapy. Reconstructing CBCTs from limited-angle acquisitions (LA-CBCT) is highly desired for improved imaging efficiency, dose reduction, and better mechanical clearance. LA-CBCT reconstruction, however, suffers from severe under-sampling artifacts, making it a highly ill-posed inverse problem. Diffusion models can generate data/images by reversing a data-noising process through learned data distributions, and can be incorporated as a denoiser/regularizer in LA-CBCT reconstruction. In this study, we developed a diffusion model-based framework, prior frequency-guided diffusion model (PFGDM), for robust and structure-preserving LA-CBCT reconstruction. PFGDM uses a conditioned diffusion model as a regularizer for LA-CBCT reconstruction, and the condition is based on high-frequency information extracted from patient-specific prior CT scans, which provides a strong anatomical prior for LA-CBCT reconstruction. Specifically, we developed two variants of PFGDM (PFGDM-A and PFGDM-B) with different conditioning schemes. PFGDM-A applies the high-frequency CT information condition until a pre-optimized iteration step, and drops it afterwards to enable both similar and differing CT/CBCT anatomies to be reconstructed. PFGDM-B, on the other hand, continuously applies the prior CT information condition in every reconstruction step, but with a decaying mechanism, to gradually phase out the reconstruction guidance from the prior CT scans.
The two variants of PFGDM were tested and compared with currently available LA-CBCT reconstruction solutions, via metrics including PSNR and SSIM. PFGDM outperformed all traditional and diffusion model-based methods. PFGDM reconstructs high-quality LA-CBCTs under very-limited gantry angles, allowing faster and more flexible CBCT scans with dose reductions.

Submitted 8 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 20 pages, 8 figures, submitted to Physics in Medicine & Biology

arXiv:2403.20230 [pdf, other]

An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT

Authors: Haikuo Shao, Huihong Shi, Wendong Mao, Zhongfeng Wang

Abstract: Vision Transformers (ViTs) have achieved significant success in computer vision. However, their intensive computations and massive memory footprint challenge ViTs' deployment on embedded devices, calling for efficient ViTs. Among them, EfficientViT, the state-of-the-art one, features a Convolution-Transformer hybrid architecture, enhancing both accuracy and hardware efficiency. Unfortunately, existing accelerators cannot fully exploit the hardware benefits of EfficientViT due to its unique architecture. In this paper, we propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs. Specifically, we design a reconfigurable architecture to efficiently support various operation types, including lightweight convolutions and attention, boosting hardware utilization. Additionally, we present a time-multiplexed and pipelined dataflow to facilitate both intra- and inter-layer fusions, reducing off-chip data access costs. Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200 MHz on the Xilinx ZCU102 FPGA, which significantly outperforms prior works.

Submitted 29 March, 2024; originally announced March 2024.

Comments: To appear in the 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024)

arXiv:2403.16999 [pdf, other]

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Authors: Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

Abstract: Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the region of interest that could provide key information for answering the question is small. To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions. Additionally, about 98k of these pairs are annotated with detailed reasoning steps. Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. We also introduce the related benchmark to evaluate the MLLMs in scenarios requiring specific local region identification. Extensive experiments demonstrate the effectiveness of our framework and shed light on better inference strategies. The Visual CoT dataset, benchmark, and pre-trained models are released to foster further research in this direction.

Submitted 7 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Code: https://github.com/deepcs233/Visual-CoT

arXiv:2403.15464 [pdf, other]

LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction

Authors: Hejie Cui, Zhuocheng Shen, Jieyu Zhang, Hui Shao, Lianhui Qin, Joyce C. Ho, Carl Yang

Abstract: Electronic health records (EHRs) contain valuable patient data for health-related prediction tasks, such as disease prediction. Traditional approaches rely on supervised learning methods that require large labeled datasets, which can be expensive and challenging to obtain. In this study, we investigate the feasibility of applying Large Language Models (LLMs) to convert structured patient visit data (e.g., diagnoses, labs, prescriptions) into natural language narratives. We evaluate the zero-shot and few-shot performance of LLMs using various EHR-prediction-oriented prompting strategies. Furthermore, we propose a novel approach that utilizes LLM agents with different roles: a predictor agent that makes predictions and generates reasoning processes and a critic agent that analyzes incorrect predictions and provides guidance for improving the reasoning of the predictor agent. Our results demonstrate that with the proposed approach, LLMs can achieve decent few-shot performance compared to traditional supervised learning methods in EHR-based disease predictions, suggesting its potential for health-oriented applications.

Submitted 19 March, 2024; originally announced March 2024.

ACM Class: J.3; I.2.7

arXiv:2403.14693 [pdf]

A2CI: A Cloud-based, Service-oriented Geospatial Cyberinfrastructure to Support Atmospheric Research

Authors: Wenwen Li, Hu Shao, Sizhe Wang, Xiran Zhou, Sheng Wu

Abstract: Big earth science data offers the scientific community great opportunities. Many more studies at large scales, over long terms and at high resolution can now be conducted using the rich information collected by remote sensing satellites, ground-based sensor networks, and even social media input. However, the hundreds of terabytes of information collected and compiled on an hourly basis by NASA and other government agencies present a significant challenge for atmospheric scientists seeking to improve the understanding of the Earth atmospheric system. These challenges include effective discovery, organization, analysis and visualization of large amounts of data. This paper reports the outcomes of an NSF-funded project that developed a geospatial cyberinfrastructure -- the A2CI (Atmospheric Analysis Cyberinfrastructure) -- to support atmospheric research. We first introduce the service-oriented system framework, then describe in detail the implementation of the data discovery module, data management module, data integration module, data analysis and visualization modules following the cloud computing principles: Data-as-a-Service, Software-as-a-Service, Platform-as-a-Service and Infrastructure-as-a-Service. We demonstrate the graphical user interface by performing an analysis between Sea Surface Temperature and the intensity of tropical storms in the North Atlantic and Pacific oceans. We expect this work to contribute to the technical advancement of cyberinfrastructure research as well as to the development of an online, collaborative scientific analysis system for atmospheric science.

Submitted 15 March, 2024; originally announced March 2024.

MSC Class: big data; cyberinfrastructure; cloud computing

arXiv:2403.11492 [pdf, other]

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Authors: Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu

Abstract: Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction. To this end, recent works explore two-stage prediction frameworks where coarse trajectories are first proposed, and then used to select critical context information for trajectory refinement. However, they either incur a large amount of computation or bring limited improvement, if not both. In this paper, we introduce a novel scenario-adaptive refinement strategy, named SmartRefine, to refine prediction with minimal additional computation. Specifically, SmartRefine can comprehensively adapt refinement configurations based on each scenario's properties, and smartly chooses the number of refinement iterations by introducing a quality score to measure the prediction quality and remaining refinement potential of each scenario. SmartRefine is designed as a generic and flexible approach that can be seamlessly integrated into most state-of-the-art motion prediction models. Experiments on Argoverse (1 & 2) show that our method consistently improves the prediction accuracy of multiple state-of-the-art prediction models. Specifically, by adding SmartRefine to QCNet, we outperform all published ensemble-free works on the Argoverse 2 leaderboard (single agent track) at submission. Comprehensive studies are also conducted to ablate design choices and explore the mechanism behind multi-iteration refinement. Codes are available at https://github.com/opendilab/SmartRefine/

Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Camera-ready version for CVPR 2024

arXiv:2403.10779 [pdf, other]

LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices

Authors: Jingping Nie, Hanya Shao, Yuang Fan, Qijia Shao, Haoxuan You, Matthias Preindl, Xiaofan Jiang

Abstract: Despite the global mental health crisis, barriers to accessing screenings, professionals, and treatments remain high. In collaboration with licensed psychotherapists, we propose a Conversational AI Therapist with psychotherapeutic Interventions (CaiTI), a platform that leverages large language models (LLMs) and smart devices to enable better mental health self-care. CaiTI can screen day-to-day functioning using natural and psychotherapeutic conversations. CaiTI leverages reinforcement learning to provide personalized conversation flow. CaiTI can accurately understand and interpret user responses. When the user needs further attention during the conversation, CaiTI can provide conversational psychotherapeutic interventions, including cognitive behavioral therapy (CBT) and motivational interviewing (MI). Leveraging the datasets prepared by the licensed psychotherapists, we experiment and microbenchmark various LLMs' performance in tasks along CaiTI's conversation flow and discuss their strengths and weaknesses. With the psychotherapists, we implement CaiTI and conduct 14-day and 24-week studies. The study results, validated by therapists, demonstrate that CaiTI can converse with users naturally, accurately understand and interpret user responses, and provide psychotherapeutic interventions appropriately and effectively. We showcase the potential of CaiTI LLMs to assist the mental therapy diagnosis and treatment and improve day-to-day functioning screening and precautionary psychotherapeutic intervention systems.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10319 [pdf, other]

NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models

Authors: Chen Qian, Xiaochang Li, Qineng Wang, Gang Zhou, Huajie Shao

Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing efforts and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight State-Of-The-Art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform the traditional deep learning methods in traffic classification. We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.

Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09615 [pdf, other]

PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation

Authors: Yuhan Guo, Hanning Shao, Can Liu, Kai Xu, Xiaoru Yuan

Abstract: Generative text-to-image models, which allow users to create appealing images through a text prompt, have seen a dramatic increase in popularity in recent years. However, most users have a limited understanding of how such models work, and it often requires much trial and error to achieve satisfactory results. The prompt history contains a wealth of information that could provide users with insights into what has been explored and how the prompt changes impact the output image, yet little research attention has been paid to the visual analysis of such a process to support users. We propose the Image Variant Graph, a novel visual representation designed to support comparing prompt-image pairs and exploring the editing history. The Image Variant Graph models prompt differences as edges between corresponding images and presents the distances between images through projection. Based on the graph, we developed the PrompTHis system through co-design with artists. Besides the Image Variant Graph, PrompTHis also incorporates a detailed prompt-image history and a navigation mini-map. Based on the review and analysis of the prompting history, users can better understand the impact of prompt changes and have more effective control of image generation. A quantitative user study with eleven amateur participants and qualitative interviews with five professionals and one amateur user were conducted to evaluate the effectiveness of PrompTHis. The results demonstrate PrompTHis can help users review the prompt history, make sense of the model, and plan their creative process.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07390 [pdf, other]

Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution

Authors: Haochen Sun, Yan Yuan, Lijuan Su, Haotian Shao

Abstract: Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a result of correction errors. In this paper, we introduce a novel blind SR approach that focuses on Learning Correction Errors (LCE). Our method employs a lightweight Corrector to obtain a corrected low-resolution (CLR) image. Subsequently, within an SR network, we jointly optimize SR performance by utilizing both the original LR image and the frequency learning of the CLR image. Additionally, we propose a new Frequency-Self Attention block (FSAB) that enhances the global information utilization ability of Transformer. This block integrates both self-attention and frequency spatial attention mechanisms. Extensive ablation and comparison experiments conducted across various settings demonstrate the superiority of our method in terms of visual quality and accuracy. Our approach effectively addresses the challenges associated with degradation estimation and correction errors, paving the way for more accurate blind image SR.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 16 pages

arXiv:2402.19303 [pdf, ps, other]

Learnability Gaps of Strategic Classification

Authors: Lee Cohen, Yishay Mansour, Shay Moran, Han Shao

Abstract: In contrast with standard classification tasks, strategic classification involves agents strategically modifying their features in an effort to receive favorable predictions. For instance, given a classifier determining loan approval based on credit scores, applicants may open or close their credit cards to fool the classifier. The learning goal is to find a classifier robust against strategic manipulations. Various settings, based on what and when information is known, have been explored in strategic classification. In this work, we focus on addressing a fundamental question: the learnability gaps between strategic classification and standard learning. We essentially show that any learnable class is also strategically learnable: we first consider a fully informative setting, where the manipulation structure (which is modeled by a manipulation graph $G^\star$) is known and during training time the learner has access to both the pre-manipulation data and post-manipulation data. We provide nearly tight sample complexity and regret bounds, offering significant improvements over prior results. Then, we relax the fully informative setting by introducing two natural types of uncertainty. First, following Ahmadi et al. (2023), we consider the setting in which the learner only has access to the post-manipulation data. We improve the results of Ahmadi et al. (2023) and close the gap between mistake upper bound and lower bound raised by them. Our second relaxation of the fully informative setting introduces uncertainty to the manipulation structure. That is, we assume that the manipulation graph is unknown but belongs to a known class of graphs. We provide nearly tight bounds on the learning complexity in various unknown manipulation graph settings. Notably, our algorithm in this setting is of independent interest and can be applied to other problems such as multi-label learning.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.19221 [pdf, other]

doi 10.1007/JHEP07(2024)050

FKS subtraction for quarkonium production at NLO

Authors: Ajjath A H, Hua-Sheng Shao, Lukas Simon

Abstract: We extend the local infrared-divergence subtraction formalism, originally proposed by Frixione, Kunszt and Signer (FKS), to calculate short-distance (differential) cross section for any inclusive process involving a quarkonium particle in non-relativistic QCD (NRQCD) factorisation at next-to-leading order (NLO) accuracy in the strong coupling constant $α_s$. The new formulas are generally applicable to the production of an S- or P-wave quarkonium state in association with any number of elementary particles. The main new ingredients derived in this paper are the local and integrated soft counterterms for the colour-singlet and colour-octet P-wave bound states. It, therefore, paves the way to the automation of the NLO calculations for heavy quarkonium inclusive and associated production processes.

Submitted 6 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 53 pages, 2 figures, v2 (journal version)

Journal ref: JHEP 07 (2024) 050

arXiv:2402.15991 [pdf, other]

$C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding

Authors: Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao

Abstract: Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP). Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks. However, mPLMs necessitate substantial resources and incur high computational costs during inference, posing challenges for deployment in real-world and real-time systems. Existing model cascade methods seek to enhance inference efficiency by greedily selecting the lightest model capable of processing the current input from a variety of models, based on model confidence scores. Nonetheless, deep models tend to exhibit overconfidence, and confidence distributions vary across languages. This leads to the emission of confident but incorrect predictions by smaller models, hindering their ability to generalize effectively across test languages. In this study, we introduce a confidence calibration model cascade ($C^3$) method. This approach, simple yet effective, involves calibration prior to cascade inference, thereby enhancing cascade accuracy through more reliable predictions. Extensive experiments conducted on three cross-lingual benchmarks demonstrate that $C^3$ significantly outperforms all state-of-the-art baselines.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15758 [pdf, other]

Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens

Authors: Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Hongen Shao, Xiaofeng Zou

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their widespread application is hindered by the resource-intensive decoding process. To address this challenge, current approaches have incorporated additional decoding heads to enable parallel prediction of multiple subsequent tokens, thereby achieving inference acceleration. Nevertheless, the accuracy of these decoding heads falls short of the auto-regressive decoding approach. In light of these limitations, we propose Chimera, a novel framework specifically designed for speculative sampling. Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words. To ensure both accuracy and efficiency, we present two strategies within the lightweight draft model. Firstly, we focus on capturing short-range dependencies at the bottom layer. Secondly, we leverage the readily available representations from the original LLM. Through empirical evaluation on the Vicuna and LLaMA-2 series, Chimera demonstrates impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach. This highlights the potential of our proposed framework in significantly improving the efficiency of large language models during the decoding process.

Submitted 18 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14605 [pdf, other]

Observation of the antiferromagnetic phase transition in the fermionic Hubbard model

Authors: Hou-Ji Shao, Yu-Xuan Wang, De-Zhi Zhu, Yan-Song Zhu, Hao-Nan Sun, Si-Yuan Chen, Chi Zhang, Zhi-Jie Fan, Youjin Deng, Xing-Can Yao, Yu-Ao Chen, Jian-Wei Pan

Abstract: The fermionic Hubbard model (FHM)[1], despite its simple form, captures essential features of strongly correlated electron physics. Ultracold fermions in optical lattices[2, 3] provide a clean and well-controlled platform for simulating FHM. Doping its antiferromagnetic ground state at half filling, various exotic phases are expected to arise in the FHM simulator, including stripe order[4], pseudogap[5], and d-wave superconductors[6], offering valuable insights into high-temperature superconductivity[7-9]. Although notable progress, such as the observation of antiferromagnetic correlations over short[10] and extended distances[11], has been obtained, the antiferromagnetic phase has yet to be realized due to the significant challenges of achieving low temperatures in a large and uniform quantum simulator. Here, we report the observation of the antiferromagnetic phase transition in a three-dimensional fermionic Hubbard system comprising lithium-6 atoms in a uniform optical lattice with approximately 800,000 sites. When the interaction strength, temperature, and doping concentration are finely tuned to approach their respective critical values, sharp increases in the spin structure factor (SSF) are observed. These observations can be well described by a power-law divergence, with a critical exponent of 1.396 from the Heisenberg universality class[12]. At half filling and with optimal interaction strength, the measured SSF reaches 123(8), signifying the establishment of an antiferromagnetic phase. Our results set the stage for exploring the low-temperature phase diagram of FHM.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.12634 [pdf, other]

doi 10.1145/3613904.3643022

Data Storytelling in Data Visualisation: Does it Enhance the Efficiency and Effectiveness of Information Retrieval and Insights Comprehension?

Authors: Honbo Shao, Roberto Martinez-Maldonado, Vanessa Echeverria, Lixiang Yan, Dragan Gasevic

Abstract: Data storytelling (DS) is rapidly gaining attention as an approach that integrates data, visuals, and narratives to create data stories that can help a particular audience to comprehend the key messages underscored by the data with enhanced efficiency and effectiveness. It has been posited that DS can be especially advantageous for audiences with limited visualisation literacy, by presenting the data clearly and concisely. However, empirical studies confirming whether data stories indeed provide these benefits over conventional data visualisations are scarce. To bridge this gap, we conducted a study with 103 participants to determine whether DS indeed improves both efficiency and effectiveness in tasks related to information retrieval and insights comprehension. Our findings suggest that data stories do improve the efficiency of comprehension tasks, as well as the effectiveness of comprehension tasks that involve a single insight compared with conventional visualisations. Interestingly, these benefits were not associated with participants' visualisation literacy.

Submitted 20 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted to CHI24. Edited two typos: one in the abstract, another in a formula

arXiv:2402.09013 [pdf, other]

doi 10.1117/1.JATIS.10.1.015002

Asgard/NOTT: L-band nulling interferometry at the VLTI. II. Warm optical design and injection system

Authors: Germain Garreau, Azzurra Bigioli, Romain Laugier, Gert Raskin, Johan Morren, Jean-Philippe Berger, Colin Dandumont, Harry-Dean Kenchington Goldsmith, Simon Gross, Michael Ireland, Lucas Labadie, Jérôme Loicq, Stephen Madden, Guillermo Martin, Marc-Antoine Martinod, Alexandra Mazzoli, Ahmed Sanny, Hancheng Shao, Kunlun Yan, Denis Defrère

Abstract: Asgard/NOTT (previously Hi-5) is a European Research Council (ERC)-funded project hosted at KU Leuven and a new visitor instrument for the Very Large Telescope Interferometer (VLTI). Its primary goal is to image the snow line region around young stars using nulling interferometry in the L-band (3.5 to 4.0 $μ$m), where the contrast between exoplanets and their host stars is advantageous. The breakthrough is the use of a photonic beam combiner, which only recently allowed the required theoretical raw contrast of $10^{-3}$ in this spectral range. Nulling interferometry observations of exoplanets also require a high degree of balancing between the four pupils of the VLTI in terms of intensity, phase, and polarization. The injection into the beam combiner and the requirements of nulling interferometry are driving the design of the warm optics and the injection system. The optical design up to the beam combiner is presented. It offers a technical solution to efficiently couple the light from the VLTI into the beam combiner. During the coupling, the objective is to limit throughput losses to 5% of the best expected efficiency for the injection. To achieve this, a list of different loss sources is considered with their respective impact on the injection efficiency. Solutions are also proposed to meet the requirements on beam balancing for intensity, phase, and polarization. The different properties of the design are listed, including the optics used, their alignment and tolerances, and their impact on the instrumental performances in terms of throughput and null depth. The performance evaluation gives an expected throughput loss of less than 6.4% of the best efficiency for the injection and a null depth of $\sim 2 \times 10^{-3}$, mainly from optical path delay errors outside the scope of this work.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted for publication in JATIS. 23 pages, 11 figures, 8 tables

Journal ref: J. Astron. Telesc. Instrum. Syst. 10(1), 015002 (2024)

arXiv:2402.05935 [pdf, other]

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Authors: Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao

Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between multi-modal performance and the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted by ICML 2024. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

arXiv:2402.03646 [pdf, other]

Lens: A Foundation Model for Network Traffic in Cybersecurity

Authors: Qineng Wang, Chen Qian, Xiaochang Li, Ziyu Yao, Huajie Shao

Abstract: Network traffic refers to the amount of data being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic is challenging due to the diverse nature of data packets, which often feature heterogeneous headers and encrypted payloads lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn the representations from massive traffic data. However, these methods typically excel in either traffic understanding (classification) or traffic generation tasks, not both. To address this issue, we develop Lens, a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. Harnessing the strength of the encoder-decoder framework, which captures the global information while preserving the generative ability, our model can better learn the representations from raw data. To further enhance pre-training effectiveness, we design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results across various benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and generation. Notably, it also requires much less labeled data for fine-tuning compared to current methods.

Submitted 28 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02851 [pdf, other]

Enhancing Compositional Generalization via Compositional Feature Alignment

Authors: Haoxiang Wang, Haozhe Si, Huajie Shao, Han Zhao

Abstract: Real-world applications of machine learning models often confront data distribution shifts, wherein discrepancies exist between the training and test data distributions. In the common multi-domain multi-class setup, as the number of classes and domains scales up, it becomes infeasible to gather training data for every domain-class combination. This challenge naturally leads the quest for models with Compositional Generalization (CG) ability, where models can generalize to unseen domain-class combinations. To delve into the CG challenge, we develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets, and observe that the prevalent pretraining-finetuning paradigm on foundational models, such as CLIP and DINOv2, struggles with the challenge. To address this challenge, we propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique that i) learns two orthogonal linear heads on a pretrained encoder with respect to class and domain labels, and ii) fine-tunes the encoder with the newly learned head frozen. We theoretically and empirically justify that CFA encourages compositional feature learning of pretrained models. We further conduct extensive experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models. Experiment results show that CFA outperforms common finetuning techniques in compositional generalization, corroborating CFA's efficacy in compositional feature learning.

Submitted 22 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Published in Transactions on Machine Learning Research (TMLR). The code is released at https://github.com/Haoxiang-Wang/Compositional-Feature-Alignment

arXiv:2401.02439 [pdf]

Information limit of 15 pm achieved with bright-field ptychography

Authors: Haozhi Sha, Jizhe Cui, Wenfeng Yang, Rong Yu

Abstract: It is generally assumed that a high spatial resolution of a microscope requires a large numerical aperture of the imaging lens or detector. In this study, the information limit of 15 pm is achieved in transmission electron microscopy using only the bright-field disk (small numerical aperture) via multislice ptychography. The results indicate that high-frequency information has been encoded in the electrons scattered to low angles due to the multiple scattering of electrons in the objects, making it possible to break the diffraction limit of imaging via bright-field ptychography.

Submitted 20 December, 2023; originally announced January 2024.

Comments: 10 pages, 4 figures

arXiv:2401.01638 [pdf, other]

Radon Removal Commissioning of the PandaX-4T Cryogenic Distillation System

Authors: Xiangyi Cui, Zhou Wang, Jiafu Li, Shuaijie Li, Lin Si, Yonglin Ju, Wenbo Ma, Jianglai Liu, Li Zhao, Xiangdong Ji, Rui Yan, Haidong Sha, Peiyao Huang, Xiuli Wang, Huaxuan Liu

Abstract: The PandaX-4T distillation system, designed for the removal of krypton and radon from xenon, is evaluated for its radon removal efficiency using a $^{222}$Rn source during the online distillation process. The PandaX-4T dark matter detector is employed to monitor the temporal evolution of radon activity. To determine the radon reduction factor, the experimental data of radon atoms introduced into and bypassed the distillation system is compared. The results indicate that the PandaX-4T distillation system achieves a radon reduction factor exceeding 190 at the flow rate of 10 slpm and the reflux ratio of 1.44. Gas-only online distillation process of a flow rate of 20 slpm is also conducted without observing significant reduction of radon levels in the detector. This observation suggests that the migration flow of radon atoms from the liquid phase to the gas phase is limited, and the flow rate of gas circulation and duration of the process are insignificant compared to the total xenon mass of 5.6 tons in the detector. This study provides the experimental data to support the efficient removal of radon at $\sim$Bq level using the PandaX-4T distillation system, which is the prerequisite of the radon background control in the detector. The further operation with higher flow rate will be applied for the upcoming science run in PandaX-4T.

Submitted 19 April, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: 14 pages, 9 figures

arXiv:2401.01495 [pdf, other]

A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning

Authors: Wei Ai, FuChen Zhang, Tao Meng, YunTao Shou, HongEn Shao, Keqin Li

Abstract: In terms of human-computer interaction, it is becoming more and more important to correctly understand the user's emotional state in a conversation, so the task of multimodal emotion recognition (MER) started to receive more attention. However, existing emotion classification methods usually perform classification only once. Sentences are likely to be misclassified in a single round of classification. Previous work usually ignores the similarities and differences between different morphological features in the fusion process. To address the above issues, we propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). First, we encode the original dataset with different preprocessing modalities. Second, a graph contrastive learning (GCL) strategy is introduced for these three modal data with other structures to learn similarities and differences within and between modalities. Finally, we use MLP twice to achieve the final emotion classification. This staged classification method can help the model to better focus on different levels of emotional information, thereby improving the performance of the model. Extensive experiments show that TS-GCL has superior performance on IEMOCAP and MELD datasets compared with previous methods.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 9 pages, 3 figures

arXiv:2401.00376 [pdf, other]

Magnon, doublon and quarton excitations in 2D S=1/2 trimerized Heisenberg models

Authors: Yue-Yue Chang, Jun-Qing Cheng, Hui Shao, Dao-Xin Yao, Han-Qing Wu

Abstract: We investigate the magnetic excitations of the trimerized Heisenberg models with intra-trimer interaction $J_1$ and inter-trimer interaction $J_2$ on four different two-dimensional lattices using a combination of stochastic series expansion quantum Monte Carlo (SSE QMC) and stochastic analytic continuation methods (SAC), complemented by cluster perturbation theory (CPT). These models exhibit quasi-particle-like excitations when $g=J_2/J_1$ is small, characterized by low-energy magnons, intermediate-energy doublons, and high-energy quartons. The low-energy magnons are associated with the magnetic ground states. They can be described by the linear spin wave theory (LSWT) of the effective block spin model and the original spin model. Doublons and quartons emerge from the corresponding internal excitations of the trimers with distinct energy levels, which can be effectively analyzed using perturbation theory when the ratio of exchange interactions $g$ is small. In this small $g$ regime, we observe a clear separation between the magnon and higher-energy spectra. However, as $g$ increases, these three spectra gradually merge into the magnon modes or continua. Nevertheless, the LSWT fails to provide quantitative descriptions of the higher-energy excitation bands due to significant quantum fluctuations. Notably, in the Collinear II and trimerized hexagon lattice, a broad continuum emerges above the single-magnon spectrum, originating from the quasi-1D physics due to the dilute connections between chains. Our numerical analysis of these 2D trimers yields valuable theoretical predictions and explanations for the inelastic neutron scattering (INS) spectra of 2D magnetic materials featuring trimerized lattices.

Submitted 16 June, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Showing 1–50 of 356 results for author: Shao, H