Search | arXiv e-print repository

Analytical and Empirical Study of Herding Effects in Recommendation Systems

Authors: Hong Xie, Mingze Zhong, Defu Lian, Zhen Wang, Enhong Chen

Abstract: Online rating systems are often used in numerous web or mobile applications, e.g., Amazon and TripAdvisor, to assess the ground-truth quality of products. Due to herding effects, the aggregation of historical ratings (or historical collective opinion) can significantly influence subsequent ratings, leading to misleading and erroneous assessments. We study how to manage product ratings via rating aggregation rules and shortlisted representative reviews, for the purpose of correcting the assessment error. We first develop a mathematical model to characterize important factors of herding effects in product ratings. We then identify sufficient conditions (via the stochastic approximation theory), under which the historical collective opinion converges to the ground-truth collective opinion of the whole user population. These conditions identify a class of rating aggregation rules and review selection mechanisms that can reveal the ground-truth product quality. We also quantify the speed of convergence (via the martingale theory), which reflects the efficiency of rating aggregation rules and review selection mechanisms. We prove that the herding effects slow down the speed of convergence while an accurate review selection mechanism can speed it up. We also study the speed of convergence numerically and reveal trade-offs in selecting rating aggregation rules and review selection mechanisms.
To show the utility of our framework, we design a maximum likelihood algorithm to infer model parameters from ratings, and conduct experiments on rating datasets from Amazon and TripAdvisor. We show that proper recency aware rating aggregation rules can improve the speed of convergence in Amazon and TripAdvisor by 41% and 62% respectively.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 29 pages

arXiv:2408.09439 [pdf, other]

Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Authors: Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

Abstract: Relevance modeling is a critical component for enhancing user experience in search engines, with the primary objective of identifying items that align with users' queries. Traditional models only rely on the semantic congruence between queries and items to ascertain relevance. However, this approach represents merely one aspect of the relevance judgement, and is insufficient in isolation. Even powerful Large Language Models (LLMs) still cannot accurately judge the relevance of a query and an item from a semantic perspective. To augment LLMs-driven relevance modeling, this study proposes leveraging user interactions recorded in search logs to yield insights into users' implicit search intentions. The challenge lies in the effective prompting of LLMs to capture dynamic search intentions, which poses several obstacles in real-world relevance scenarios, i.e., the absence of domain-specific knowledge, the inadequacy of an isolated prompt, and the prohibitive costs associated with deploying LLMs. In response, we propose ProRBP, a novel Progressive Retrieved Behavior-augmented Prompting framework for integrating search scenario-oriented knowledge with LLMs effectively. Specifically, we perform the user-driven behavior neighbors retrieval from the daily search logs to obtain domain-specific knowledge in time, retrieving candidates that users consider to meet their expectations.
Then, we guide LLMs for relevance modeling by employing advanced prompting techniques that progressively improve the outputs of the LLMs, followed by a progressive aggregation with comprehensive consideration of diverse aspects. For online serving, we have developed an industrial application framework tailored for the deployment of LLMs in relevance modeling. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.08978 [pdf, other]

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

Authors: Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

Abstract: The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this end, we propose a Self-Challenge evaluation framework with human-in-the-loop. Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances and incorporate human feedback on them to refine these patterns for generating more challenging data, iteratively. We end up with 8 diverse patterns, such as text manipulation and questions with assumptions. We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses. The SC-G4 serves as a challenging benchmark that allows for a detailed assessment of LLMs' abilities. Our results show that only 44.96\% of instances in SC-G4 can be answered correctly by GPT-4. Interestingly, our pilot study indicates that these error patterns also challenge other LLMs, such as Claude-3 and Llama-3, and cannot be fully resolved through fine-tuning. Our work takes the first step to demonstrate that LLMs can autonomously identify their inherent flaws and provide insights for future dynamic and automatic evaluation.

Submitted 16 August, 2024; originally announced August 2024.

Comments: COLM 2024

arXiv:2408.05457 [pdf, other]

Investigating Instruction Tuning Large Language Models on Graphs

Authors: Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chang, Shou-De Lin, Jiawei Han

Abstract: Inspired by the recent advancements of Large Language Models (LLMs) in NLP tasks, there's growing interest in applying LLMs to graph-related tasks. This study delves into the capabilities of instruction-following LLMs for engaging with real-world graphs, aiming to offer empirical insights into how LLMs can effectively interact with graphs and generalize across graph tasks. We begin by constructing a dataset designed for instruction tuning, which comprises a diverse collection of 79 graph-related tasks from academic and e-commerce domains, featuring 44,240 training instances and 18,960 test samples. Utilizing this benchmark, our initial investigation focuses on identifying the optimal graph representation that serves as a conduit for LLMs to understand complex graph structures. Our findings indicate that JSON format for graph representation consistently outperforms natural language and code formats across various LLMs and graph types. Furthermore, we examine the key factors that influence the generalization abilities of instruction-tuned LLMs by evaluating their performance on both in-domain and out-of-domain graph tasks.

Submitted 10 August, 2024; originally announced August 2024.

Comments: COLM 2024

arXiv:2408.02877 [pdf, other]

First Measurement of Solar $^8$B Neutrinos via Coherent Elastic Neutrino-Nucleus Scattering with XENONnT

Authors: E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, C. Cai, C. Capelli, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad, J. J. Cuenca-García, et al. (142 additional authors not shown)

Abstract: We present the first measurement of nuclear recoils from solar $^8$B neutrinos via coherent elastic neutrino-nucleus scattering with the XENONnT dark matter experiment. The central detector of XENONnT is a low-background, two-phase time projection chamber with a 5.9\,t sensitive liquid xenon target. A blind analysis with an exposure of 3.51\,t$\times$y resulted in 37 observed events above 0.5\,keV, with ($26.4^{+1.4}_{-1.3}$) events expected from backgrounds. The background-only hypothesis is rejected with a statistical significance of 2.73\,$σ$. The measured $^8$B solar neutrino flux of $(4.7_{-2.3}^{+3.6})\times 10^6\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$ is consistent with results from dedicated solar neutrino experiments. The measured neutrino flux-weighted CE$ν$NS cross-section on Xe of $(1.1^{+0.8}_{-0.5})\times10^{-39}\,\mathrm{cm}^2$ is consistent with the Standard Model prediction. This is the first direct measurement of nuclear recoils from solar neutrinos with a dark matter detector.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2407.12441 [pdf, ps, other]

Dynamics of discrete solitons in the fractional discrete nonlinear Schrödinger equation with the quasi-Riesz derivative

Authors: Ming Zhong, Boris A. Malomed, Zhenya Yan

Abstract: We elaborate a fractional discrete nonlinear Schrödinger (FDNLS) equation based on an appropriately modified definition of the Riesz fractional derivative, which is characterized by its Lévy index (LI). This FDNLS equation represents a novel discrete system, in which the nearest-neighbor coupling is combined with long-range interactions, that decay as the inverse square of the separation between lattice sites. The system may be realized as an array of parallel quasi-one-dimensional Bose-Einstein condensates composed of atoms or small molecules carrying, respectively, a permanent magnetic or electric dipole moment. The dispersion relation (DR) for lattice waves and the corresponding propagation band in the system's linear spectrum are found in an exact form for all values of LI. The DR is consistent with the continuum limit, differing in the range of wavenumbers. Formation of single-site and two-site discrete solitons is explored, starting from the anti-continuum limit and continuing the analysis in the numerical form up to the existence boundary of the discrete solitons. Stability of the solitons is identified in terms of eigenvalues for small perturbations, and verified in direct simulations. Mobility of the discrete solitons is considered too, by means of an estimate of the system's Peierls-Nabarro potential barrier, and with the help of direct simulations. Collisions between persistently moving discrete solitons are also studied.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 15 pages, 8 figures (to be published in Phys. Rev. E, 2024)

arXiv:2407.02811 [pdf, other]

SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing

Authors: Meiyu Zhong, Ravi Tandon

Abstract: Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) Randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose \textit{SPLITZ}, a practical and novel approach which leverages the synergistic benefits of both the above ideas into a single framework. Our main idea is to \textit{split} a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. Motivation for \textit{SPLITZ} comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. \textit{SPLITZ} can exploit this heterogeneity while inheriting the scalability of randomized smoothing. We present a principled approach to train \textit{SPLITZ} and provide theoretical analysis to derive certified robustness guarantees during inference. We present a comprehensive comparison of robustness-accuracy tradeoffs and show that \textit{SPLITZ} consistently improves upon existing state-of-the-art approaches on MNIST and CIFAR-10 datasets. For instance, with $\ell_2$ norm perturbation budget of \textbf{$ε=1$}, \textit{SPLITZ} achieves $\textbf{43.2\%}$ top-1 test accuracy on CIFAR-10 dataset compared to state-of-art top-1 test accuracy $\textbf{39.8\%}$.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.13638 [pdf, other]

XENONnT WIMP Search: Signal & Background Modeling and Statistical Inference

Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad, J. J. Cuenca-García, V. D'Andrea, et al. (139 additional authors not shown)

Abstract: The XENONnT experiment searches for weakly-interacting massive particle (WIMP) dark matter scattering off a xenon nucleus. In particular, XENONnT uses a dual-phase time projection chamber with a 5.9-tonne liquid xenon target, detecting both scintillation and ionization signals to reconstruct the energy, position, and type of recoil. A blind search for nuclear recoil WIMPs with an exposure of 1.1 tonne-years yielded no signal excess over background expectations, from which competitive exclusion limits were derived on WIMP-nucleon elastic scatter cross sections, for WIMP masses ranging from 6 GeV/$c^2$ up to the TeV/$c^2$ scale. This work details the modeling and statistical methods employed in this search. By means of calibration data, we model the detector response, which is then used to derive background and signal models. The construction and validation of these models is discussed, alongside additional purely data-driven backgrounds. We also describe the statistical inference framework, including the definition of the likelihood function and the construction of confidence intervals.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, 10 figures

arXiv:2406.13282 [pdf, other]

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

Authors: Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang

Abstract: Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on comparably short texts to far longer texts. A heavy bunch of efforts have been dedicated to boosting the extrapolation via extending the formulations of the RoPE, however, few of them have attempted to showcase their inner workings comprehensively. In this paper, we are driven to offer a straightforward yet in-depth understanding of RoPE extensions from an attention perspective and on two benchmarking tasks. A broad array of experiments reveals several valuable findings: 1) Maintaining attention patterns to those at the pretrained length improves extrapolation; 2) Large attention uncertainty leads to retrieval errors; 3) Using longer continual pretraining lengths for RoPE extensions could reduce attention uncertainty and significantly enhance extrapolation.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.08394 [pdf, other]

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Authors: Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Wenhai Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai

Abstract: We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. Unlike traditional MLLMs limited to text output, VisionLLM v2 significantly broadens its application scope. It excels not only in conventional visual question answering (VQA) but also in open-ended, cross-domain vision tasks such as object localization, pose estimation, and image generation and editing. To this end, we propose a new information transmission mechanism termed "super link", as a medium to connect MLLM with task-specific decoders. It not only allows flexible transmission of task information and gradient feedback between the MLLM and multiple downstream decoders but also effectively resolves training conflicts in multi-tasking scenarios. In addition, to support the diverse range of tasks, we carefully collected and combed training data from hundreds of public vision and vision-language tasks. In this way, our model can be joint-trained end-to-end on hundreds of vision language tasks and generalize to these tasks using a set of shared parameters through different user prompts, achieving performance comparable to task-specific models. We believe VisionLLM v2 will offer a new perspective on the generalization of MLLMs.

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 43 pages

arXiv:2406.08335 [pdf, other]

A Survey of Pipeline Tools for Data Engineering

Authors: Anthony Mbata, Yaji Sripada, Mingjun Zhong

Abstract: Currently, a variety of pipeline tools are available for use in data engineering. Data scientists can use these tools to resolve data wrangling issues associated with data and accomplish some data engineering tasks from data ingestion through data preparation to utilization as input for machine learning (ML). Some of these tools have essential built-in components or can be combined with other tools to perform desired data engineering operations. While some tools are wholly or partly commercial, several open-source tools are available to perform expert-level data engineering tasks. This survey examines the broad categories and examples of pipeline tools based on their design and data engineering intentions. These categories are Extract Transform Load/Extract Load Transform (ETL/ELT), pipelines for Data Integration, Ingestion, and Transformation, Data Pipeline Orchestration and Workflow Management, and Machine Learning Pipelines. The survey also provides a broad outline of the utilization with examples within these broad groups and finally, a discussion is presented with case studies indicating the usage of pipeline tools for data engineering. The studies present some first-user application experiences with sample data, some complexities of the applied pipeline, and a summary note of approaches to using these tools to prepare data for machine learning.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2406.07239 [pdf, other]

On the Hallucination in Simultaneous Machine Translation

Authors: Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang

Abstract: It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance performance for SiMT, few of them attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: understanding the distribution of hallucination words and the target-side context usage of them. Intensive experiments demonstrate some valuable findings and particularly show that it is possible to alleviate hallucination by decreasing the over usage of target-side information for SiMT.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.14386 [pdf, other]

Capsule Network Projectors are Equivariant and Invariant Learners

Authors: Miles Everett, Aiden Durrant, Mingjun Zhong, Georgios Leontidis

Abstract: Learning invariant representations has been the longstanding approach to self-supervised learning. However, recently progress has been made in preserving equivariant properties in representations, yet do so with highly prescribed architectures. In this work, we propose an invariant-equivariant self-supervised architecture that employs Capsule Networks (CapsNets) which have been shown to capture equivariance with respect to novel viewpoints. We demonstrate that the use of CapsNets in equivariant self-supervised architectures achieves improved downstream performance on equivariant tasks with higher efficiency and fewer network parameters. To accommodate the architectural changes of CapsNets, we introduce a new objective function based on entropy minimisation. This approach which we name CapsIE (Capsule Invariant Equivariant Network) achieves state-of-the-art performance across invariant and equivariant tasks on the 3DIEBench dataset compared to prior equivariant SSL methods, while outperforming supervised baselines. Our results demonstrate the ability of CapsNets to learn complex and generalised representations for large-scale, multi-task datasets compared to previous CapsNet benchmarks. Code is available at https://github.com/AberdeenML/CapsIE.

Submitted 6 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 17 pages, 7 figures, 10 Tables; code to be released at: https://github.com/AberdeenML/CapsIE V2: corrected typos, added a new Table 3 and additional results in Table 1 and Table 2

arXiv:2405.07393 [pdf, other]

Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds

Authors: Meiyu Zhong, Ravi Tandon

Abstract: With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier), as a function of the fairness budget. In addition, our bounds also exhibit dependence on the underlying statistics of the data, labels and the sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs that are achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to a low-bias could be fundamentally limited based on the statistical disparity across the groups.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.06280 [pdf, ps, other]

Green's Function and Pointwise Space-time Behaviors of the Three-Dimensional Relativistic Boltzmann Equation

Authors: Yanchao Li, Mingying Zhong

Abstract: The pointwise space-time behavior of the Green's function of the three-dimensional relativistic Boltzmann equation is studied in this paper. It is shown that the Green's function has a decomposition of the macroscopic diffusive waves and Huygens waves with the speed $\sqrt{a^2+b^2}$ at low-frequency, the singular kinetic wave and the remainder term decaying exponentially in space and time. In addition, we establish the pointwise space-time estimate of the global solution to the nonlinear relativistic Boltzmann equation with non-smooth initial data based on the Green's function.

Submitted 10 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.13021

MSC Class: 76P05; 82C40; 82D05

arXiv:2404.18389 [pdf, ps, other]

Diffusion Limit with Optimal Convergence Rate of Classical Solutions to the Vlasov-Maxwell-Boltzmann System

Authors: Tong Yang, Mingying Zhong

Abstract: We study the diffusion limit of the strong solution to the Vlasov-Maxwell-Boltzmann (VMB) system with initial data near a global Maxwellian. By introducing a new decomposition of the solution to identify the essential components for generating the initial layer, we prove the convergence and establish the optimal convergence rate of the classical solution to the VMB system to the solution of the Navier-Stokes-Maxwell system based on the spectral analysis.

Submitted 9 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

MSC Class: 76P05; 82C40; 82D05

arXiv:2404.16189 [pdf, other]

Structure Preserving PINN for Solving Time Dependent PDEs with Periodic Boundary

Authors: Baoli Hao, Ulisses Braga-Neto, Chun Liu, Lifan Wang, Ming Zhong

Abstract: We present a structure preserving PINN for solving a series of time dependent PDEs with periodic boundary. Our method can incorporate the periodic boundary condition as the natural output of any deep neural net, hence significantly improving the training accuracy of baseline PINN. Together with mini-batching and other PINN variants (SA-PINN, RBA-PINN, etc.), our structure preserving PINN can even handle stiff PDEs for modeling a wide range of convection-diffusion and reaction-diffusion processes. We demonstrate the effectiveness of our PINNs on various PDEs from Allen Cahn, Gray Scott to nonlinear Schrodinger.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.07458 [pdf, other]

I-mode Plasma Confinement Improvement by Real-time Lithium Injection and its Classification on EAST Tokamak

Authors: X. M. Zhong, X. L. Zou, A. D. Liu, Y. T. Song, G. Zhuang, H. Q. Liu, L. Q. Xu, E. Z. Li, B. Zhang, G. Z. Zuo, Z. Wang, C. Zhou, J. Zhang, W. X. Shi, L. T. Gao, S. F. Wang, W. Gao, T. Q. Jia, Q. Zang, H. L. Zhao, M. Wang, H. D. Xu, X. J. Wang, X. Gao, X. D. Lin, et al. (3 additional authors not shown)

Abstract: I-mode is a promising regime for future fusion reactors due to the high energy confinement and the moderate particle confinement. However, the effect of lithium, which has been widely applied for particle recycling and impurity control, on I-mode plasma is still unclear. Recently, experiments of real-time lithium powder injection on I-mode plasma have been carried out in EAST Tokamak. It was found that the confinement performance of the I-mode can be improved by the lithium powder injection, which can strongly reduce electron turbulence (ET) and then trigger ion turbulence (IT). Four different regimes of I-mode have been identified in EAST. The Type I I-mode plasma is characterized by the weakly coherent mode (WCM) and the geodesic-acoustic mode (GAM). The Type II I-mode is featured as the WCM and the edge temperature ring oscillation (ETRO). The Type III I-mode corresponds to the plasma with the co-existence of ETRO, GAM, and WCM. The Type IV I-mode denotes the plasma with only WCM but without ETRO and GAM. It has been observed that WCM and ETRO are increased with lithium powder injection due to the reduction of ion and electron turbulence, and the enhancement of the pedestal electron temperature gradient. EAST experiments demonstrate that lithium powder injection is an effective tool for real-time control and confinement improvement of I-mode plasma.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05817 [pdf, other]

Label Propagation Training Schemes for Physics-Informed Neural Networks and Gaussian Processes

Authors: Ming Zhong, Dehao Liu, Raymundo Arroyave, Ulisses Braga-Neto

Abstract: This paper proposes a semi-supervised methodology for training physics-informed machine learning methods. This includes self-training of physics-informed neural networks and physics-informed Gaussian processes in isolation, and the integration of the two via co-training. We demonstrate via extensive numerical experiments how these methods can ameliorate the issue of propagating information forward in time, which is a common failure mode of physics-informed machine learning.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.14878 [pdf, other]

Offline tagging of radon-induced backgrounds in XENON1T and applicability to other liquid xenon detectors

Authors: E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, G. Bruno, R. Budnik, T. K. Bui, J. M. R. Cardoso, A. P. Cimental Chavez, A. P. Colijn, J. Conrad, et al. (142 additional authors not shown)

Abstract: This paper details the first application of a software tagging algorithm to reduce radon-induced backgrounds in liquid noble element time projection chambers, such as XENON1T and XENONnT. The convection velocity field in XENON1T was mapped out using $^{222}\text{Rn}$ and $^{218}\text{Po}$ events, and the root-mean-square convection speed was measured to be $0.30 \pm 0.01$ cm/s. Given this velocity field, $^{214}\text{Pb}$ background events can be tagged when they are followed by $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays, or preceded by $^{218}\text{Po}$ decays. This was achieved by evolving a point cloud in the direction of a measured convection velocity field, and searching for $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays or $^{218}\text{Po}$ decays within a volume defined by the point cloud. In XENON1T, this tagging system achieved a $^{214}\text{Pb}$ background reduction of $6.2^{+0.4}_{-0.9}\%$ with an exposure loss of $1.8\pm 0.2 \%$, despite the timescales of convection being smaller than the relevant decay times. We show that the performance can be improved in XENONnT, and that the performance of such a software-tagging approach can be expected to be further improved in a diffusion-limited scenario. Finally, a similar method might be useful to tag the cosmogenic $^{137}\text{Xe}$ background, which is relevant to the search for neutrinoless double-beta decay.

Submitted 19 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 17 pages, 19 figures

arXiv:2403.06813 [pdf, other]

LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations

Authors: Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong

Abstract: Contrastive instance discrimination approaches outperform supervised learning in downstream tasks like image classification and object detection. However, these approaches heavily rely on data augmentation during representation learning, which may result in inferior results if not properly implemented. Random cropping followed by resizing is a common form of data augmentation used in contrastive learning, but it can lead to degraded representation learning if the two random crops contain distinct semantic content. To address this issue, this paper introduces LeOCLR (Leveraging Original Images for Contrastive Learning of Visual Representations), a framework that employs a new instance discrimination approach and an adapted loss function to alleviate discarding semantic features caused by mapping different object parts during representation learning. The experimental results show that our approach consistently improves representation learning across different datasets compared to baseline models. For example, our approach outperforms MoCo-v2 by 5.1% on ImageNet-1K in linear evaluation and several other methods on transfer learning tasks.

Submitted 18 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 14 pages, 5 figures, 7 tables

arXiv:2403.04724 [pdf, other]

Masked Capsule Autoencoders

Authors: Miles Everett, Mingjun Zhong, Georgios Leontidis

Abstract: We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a self-supervised manner. Capsule Networks have emerged as a powerful alternative to Convolutional Neural Networks (CNNs), and have shown favourable properties when compared to Vision Transformers (ViT), but have struggled to effectively learn when presented with more complex data, leading to Capsule Network models that do not scale to modern tasks. Our proposed MCAE model alleviates this issue by reformulating the Capsule Network to use masked image modelling as a pretraining stage before finetuning in a supervised manner. Across several experiments and ablation studies we demonstrate that, similarly to CNNs and ViTs, Capsule Networks can also benefit from self-supervised pretraining, paving the way for further advancements in this neural network domain. For instance, pretraining on the Imagenette dataset, a dataset of 10 classes of Imagenet-sized images, we achieve not only state-of-the-art results for Capsule Networks but also a 9% improvement compared to purely supervised training. Thus we propose that Capsule Networks benefit from and should be trained within a masked image modelling framework, with a novel capsule decoder, to improve a Capsule Network's performance on realistic-sized images.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 14 pages, 6 figures, 4 tables

arXiv:2403.02595 [pdf, other]

Learning Stochastic Dynamics from Data

Authors: Ziheng Guo, Igor Cialenco, Ming Zhong

Abstract: We present a noise guided trajectory based system identification method for inferring the dynamical structure from observation generated by stochastic differential equations. Our method can handle various kinds of noise, including the case when the components of the noise are correlated. Our method can also learn both the noise level and drift term together from trajectory. We present various numerical tests for showcasing the superior performance of our learning algorithm.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.16843 [pdf, other]

Multi-LoRA Composition for Image Generation

Authors: Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

Abstract: Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we study multi-LoRA composition through a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, our findings demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Project Website: https://maszhongming.github.io/Multi-LoRA-Composition/

arXiv:2402.11499 [pdf, other]

Acousto-electric tomography by the convergence of Kaczmarz two-point gradient-$Θ$ method

Authors: Kai Zhu, Jijun Liu, Min Zhong

Abstract: We study the numerical reconstruction problem in acousto-electric tomography (AET) of recovering the conductivity distribution in a bounded domain from multiple interior power density data. The Two-Point-Gradient-$Θ$ (TPG-$Θ$) in Kaczmarz type is proposed, with a general convex penalty term $Θ$, the algorithm can be utilized in AET problem for recovering sparse and discontinuous conductivity distributions. We establish the convergence of such iterative regularized method. Extensive numerical experiments are presented to illustrate the feasibility and effectiveness of the proposed approach.

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.10446 [pdf, other]

The XENONnT Dark Matter Experiment

Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, M. Balata, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, et al. (170 additional authors not shown)

Abstract: The multi-staged XENON program at INFN Laboratori Nazionali del Gran Sasso aims to detect dark matter with two-phase liquid xenon time projection chambers of increasing size and sensitivity. The XENONnT experiment is the latest detector in the program, planned to be an upgrade of its predecessor XENON1T. It features an active target of 5.9 tonnes of cryogenic liquid xenon (8.5 tonnes total mass in cryostat). The experiment is expected to extend the sensitivity to WIMP dark matter by more than an order of magnitude compared to XENON1T, thanks to the larger active mass and the significantly reduced background, improved by novel systems such as a radon removal plant and a neutron veto. This article describes the XENONnT experiment and its sub-systems in detail and reports on the detector performance during the first science run.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 32 pages, 19 figures

arXiv:2401.06059 [pdf, other]

Investigating Data Contamination for Pre-training Language Models

Authors: Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo

Abstract: Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern whether such capabilities might arise from evaluation datasets being included in the pre-training corpus -- a phenomenon known as \textit{data contamination} -- in a manner that artificially increases performance. There has been little understanding of how this potential contamination might influence LMs' performance on downstream tasks. In this paper, we explore the impact of data contamination at the pre-training stage by pre-training a series of GPT-2 models \textit{from scratch}. We highlight the effect of both text contamination (\textit{i.e.}\ input text of the evaluation samples) and ground-truth contamination (\textit{i.e.}\ the prompts asked on the input and the desired outputs) from evaluation data. We also investigate the effects of repeating contamination for various downstream tasks. Additionally, we examine the prevailing n-gram-based definitions of contamination within current LLM reports, pinpointing their limitations and inadequacy. Our findings offer new insights into data contamination's effects on language model capabilities and underscore the need for independent, comprehensive contamination assessments in LLM studies.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 16 pages, 5 figures

arXiv:2401.01031 [pdf, other]

Quantum phase transitions in the alternating XY chain with three-site interactions

Authors: Kaiyuan Cao, Hao Fu, Xue Liu, Ming Zhong, Peiqing Tong

Abstract: We investigate the quantum phase transition in the alternating XY chain with the XZX+YZY type of three-spin interactions. We present the exact solution derived by means of the Jordan-Wigner transformation and study the average magnetization, spin correlations, and von Neumann entropy to establish the phase diagram. The phase diagram consists of the ferromagnetic phases, the paramagnetic phases, and the phase with weak magnetization (WM). By examining the nearest-neighbor transverse spin correlation, we probe that in the WM phase, the spins within a supercell generate a cluster with a small total spin, but between the nearest-neighbor supercells are distributed randomly. Especially for the dimerized limit case, the spins within a supercell tend to point to opposite directions of the transverse field. In addition, we also investigate the influence of the three-site interaction, and find that the WM phase is absent as the strength of the three-site interaction increases. Our findings shed light on the complex behavior of the alternating XY chain and provide valuable insights for future studies.

Submitted 4 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 10 pages, 12 figures

arXiv:2312.14238 [pdf, other]

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Authors: Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

Abstract: The exponential growth of large language models (LLMs) has opened up numerous possibilities for multimodal AGI systems. However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model to 6 billion parameters and progressively aligns it with the LLM, using web-scale image-text data from various sources. This model can be broadly applied to and achieve state-of-the-art performance on 32 generic visual-linguistic benchmarks including visual perception tasks such as image-level or pixel-level recognition, vision-language tasks such as zero-shot image/video classification, zero-shot image/video-text retrieval, and link with LLMs to create multi-modal dialogue systems. It has powerful visual capabilities and can be a good alternative to the ViT-22B. We hope that our research could contribute to the development of multi-modal large models. Code and models are available at https://github.com/OpenGVLab/InternVL.

Submitted 15 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 25 pages, 5 figures, 28 tables

arXiv:2312.01150 [pdf, other]

Pointer Networks Trained Better via Evolutionary Algorithms

Authors: Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Ke Tang, Peng Yang

Abstract: Pointer Network (PtrNet) is a specific neural network for solving Combinatorial Optimization Problems (COPs). While PtrNets offer real-time feed-forward inference for complex COPs instances, its quality of the results tends to be less satisfactory. One possible reason is that such issue suffers from the lack of global search ability of the gradient descent, which is frequently employed in traditional PtrNet training methods including both supervised learning and reinforcement learning. To improve the performance of PtrNet, this paper delves deeply into the advantages of training PtrNet with Evolutionary Algorithms (EAs), which have been widely acknowledged for not easily getting trapped by local optima. Extensive empirical studies based on the Travelling Salesman Problem (TSP) have been conducted. Results demonstrate that PtrNet trained with EA can consistently perform much better inference results than eight state-of-the-art methods on various problem scales. Compared with gradient descent based PtrNet training methods, EA achieves up to 30.21\% improvement in quality of the solution with the same computational time. With this advantage, this paper is able to report for the first time the results of solving 1000-dimensional TSPs by training a PtrNet on the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on solving higher-dimensional COPs.

Submitted 11 March, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: None

MSC Class: 68T07

arXiv:2311.12947 [pdf, other]

PINNs-Based Uncertainty Quantification for Transient Stability Analysis

Authors: Ren Wang, Ming Zhong, Kaidi Xu, Lola Giráldez Sánchez-Cortés, Ignacio de Cominges Guerra

Abstract: This paper addresses the challenge of transient stability in power systems with missing parameters and uncertainty propagation in swing equations. We introduce a novel application of Physics-Informed Neural Networks (PINNs), specifically an Ensemble of PINNs (E-PINNs), to estimate critical parameters like rotor angle and inertia coefficient with enhanced accuracy and reduced computational load. E-PINNs capitalize on the underlying physical principles of swing equations to provide a robust solution. Our approach not only facilitates efficient parameter estimation but also quantifies uncertainties, delivering probabilistic insights into the system behavior. The efficacy of E-PINNs is demonstrated through the analysis of $1$-bus and $2$-bus systems, highlighting the model's ability to handle parameter variability and data scarcity. The study advances the application of machine learning in power system stability, paving the way for reliable and computationally efficient transient stability analysis.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.07066 [pdf, other]

Context Consistency between Training and Testing in Simultaneous Machine Translation

Authors: Meizhi Zhong, Lemao Liu, Kehai Chen, Mingming Yang, Min Zhang

Abstract: Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing source-side context. However, there is a counterintuitive phenomenon about the context usage between training and testing: e.g., the wait-k testing model consistently trained with wait-k is much worse than the model inconsistently trained with wait-k' (k' is not equal to k) in terms of translation quality. To this end, we first investigate the underlying reasons behind this phenomenon and uncover the following two factors: 1) the limited correlation between translation quality and training (cross-entropy) loss; 2) exposure bias between training and testing. Based on both reasons, we then propose an effective training approach called context consistency training accordingly, which makes consistent the context usage between training and testing by optimizing translation quality and latency as bi-objectives and exposing the predictions to the model during the training. The experiments on three language pairs demonstrate our intuition: our system encouraging context consistency outperforms existing systems with context inconsistency for the first time, with the help of our context consistency training approach.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.00875 [pdf, other]

Learning Collective Behaviors from Observation

Authors: Jinchao Feng, Ming Zhong

Abstract: We present a comprehensive examination of learning methodologies employed for the structural identification of dynamical systems. These techniques are designed to elucidate emergent phenomena within intricate systems of interacting agents. Our approach not only ensures theoretical convergence guarantees but also exhibits computational efficiency when handling high-dimensional observational data. The methods adeptly reconstruct both first- and second-order dynamical systems, accommodating observation and stochastic noise, intricate interaction rules, absent interaction features, and real-world observations in agent systems. The foundational aspect of our learning methodologies resides in the formulation of tailored loss functions using the variational inverse problem approach, inherently equipping our methods with dimension reduction capabilities.

Submitted 4 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.16040 [pdf, other]

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

Authors: Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han

Abstract: Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.

Submitted 24 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.12418 [pdf, other]

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

Authors: Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Abstract: Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks. However, it remains unclear whether the existing focus of NLP research accurately captures the genuine requirements of human users. This paper provides a comprehensive analysis of the divergence between current NLP research and the needs of real-world NLP applications via a large-scale collection of user-GPT conversations. We analyze a large-scale collection of real user queries to GPT. We compare these queries against existing NLP benchmark tasks and identify a significant gap between the tasks that users frequently request from LLMs and the tasks that are commonly studied in academic research. For example, we find that tasks such as ``design'' and ``planning'' are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks. We investigate these overlooked tasks, dissect the practical challenges they pose, and provide insights toward a roadmap to make LLMs better aligned with user needs.

Submitted 18 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.11451 [pdf, other]

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

Abstract: Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.

Submitted 8 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2310.06314 [pdf, ps, other]

Power-partible Reduction and Congruences for Schröder Polynomials

Authors: Chen-Bo Jia, Rong-Hua Wang, Michael X. X. Zhong

Abstract: In this note, we apply the power-partible reduction to show the following arithmetic properties of large Schröder polynomials $S_n(z)$ and little Schröder polynomials $s_n(z)$: for any odd prime $p$, nonnegative integer $r\in\mathbb{N}$, $\varepsilon\in\{-1,1\}$ and $z\in\mathbb{Z}$ with $\gcd(p,z(z+1))=1$, we have \[ \sum_{k=0}^{p-1}(2k+1)^{2r+1}\varepsilon^k S_k(z)\equiv 1\pmod {p}\quad \text{and} \quad \sum_{k=0}^{p-1}(2k+1)^{2r+1}\varepsilon^k s_k(z)\equiv 0\pmod {p}. \]

Submitted 10 October, 2023; originally announced October 2023.

Comments: 11

MSC Class: 05A19

arXiv:2309.11996 [pdf, other]

doi 10.1140/epjc/s10052-023-12296-y

Design and performance of the field cage for the XENONnT experiment

Authors: E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, D. Cichon, et al. (139 additional authors not shown)

Abstract: The precision in reconstructing events detected in a dual-phase time projection chamber depends on a homogeneous and well understood electric field within the liquid target. In the XENONnT TPC the field homogeneity is achieved through a double-array field cage, consisting of two nested arrays of field shaping rings connected by an easily accessible resistor chain. Rather than being connected to the gate electrode, the topmost field shaping ring is independently biased, adding a degree of freedom to tune the electric field during operation. Two-dimensional finite element simulations were used to optimize the field cage, as well as its operation. Simulation results were compared to ${}^{83m}\mathrm{Kr}$ calibration data. This comparison indicates an accumulation of charge on the panels of the TPC which is constant over time, as no evolution of the reconstructed position distribution of events is observed. The simulated electric field was then used to correct the charge signal for the field dependence of the charge yield. This correction resolves the inconsistent measurement of the drift electron lifetime when using different calibration sources and different field cage tuning voltages.

Submitted 21 September, 2023; originally announced September 2023.

Journal ref: Eur. Phys. J. C 84, 138 (2024)

arXiv:2309.02342 [pdf, ps, other]

doi 10.1103/PhysRevE.108.064214

Attractive and repulsive interactions in the one-dimensional swarmalator model

Authors: Baoli Hao, Ming Zhong, Kevin O'Keeffe

Abstract: We study a population of swarmalators, mobile variants of phase oscillators, which run on a ring and have both attractive and repulsive interactions. This one-dimensional (1D) swarmalator model produces several collective states: the standard sync and async states as well as a splaylike "polarized" state and several unsteady states such as active bands or swirling. The model's simplicity allows us to describe some of the states analytically. The model can be considered as a toy model for real-world swarmalators such as vinegar eels and sperm which swarm in quasi-1D geometries.

Submitted 4 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2309.00361 [pdf, ps, other]

A Unified and Scalable Algorithm Framework of User-Defined Temporal $(k,\mathcal{X})$-Core Query

Authors: Ming Zhong, Junyong Yang, Yuanyuan Zhu, Tieyun Qian, Mengchi Liu, Jeffrey Xu Yu

Abstract: Querying cohesive subgraphs on temporal graphs (e.g., social network, finance network, etc.) with various conditions has attracted intensive research interest recently. In this paper, we study a novel Temporal $(k,\mathcal{X})$-Core Query (TXCQ) that extends a fundamental Temporal $k$-Core Query (TCQ) proposed in our conference paper by optimizing or constraining an arbitrary metric $\mathcal{X}$ of $k$-core, such as size, engagement, interaction frequency, time span, burstiness, periodicity, etc. Our objective is to address specific TXCQ instances with conditions on different $\mathcal{X}$ in a unified algorithm framework that guarantees scalability. For that, this journal paper proposes a taxonomy of measurement $\mathcal{X}(\cdot)$ and achieves our objective using a two-phase framework when $\mathcal{X}(\cdot)$ is time-insensitive or time-monotonic. Specifically, Phase 1 still leverages the query processing algorithm of TCQ to induce all distinct $k$-cores during a given time range, and meanwhile locates the "time zones" in which the cores emerge. Then, Phase 2 conducts fast local search and $\mathcal{X}$ evaluation in each time zone with respect to the time insensitivity or monotonicity of $\mathcal{X}(\cdot)$. By revealing two insightful concepts named tightest time interval and loosest time interval that bound time zones, the redundant core induction and unnecessary $\mathcal{X}$ evaluation in a zone can be reduced dramatically. Our experimental results demonstrate that TXCQ can be addressed as efficiently as TCQ, which achieves the latest state-of-the-art performance, by using a general algorithm framework that leaves $\mathcal{X}(\cdot)$ as a user-defined function.

Submitted 21 December, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2301.03770

arXiv:2307.11088 [pdf, other]

L-Eval: Instituting Standardized Evaluation for Long Context Language Models

Authors: Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

Abstract: Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long inputs of one turn or conversations with more extensive histories. While proprietary models such as GPT-4 and Claude can largely preserve the reasoning ability in an extended context, open-source models are still progressing through the early stages of development. To bridge this gap, we propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs) addressing two key aspects: dataset construction and evaluation metrics. On the one hand, we build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs encompassing diverse question styles, domains, and input length (3k$\sim$200k tokens). On the other hand, we investigate the effectiveness of evaluation metrics for LCLMs. Results show that popular n-gram matching metrics generally cannot correlate well with human judgment, and thus we strongly advocate for length-instruction-enhanced (LIE) evaluation and employing LLM judges. We conducted a comprehensive study of 4 popular commercial LLMs and 12 open-source counterparts using the L-Eval benchmark. Our empirical findings offer useful insights into the study of LCLMs and lay the groundwork for the development of more principled evaluation of these models.

Submitted 4 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09944 [pdf, other]

ProtoCaps: A Fast and Non-Iterative Capsule Network Routing Method

Authors: Miles Everett, Mingjun Zhong, Georgios Leontidis

Abstract: Capsule Networks have emerged as a powerful class of deep learning architectures, known for robust performance with relatively few parameters compared to Convolutional Neural Networks (CNNs). However, their inherent efficiency is often overshadowed by their slow, iterative routing mechanisms which establish connections between Capsule layers, posing computational challenges resulting in an inability to scale. In this paper, we introduce a novel, non-iterative routing mechanism, inspired by trainable prototype clustering. This innovative approach aims to mitigate computational complexity, while retaining, if not enhancing, performance efficacy. Furthermore, we harness a shared Capsule subspace, negating the need to project each lower-level Capsule to each higher-level Capsule, thereby significantly reducing memory requisites during training. Our approach demonstrates superior results compared to the current best non-iterative Capsule Network and is tested on the Imagewoof dataset, which is too computationally demanding for iterative approaches to handle efficiently. Our findings underscore the potential of our proposed methodology in enhancing the operational efficiency and performance of Capsule Networks, paving the way for their application in increasingly complex computational scenarios. Code is available at https://github.com/mileseverett/ProtoCaps.

Submitted 8 March, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 13 pages, 5 figures, 5 tables

Journal ref: TMLR December 2023 (https://openreview.net/pdf?id=Id10mlBjcx)

arXiv:2307.09696 [pdf, other]

Towards Saner Deep Image Registration

Authors: Bin Duan, Ming Zhong, Yan Yan

Abstract: With recent advances in computing hardware and surges of deep-learning architectures, learning-based deep image registration methods have surpassed their traditional counterparts, in terms of metric performance and inference time. However, these methods focus on improving performance measurements such as Dice, resulting in less attention given to model behaviors that are equally desirable for registrations, especially for medical imaging. This paper investigates these behaviors for popular learning-based deep registrations under a sanity-checking microscope. We find that most existing registrations suffer from low inverse consistency and nondiscrimination of identical pairs due to overly optimized image similarities. To rectify these behaviors, we propose a novel regularization-based sanity-enforcer method that imposes two sanity checks on the deep model to reduce its inverse consistency errors and increase its discriminative power simultaneously. Moreover, we derive a set of theoretical guarantees for our sanity-checked image registration method, with experimental results supporting our theoretical findings and their effectiveness in increasing the sanity of models without sacrificing any performance. Our code and models are available at https://github.com/tuffr5/Saner-deep-registration.

Submitted 12 March, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: ICCV 2023, fix typos

arXiv:2307.04018 [pdf, other]

Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

Authors: Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Michael Zhu, Yue Zhang

Abstract: Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions. We conduct a thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate the human annotation process. Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.

Submitted 8 July, 2023; originally announced July 2023.

Comments: ACL2023

arXiv:2307.01448 [pdf, other]

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

Authors: Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han

Abstract: Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.

Submitted 3 July, 2023; originally announced July 2023.

Comments: Findings of ACL 2023, Short Paper

arXiv:2306.16552 [pdf, other]

Learning Fair Classifiers via Min-Max F-divergence Regularization

Authors: Meiyu Zhong, Ravi Tandon

Abstract: As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding low-bias FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.16122 [pdf, other]

Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods

Authors: Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong

Abstract: Self-supervised learning algorithms (SSL) based on instance discrimination have shown promising results, performing competitively or even outperforming supervised learning counterparts in some downstream tasks. Such approaches employ data augmentation to create two views of the same instance (i.e., positive pairs) and encourage the model to learn good representations by attracting these views closer in the embedding space without collapsing to the trivial solution. However, data augmentation is limited in representing positive pairs, and the repulsion process between the instances during contrastive learning may discard important features for instances that have similar categories. To address this issue, we propose an approach to identify those images with similar semantic content and treat them as positive instances, thereby reducing the chance of discarding important features during representation learning and increasing the richness of the latent representation. Our approach is generic and could work with any self-supervised instance discrimination framework such as MoCo and SimSiam. To evaluate our method, we run experiments on three benchmark datasets: ImageNet, STL-10 and CIFAR-10 with different instance discrimination SSL approaches. The experimental results show that our approach consistently outperforms the baseline methods across all three datasets; for instance, we improve upon the vanilla MoCo-v2 by 4.1% on ImageNet under a linear evaluation protocol over 800 epochs. We also report results on semi-supervised learning, transfer learning on downstream tasks, and object detection.

Submitted 25 April, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 17 pages, 6 figures, 12 tables

Journal ref: TMLR 2024 (https://openreview.net/pdf?id=z5AXLMBWdU)

arXiv:2306.11871 [pdf, other]

Search for events in XENON1T associated with Gravitational Waves

Authors: XENON Collaboration, E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, et al. (138 additional authors not shown)

Abstract: We perform a blind search for particle signals in the XENON1T dark matter detector that occur close in time to gravitational wave signals in the LIGO and Virgo observatories. No particle signal is observed in the nuclear recoil, electronic recoil, CE$ν$NS, and S2-only channels within $\pm$ 500 seconds of observations of the gravitational wave signals GW170104, GW170729, GW170817, GW170818, and GW170823. We use this null result to constrain mono-energetic neutrinos and Beyond Standard Model particles emitted in the closest coalescence GW170817, a binary neutron star merger. We set new upper limits on the fluence (time-integrated flux) of coincident neutrinos down to 17 keV at 90% confidence level. Furthermore, we constrain the product of coincident fluence and cross section of Beyond Standard Model particles to be less than $10^{-29}$ cm$^2$/cm$^2$ in the [5.5-210] keV energy range at 90% confidence level.

Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.08416 [pdf, other]

Characteristics of the edge temperature ring oscillation during stationary improved confinement mode in EAST

Authors: A. D. Liu, X. L. Zou, X. M. Zhong, Y. T. Song, M. K. Han, Y. M. Duan, H. Q. Liu, T. B. Wang, E. Z. Li, L. Zhang, X. Feng, G. Zhuang, EAST I-mode working group

Abstract: I-mode is a natural ELMy-free regime with H-mode like improved energy confinement and L-mode like particle confinement, making it an attractive scenario for future tokamak based fusion reactors. A kind of low frequency oscillation was widely found and appeared to be unique in I-mode, with the frequency between stationary zonal flow and geodesic-acoustic mode (GAM) zonal flow. In EAST, 90 percent of I-mode shots have such a mode, called edge temperature ring oscillation (ETRO). The mode probably plays an important role during I-mode development and sustainment, while investigations are needed to clarify the differences between ETRO and the similar mode named low frequency edge oscillation (LFEO) in AUG and C-Mod, especially whether it is still GAM. In the paper, the ETRO characteristics in EAST were investigated in detail and most do not agree with GAM, including that 1) during L-I transition with edge Te and Ti both increasing, ETRO has a smaller frequency than GAM; 2) ETRO has distinct harmonics in various diagnostics; 3) the magnetic component of ETRO is dominated by m = 1 structure; 4) ETRO is accompanied by turbulence transition between electron-scale and ion-scale; 5) as I-mode approaches H-mode, ETRO frequency would decrease rapidly with Te increasing. These features imply that ETRO is probably caused by the stationary zonal flow with finite frequency. Moreover, other damping mechanisms need to be involved besides collision in the I-mode edge region. It was found that modest fueling could decrease the ETRO intensity with the I-mode confinement sustaining, suggesting that supersonic molecular beam injection (SMBI) could be used as an effective tool to control ETRO.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 23 pages, 16 figures

arXiv:2306.06601 [pdf, other]

Mimicking the Thinking Process for Emotion Recognition in Conversation with Prompts and Paraphrasing

Authors: Ting Zhang, Zhuang Chen, Ming Zhong, Tieyun Qian

Abstract: Emotion recognition in conversation, which aims to predict the emotion for all utterances, has attracted considerable research attention in recent years. It is a challenging task since the recognition of the emotion in one utterance involves many complex factors, such as the conversational context, the speaker's background, and the subtle difference between emotion labels. In this paper, we propose a novel framework which mimics the thinking process when modeling these factors. Specifically, we first comprehend the conversational context with a history-oriented prompt to selectively gather information from predecessors of the target utterance. We then model the speaker's background with an experience-oriented prompt to retrieve the similar utterances from all conversations. We finally differentiate the subtle label semantics with a paraphrasing mechanism to elicit the intrinsic label related knowledge. We conducted extensive experiments on three benchmarks. The empirical results demonstrate the superiority of our proposed framework over the state-of-the-art baselines.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted to IJCAI 2023, AI and Social Good track

Showing 1–50 of 217 results for author: Zhong, M