Search | arXiv e-print repository

PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion

Authors: Jianhao Shen, Chenguang Wang, Ye Yuan, Jiawei Han, Heng Ji, Koushik Sen, Ming Zhang, Dawn Song

Abstract: This paper presents a parameter-lite transfer learning approach of pretrained language models (LM) for knowledge graph (KG) completion. Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed. We establish this via reformulating KG completion as a "fill-in-the-blank" task, and introducing a parameter-lite encoder on top of the original LMs. We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches. For instance, we outperform the fully finetuning approaches on a KG completion benchmark by tuning only 1% of the parameters. The code and datasets are available at https://github.com/yuanyehome/PALT.

Submitted 24 October, 2022; originally announced October 2022.

Comments: Findings of EMNLP 2022

arXiv:2210.12810 [pdf, other]

Code4Struct: Code Generation for Few-Shot Event Structure Prediction

Authors: Xingyao Wang, Sha Li, Heng Ji

Abstract: Large Language Models (LLMs) trained on a mixture of text and code have demonstrated impressive capability in translating natural language (NL) into structured code. We observe that semantic structures can be conveniently translated into code and propose Code4Struct to leverage such text-to-structure translation capability to tackle structured prediction tasks. As a case study, we formulate Event Argument Extraction (EAE) as converting text into event-argument structures that can be represented as a class object using code. This alignment between structures and code enables us to take advantage of Programming Language (PL) features such as inheritance and type annotation to introduce external knowledge or add constraints. We show that, with sufficient in-context examples, formulating EAE as a code generation problem is advantageous over using variants of text-based prompts. Despite only using 20 training event instances for each event type, Code4Struct is comparable to supervised models trained on 4,202 instances and outperforms current state-of-the-art (SOTA) trained on 20-shot data by 29.5% absolute F1. Code4Struct can use 10-shot training data from a sibling event type to predict arguments for zero-resource event types and outperforms the zero-shot baseline by 12% absolute F1.

Submitted 24 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: ACL 2023
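
Illustrative note on the Code4Struct abstract: the idea of representing an event-argument structure as a class object, with inheritance and type annotations carrying constraints, can be sketched in Python as follows. This is a minimal sketch, not the authors' code or prompt format; every class name, role name (`traveler`, `destination`), and the example sentence below is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Base class for argument fillers; `name` holds the text span."""
    name: str


class PER(Entity):
    """A person mention (hypothetical entity type)."""


class LOC(Entity):
    """A location mention (hypothetical entity type)."""


@dataclass
class Event:
    """Base class shared by all event types."""


@dataclass
class Movement(Event):
    """Hypothetical parent event type: something moves to a destination."""
    # The type annotation restricts this role to location entities.
    destination: List[LOC] = field(default_factory=list)


@dataclass
class Transport(Movement):
    """Hypothetical child event type: inherits `destination`, adds `traveler`."""
    traveler: List[PER] = field(default_factory=list)


# Target structure for the made-up sentence "Kelly flew to Beijing."
event = Transport(
    traveler=[PER("Kelly")],
    destination=[LOC("Beijing")],
)
```

In this framing, extraction amounts to producing the final instantiation from the sentence, while the class definitions document which roles exist and which entity types may fill them.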

arXiv:2210.12582 [pdf, other]

Language Model Pre-Training with Sparse Latent Typing

Authors: Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Abstract: Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Besides, the language model pre-trained with such an objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.

Submitted 26 October, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: EMNLP 2022 (Oral)

arXiv:2210.12444 [pdf, other]

Weakly-Supervised Temporal Article Grounding

Authors: Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang

Abstract: Given a long untrimmed video and natural language queries, video grounding (VG) aims to temporally localize the semantically-aligned video segments. Almost all existing VG work holds two simple but unrealistic assumptions: 1) All query sentences can be grounded in the corresponding video. 2) All query sentences for the same video are always at the same semantic scale. Unfortunately, both assumptions make today's VG models fail to work in practice. For example, in real-world multimodal assets (e.g., news articles), most of the sentences in the article cannot be grounded in their affiliated videos, and they typically have rich hierarchical relations (i.e., at different semantic scales). To this end, we propose a new challenging grounding task: Weakly-Supervised temporal Article Grounding (WSAG). Specifically, given an article and a relevant video, WSAG aims to localize all "groundable" sentences to the video, and these sentences are possibly at different semantic scales. Accordingly, we collect the first WSAG dataset to facilitate this task: YouwikiHow, which borrows the inherent multi-scale descriptions in wikiHow articles and plentiful YouTube videos. In addition, we propose a simple but effective method DualMIL for WSAG, which consists of a two-level MIL loss and a single-/cross- sentence constraint loss. These training objectives are carefully designed for these relaxed assumptions. Extensive ablations have verified the effectiveness of DualMIL.

Submitted 23 February, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: EMNLP 2022, https://github.com/zjuchenlong/WSAG

arXiv:2210.11768 [pdf, other]

Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation

Authors: Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji

Abstract: Knowledge distillation is one of the primary methods of transferring knowledge from large to small models. However, it requires massive task-specific data, which may not be plausible in many real-world applications. Data augmentation methods such as representation interpolation, token replacement, or augmentation with models are applied to tackle this problem. However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models). To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. Our method builds on top of representation interpolation augmentation methods to maintain the diversity of expressions and converts the augmented data to tokens to avoid shifting decision boundaries. It uses simple operations that come with little computational overhead. The results on multiple GLUE tasks show that our methods can improve distillation performance by a large margin at a low time cost. Codes are available at https://github.com/google-research/google-research/tree/master/augpro.

Submitted 10 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: 20 pages, 5 figures. Accepted by ICLR 2023

arXiv:2210.10402 [pdf]

doi 10.1016/j.asr.2022.10.045

Solar Ring Mission: Building a Panorama of the Sun and Inner-heliosphere

Authors: Yuming Wang, Xianyong Bai, Changyong Chen, Linjie Chen, Xin Cheng, Lei Deng, Linhua Deng, Yuanyong Deng, Li Feng, Tingyu Gou, Jingnan Guo, Yang Guo, Xinjun Hao, Jiansen He, Junfeng Hou, Huang Jiangjiang, Zhenghua Huang, Haisheng Ji, Chaowei Jiang, Jie Jiang, Chunlan Jin, Xiaolei Li, Yiren Li, Jiajia Liu, Kai Liu, et al. (29 additional authors not shown)

Abstract: Solar Ring (SOR) is a proposed space science mission to monitor and study the Sun and inner heliosphere from a full 360° perspective in the ecliptic plane. It will deploy three 120°-separated spacecraft on the 1-AU orbit. The first spacecraft, S1, is located 30° upstream of the Earth, the second, S2, 90° downstream, and the third, S3, completes the configuration. This design with necessary science instruments, e.g., the Doppler-velocity and vector magnetic field imager, wide-angle coronagraph, and in-situ instruments, will allow us to establish many unprecedented capabilities: (1) provide simultaneous Doppler-velocity observations of the whole solar surface to understand the deep interior, (2) provide vector magnetograms of the whole photosphere - the inner boundary of the solar atmosphere and heliosphere, (3) provide the information of the whole lifetime evolution of solar featured structures, and (4) provide the whole view of solar transients and space weather in the inner heliosphere. With these capabilities, the Solar Ring mission aims to address outstanding questions about the origin of the solar cycle, the origin of solar eruptions and the origin of extreme space weather events. The successful accomplishment of the mission will construct a panorama of the Sun and inner-heliosphere, and therefore advance our understanding of the star and the space environment that holds our life.

Submitted 23 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 41 pages, 6 figures, 1 table, to be published in Advances in Space Research

arXiv:2210.08604 [pdf, other]

NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly

Authors: Yi R. Fung, Tuhin Chakraborty, Hao Guo, Owen Rambow, Smaranda Muresan, Heng Ji

Abstract: Norm discovery is important for understanding and reasoning about the acceptable behaviors and potential violations in human communication and interactions. We introduce NormSAGE, a framework for addressing the novel task of conversation-grounded multi-lingual, multi-cultural norm discovery, based on language model prompting and self-verification. NormSAGE leverages the expressiveness and implicit knowledge of the pretrained GPT-3 language model backbone, to elicit knowledge about norms through directed questions representing the norm discovery task and conversation context. It further addresses the risk of language model hallucination with a self-verification mechanism ensuring that the norms discovered are correct and are substantially grounded to their source conversations. Evaluation results show that our approach discovers significantly more relevant and insightful norms for conversations on-the-fly compared to baselines (>10+% in Likert scale rating). The norms discovered from Chinese conversation are also comparable to the norms discovered from English conversation in terms of insightfulness and correctness (<3% difference). In addition, the culture-specific norms are of promising quality, allowing for 80% accuracy in culture pair human identification. Finally, our grounding process in norm discovery self-verification can be extended for instantiating the adherence and violation of any norm for a given conversation on-the-fly, with explainability and transparency. NormSAGE achieves an AUC of 95.4% in grounding, with natural language explanation matching human-written quality.

Submitted 13 January, 2024; v1 submitted 16 October, 2022; originally announced October 2022.

arXiv:2210.07197 [pdf, other]

Towards a Unified Multi-Dimensional Evaluator for Text Generation

Authors: Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji, Jiawei Han

Abstract: Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG evaluation as a Boolean Question Answering (QA) task, and by guiding the model with different questions, we can use one evaluator to evaluate from multiple dimensions. Furthermore, thanks to the unified Boolean QA format, we are able to introduce an intermediate learning phase that enables UniEval to incorporate external knowledge from multiple related tasks and gain further improvement. Experiments on three typical NLG tasks show that UniEval correlates substantially better with human judgments than existing metrics. Specifically, compared to the top-performing unified evaluators, UniEval achieves a 23% higher correlation on text summarization, and over 43% on dialogue response generation. Also, UniEval demonstrates a strong zero-shot learning ability for unseen evaluation dimensions and tasks. Source code, data and all pre-trained evaluators are available on our GitHub repository (https://github.com/maszhongming/UniEval).

Submitted 13 October, 2022; originally announced October 2022.

Comments: EMNLP 2022

arXiv:2210.06533 [pdf, other]

doi 10.1063/5.0139276

Super-Fermi Acceleration in Multiscale MHD Reconnection

Authors: Stephen Majeski, Hantao Ji

Abstract: We investigate the Fermi acceleration of charged particles in 2D MHD anti-parallel plasmoid reconnection, finding a drastic enhancement in energization rate $\dot{\varepsilon}$ over a standard Fermi model of $\dot{\varepsilon} \sim \varepsilon$. The shrinking particle orbit width around a magnetic island due to $\vec{E}\times\vec{B}$ drift produces a $\dot{\varepsilon}_\parallel \sim \varepsilon_\parallel^{1+1/2\chi}$ power law with $\chi \sim 0.75$. The increase in the maximum possible energy gain of a particle within a plasmoid due to the enhanced efficiency increases with the plasmoid size, and is by multiple factors of 10 in the case of solar flares and much more for larger plasmas. Including effects of the non-constant $\vec{E}\times\vec{B}$ drift rates leads to further variation of power law indices from $\gtrsim 2$ to $\lesssim 1$, decreasing with plasmoid size at the time of injection.

Submitted 30 March, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 7 pages, 7 figures

arXiv:2210.05919 [pdf, other]

Multiwavelength observations of a partial filament eruption on 13 June 2011

Authors: Yanjie Zhang, Qingmin Zhang, Jun Dai, Dong Li, Haisheng Ji

Abstract: In this paper, we report the multiwavelength observations of the partial filament eruption associated with a C1.2 class flare in NOAA active region 11236 on 13 June 2011. The event occurred at the eastern limb in the field of view (FOV) of Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory (SDO) spacecraft and was close to the disk center in the FOV of Extreme-UltraViolet Imager (EUVI) on board the behind Solar Terrestrial Relations Observatory (STEREO) spacecraft. During eruption, the filament splits into two parts: the major part and runaway part. The major part flows along closed loops and experiences bifurcation at the loop top. Some of the materials move forward and reach the remote footpoint, while others return to the original footpoint. The runaway part flows along open field lines, which is evidenced by a flare-related type III radio burst. The runaway part also undergoes bifurcation. The upper branch escapes the corona and evolves into a jet-like narrow coronal mass ejection (CME) at a speed of 324 km s$^{-1}$, while the lower branch falls back to the solar surface. A schematic cartoon is proposed to explain the event and provides a new mechanism of partial filament eruptions.

Submitted 12 October, 2022; originally announced October 2022.

Comments: 18 pages, 7 figures, accepted by Solar Physics (SoPh)

arXiv:2210.04405 [pdf, other]

Finite-time self-similar rupture in a generalized elastohydrodynamic lubrication model

Authors: William Chang, Hanjie Ji

Abstract: Thin film rupture is a type of nonlinear instability that causes the solution to touch down to zero at finite time. We investigate the finite-time rupture behavior of a generalized elastohydrodynamic lubrication model. This model features the interplay between destabilizing disjoining pressure and stabilizing elastic bending pressure and surface tension. The governing equation is a sixth-order nonlinear degenerate parabolic partial differential equation parameterized by exponents in the mobility function and the disjoining pressure, respectively. Asymptotic self-similar finite-time rupture solutions governed by a sixth-order leading-order equation are analyzed. In the weak elasticity limit, transient self-similar dynamics governed by a fourth-order similarity equation are also identified.

Submitted 9 October, 2022; originally announced October 2022.

MSC Class: 76A20

arXiv:2210.04287 [pdf, other]

Learning to Decompose Visual Features with Latent Textual Prompts

Authors: Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing, Heng Ji

Abstract: Recent advances in pre-training vision-language models like CLIP have shown great potential in learning transferable visual representations. Nonetheless, for downstream inference, CLIP-like models suffer from either 1) degraded accuracy and robustness in the case of inaccurate text descriptions during retrieval-based inference (the challenge for zero-shot protocol); or 2) breaking the well-established vision-language alignment (the challenge for linear probing). To address them, we propose Decomposed Feature Prompting (DeFo). DeFo leverages a flexible number of learnable embeddings as textual input while maintaining the vision-language dual-model architecture, which enables the model to learn decomposed visual features with the help of feature-level textual prompts. We further use an additional linear layer to perform classification, allowing a scalable size of language inputs. Our empirical study shows DeFo's significance in improving the vision-language models. For example, DeFo obtains 73.2% test accuracy on ImageNet with a ResNet-50 backbone without tuning any pretrained weights of both the vision and language encoder, outperforming zero-shot CLIP by a large margin of 15.0%, and outperforming state-of-the-art vision-language prompt tuning method by 7.6%.

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2210.00185 [pdf, other]

Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks

Authors: Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji

Abstract: Although large language models have achieved impressive zero-shot ability, the huge model size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform competitively with their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce Zemi, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model that can demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement compared with the parametric multitask training as proposed by T0. Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. In order to incorporate multiple potentially noisy retrieved augmentations, we further propose a novel augmentation fusion module leveraging perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.

Submitted 22 May, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: Accepted as a conference paper at Findings of ACL 2023

arXiv:2209.15354 [pdf]

Design of Partially Etched GaP-OI Microresonators for Two-Color Kerr Soliton Generation at NIR and MIR

Authors: Houling Ji, Zhaoting Geng, Weiren Cheng, Zhuoyu Yu, Pengzhuo Wu, Yi Li, Qiancheng Zhao

Abstract: We present and theoretically investigate a dispersion engineered GaP-OI microresonator containing a partially-etched gap of 250 nm x 410 nm in a 600 nm x 2990 nm waveguide. This gap enables a 3.25 μm wide anomalous dispersion spectral span covering both the near-infrared and the mid-infrared spectra. This anomalous dispersion is manifested by two mechanisms, being the hybridization of the fundamental TE modes around 1550 nm and the geometric dispersion of the higher order TE mode around the 3100 nm wavelengths, respectively. Two Kerr soliton combs can be numerically generated with 101 GHz and 97 GHz teeth spacings at these spectral windows. The proposed structure demonstrates the design flexibility thanks to the partially etched gap and paves the way towards potential coherent multicolor frequency comb generation in the emerging GaP-OI platform.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.12754 [pdf, other]

doi 10.1038/s41567-023-01972-1

Ion and Electron Acoustic Bursts during Anti-Parallel Magnetic Reconnection Driven by Lasers

Authors: Shu Zhang, Abraham Chien, Lan Gao, Hantao Ji, Eric G. Blackman, Russ Follett, Dustin H. Froula, Joseph Katz, Chikang Li, Andrew Birkel, Richard Petrasso, John Moody, Hui Chen

Abstract: Magnetic reconnection converts magnetic energy into thermal and kinetic energy in plasma. Among numerous candidate mechanisms, ion acoustic instabilities driven by the relative drift between ions and electrons, or equivalently electric current, have been suggested to play a critical role in dissipating magnetic energy in collisionless plasmas. However, their existence and effectiveness during reconnection have not been well understood due to ion Landau damping and difficulties in resolving the Debye length scale in the laboratory. Here we report a sudden onset of ion acoustic bursts measured by collective Thomson scattering in the exhaust of anti-parallel magnetically driven reconnection using high-power lasers. The ion acoustic bursts are followed by electron acoustic bursts with electron heating and bulk acceleration. We reproduce these observations with 1D and 2D particle-in-cell simulations in which electron outflow jet drives ion-acoustic instabilities, forming double layers. These layers induce electron two-stream instabilities that generate electron acoustic bursts and energize electrons. Our results demonstrate the importance of ion and electron acoustic dynamics during reconnection when ion Landau damping is ineffective, a condition applicable to a range of astrophysical plasmas including near-Earth space, stellar flares, and black hole accretion engines.

Submitted 29 March, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.09696 [pdf]

Synthesis of realistic fetal MRI with conditional Generative Adversarial Networks

Authors: Marina Fernandez Garcia, Rodrigo Gonzalez Laiz, Hui Ji, Kelly Payette, Andras Jakab

Abstract: Fetal brain magnetic resonance imaging serves as an emerging modality for prenatal counseling and diagnosis in disorders affecting the brain. Machine learning based segmentation plays an important role in the quantification of brain development. However, a limiting factor is the lack of sufficiently large, labeled training data. Our study explored the application of SPADE, a conditional generative adversarial network (cGAN), which learns the mapping from the label to the image space. The input to the network was super-resolution T2-weighted cerebral MRI data of 120 fetuses (gestational age range: 20-35 weeks, normal and pathological), which were annotated for 7 different tissue categories. SPADE networks were trained on 256×256 2D slices of the reconstructed volumes (image and label pairs) in each orthogonal orientation. To combine the generated volumes from each orientation into one image, a simple mean of the outputs of the three networks was taken. Based on the label maps only, we synthesized highly realistic images. However, some finer details, like small vessels, were not synthesized. A structural similarity index (SSIM) of 0.972 ± 0.016 and correlation coefficient of 0.974 ± 0.008 were achieved. To demonstrate the capacity of the cGAN to create new anatomical variants, we artificially dilated the ventricles in the segmentation map and created synthetic MRI of different degrees of fetal hydrocephalus. cGANs, such as the SPADE algorithm, allow the generation of hypothetically unseen scenarios and anatomical configurations in the label space, which data in turn can be utilized for training various machine learning algorithms. In the future, this algorithm would be used for generating large, synthetic datasets representing fetal brain development. These datasets would potentially improve the performance of currently available segmentation networks.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.09104 [pdf, other]

VS-CAM: Vertex Semantic Class Activation Mapping to Interpret Vision Graph Neural Network

Authors: Zhenpeng Feng, Xiyang Cui, Hongbing Ji, Mingzhe Zhu, Ljubisa Stankovic

Abstract: Graph convolutional neural network (GCN) has drawn increasing attention and attained good performance in various computer vision tasks, however, there lacks a clear interpretation of GCN's inner mechanism. For standard convolutional neural networks (CNNs), class activation mapping (CAM) methods are commonly used to visualize the connection between CNN's decision and image region by generating a heatmap. Nonetheless, such heatmap usually exhibits semantic-chaos when these CAMs are applied to GCN directly. In this paper, we proposed a novel visualization method particularly applicable to GCN, Vertex Semantic Class Activation Mapping (VS-CAM). VS-CAM includes two independent pipelines to produce a set of semantic-probe maps and a semantic-base map, respectively. Semantic-probe maps are used to detect the semantic information from semantic-base map to aggregate a semantic-aware heatmap. Qualitative results show that VS-CAM can obtain heatmaps where the highlighted regions match the objects much more precisely than CNN-based CAM. The quantitative evaluation further demonstrates the superiority of VS-CAM.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 10 pages, 10 figures

arXiv:2209.08679 [pdf, other]

Dynamic Global Memory for Document-level Argument Extraction

Authors: Xinya Du, Sha Li, Heng Ji

Abstract: Extracting informative arguments of events from news articles is a challenging problem in information extraction, which requires a global contextual understanding of each document. While recent work on document-level extraction has gone beyond single-sentence and increased the cross-sentence inference capability of end-to-end models, they are still restricted by certain input sequence length constraints and usually ignore the global context between events. To tackle this issue, we introduce a new global neural generation-based framework for document-level event argument extraction by constructing a document memory store to record the contextual event information and leveraging it to implicitly and explicitly help with decoding of arguments for later events. Empirical results show that our framework outperforms prior methods substantially and it is more robust to adversarially annotated examples with our constrained decoding design. (Our code and resources are available at https://github.com/xinyadu/memory_docie for research purpose.)

Submitted 18 September, 2022; originally announced September 2022.

Comments: ACL 2022 main conference (12 pages)

arXiv:2209.08457 [pdf, other]

doi 10.1103/PhysRevLett.129.115001

Observation of axisymmetric standard magnetorotational instability in the laboratory

Authors: Yin Wang, Erik P. Gilson, Fatima Ebrahimi, Jeremy Goodman, Hantao Ji

Abstract: We report the first direct evidence for the axisymmetric standard magnetorotational instability (SMRI) from a combined experimental and numerical study of a magnetized liquid-metal shear flow in a Taylor-Couette cell with independently rotating and electrically conducting end caps. When a uniform vertical magnetic field $B_i$ is applied along the rotation axis, the measured radial magnetic field $B_r$ on the inner cylinder increases linearly with a small magnetic Reynolds number $Rm$ due to the magnetization of the residue Ekman circulation. Onset of the axisymmetric SMRI is identified from the nonlinear increase of $B_r$ beyond a critical $Rm$ in both experiments and nonlinear numerical simulations. The axisymmetric SMRI exists only at sufficiently large $Rm$ and intermediate $B_i$, a feature consistent with theoretical predictions. Our simulations further show that the axisymmetric SMRI causes the velocity and magnetic fields to contribute an outward flux of axial angular momentum in the bulk region, just as it should in accretion disks.

Submitted 17 September, 2022; originally announced September 2022.

Comments: 10 pages; 11 figures

Journal ref: Physical Review Letters 129, 115001 (2022)

arXiv:2209.08410 [pdf, other]

doi 10.1038/s41467-022-32278-0

Identification of a non-axisymmetric mode in laboratory experiments searching for standard magnetorotational instability

Authors: Yin Wang, Erik P. Gilson, Fatima Ebrahimi, Jeremy Goodman, Kyle J. Caspary, Himawan W. Winarto, Hantao Ji

Abstract: The standard magnetorotational instability (SMRI) is a promising mechanism for turbulence and rapid accretion in astrophysical disks. It is a magnetohydrodynamic (MHD) instability that destabilizes otherwise hydrodynamically stable disk flow. Due to its microscopic nature at astronomical distances and stringent requirements in laboratory experiments, SMRI has remained unconfirmed since its proposal, despite its astrophysical importance. Here we report a nonaxisymmetric MHD instability in a modified Taylor-Couette experiment. To search for SMRI, a uniform magnetic field is imposed along the rotation axis of a swirling liquid-metal flow. The instability initially grows exponentially, becoming prominent only for sufficient flow shear and moderate magnetic field. These conditions for instability are qualitatively consistent with SMRI, but at magnetic Reynolds numbers below the predictions of linear analyses with periodic axial boundaries. Three-dimensional numerical simulations, however, reproduce the observed instability, indicating that it grows linearly from the primary axisymmetric flow modified by the applied magnetic field.

Submitted 17 September, 2022; originally announced September 2022.

Comments: 15 pages, 16 figures

Journal ref: Nature Communications 13, 4679 (2022)

arXiv:2209.03611 [pdf, other]

Advancing Theory and Modeling Efforts in Heliophysics

Authors: Fan Guo, Spiro Antiochos, Paul Cassak, Bin Chen, Xiaohang Chen, Chuanfei Dong, Cooper Downs, Joe Giacalone, Colby C. Haggerty, Hantao Ji, Judith Karpen, James Klimchuk, Wen Li, Xiaocan Li, Mitsuo Oka, Katharine K. Reeves, Marc Swisdak, Weichao Tu

Abstract: Heliophysics theory and modeling build understanding from fundamental principles to motivate, interpret, and predict observations. Together with observational analysis, they constitute a comprehensive scientific program in heliophysics. As observations and data analysis become increasingly detailed, it is critical that theory and modeling develop more quantitative predictions and iterate with observations. Advanced theory and modeling can inspire and greatly improve the design of new instruments and increase their chance of success. In addition, in order to build physics-based space weather forecast models, it is important to keep developing and testing new theories, and maintaining constant communications with theory and modeling. Maintaining a sustainable effort in theory and modeling is critically important to heliophysics. We recommend that all funding agencies join forces and consider expanding current and creating new theory and modeling programs--especially, 1. NASA should restore the HTMS program to its original support level to meet the critical needs of heliophysics science; 2. a Strategic Research Model program needs to be created to support model development for next-generation basic research codes; 3. new programs must be created for addressing mission-critical theory and modeling needs; and 4. enhanced programs are urgently required for training the next generation of theorists and modelers.

Submitted 8 September, 2022; originally announced September 2022.

Comments: White paper submitted to Heliophysics 2024 Decadal Survey

arXiv:2209.02071 [pdf, other]

CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval

Authors: Kung-Hsiang Huang, ChengXiang Zhai, Heng Ji

Abstract: Fact-checking has gained increasing attention due to the widespread of falsified information. Most fact-checking approaches focus on claims made in English only due to the data scarcity issue in other languages. The lack of fact-checking datasets in low-resource languages calls for an effective cross-lingual transfer technique for fact-checking. Additionally, trustworthy information in different languages can be complementary and helpful in verifying facts. To this end, we present the first fact-checking framework augmented with cross-lingual retrieval that aggregates evidence retrieved from multiple languages through a cross-lingual retriever. Given the absence of cross-lingual information retrieval datasets with claim-like queries, we train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a self-supervised algorithm that creates training instances by translating the title of a passage. The goal for X-ICT is to learn cross-lingual retrieval in which the model learns to identify the passage corresponding to a given translated title. On the X-Fact dataset, our approach achieves 2.23% absolute F1 improvement in the zero-shot cross-lingual setup over prior systems. The source code and data are publicly available at https://github.com/khuangaf/CONCRETE.

Submitted 5 September, 2022; originally announced September 2022.

Comments: Accepted by COLING 2022

arXiv:2209.01988 [pdf, other]

A Benchmark for Weakly Semi-Supervised Abnormality Localization in Chest X-Rays

Authors: Haoqin Ji, Haozhe Liu, Yuexiang Li, Jinheng Xie, Nanjun He, Yawen Huang, Dong Wei, Xinrong Chen, Linlin Shen, Yefeng Zheng

Abstract: Accurate abnormality localization in chest X-rays (CXR) can benefit the clinical diagnosis of various thoracic diseases. However, the lesion-level annotation can only be performed by experienced radiologists, and it is tedious and time-consuming, thus difficult to acquire. Such a situation results in a difficulty to develop a fully-supervised abnormality localization system for CXR. In this regard, we propose to train the CXR abnormality localization framework via a weakly semi-supervised strategy, termed Point Beyond Class (PBC), which utilizes a small number of fully annotated CXRs with lesion-level bounding boxes and extensive weakly annotated samples by points. Such a point annotation setting can provide weakly instance-level information for abnormality localization with a marginal annotation cost. Particularly, the core idea behind our PBC is to learn a robust and accurate mapping from the point annotations to the bounding boxes against the variance of annotated points. To achieve that, a regularization term, namely multi-point consistency, is proposed, which drives the model to generate the consistent bounding box from different point annotations inside the same abnormality. Furthermore, a self-supervision, termed symmetric consistency, is also proposed to deeply exploit the useful information from the weakly annotated data for abnormality localization. Experimental results on RSNA and VinDr-CXR datasets justify the effectiveness of the proposed method.
When less than 20% box-level labels are used for training, an improvement of ~5 in mAP can be achieved by our PBC, compared to the current state-of-the-art method (i.e., Point DETR). Code is available at https://github.com/HaozheLiu-ST/Point-Beyond-Class.

Submitted 5 September, 2022; originally announced September 2022.

Comments: Accepted by MICCAI-2022

arXiv:2209.00068 [pdf, other]

Incorporating Task-specific Concept Knowledge into Script Learning

Authors: Chenkai Sun, Tie Xu, ChengXiang Zhai, Heng Ji

Abstract: In this paper, we present Tetris, a new task of Goal-Oriented Script Completion. Unlike previous work, it considers a more realistic and general setting, where the input includes not only the goal but also additional user context, including preferences and history. To address this problem, we propose a novel approach, which uses two techniques to improve performance: (1) concept prompting, and (2) script-oriented contrastive learning that addresses step repetition and hallucination problems. On our WikiHow-based dataset, we find that both methods improve performance. The dataset, repository, and models will be publicly available to facilitate further research on this new task.

Submitted 23 April, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

arXiv:2208.12306 [pdf, other]

Multimedia Generative Script Learning for Task Planning

Authors: Qingyun Wang, Manling Li, Hou Pong Chan, Lifu Huang, Julia Hockenmaier, Girish Chowdhary, Heng Ji

Abstract: Goal-oriented generative script learning aims to generate subsequent steps to reach a particular goal, which is an essential task to assist robots or humans in performing stereotypical activities. An important aspect of this process is the ability to capture historical states visually, which provides detailed information that is not covered by text and will guide subsequent steps. Therefore, we propose a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both text and vision modalities, as well as presenting the first benchmark containing 5,652 tasks and 79,089 multimedia steps. This task is challenging in three aspects: the multimedia challenge of capturing the visual states in images, the induction challenge of performing unseen tasks, and the diversity challenge of covering different information in individual steps. We propose to encode visual state changes through a selective multimedia encoder to address the multimedia challenge, transfer knowledge from previously observed tasks using a retrieval-augmented decoder to overcome the induction challenge, and further present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation and inductive quality. Experiment results demonstrate that our approach significantly outperforms strong baselines.

Submitted 10 July, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: 21 pages, Accepted by Findings of the Association for Computational Linguistics: ACL 2023, Code and Resources at https://github.com/EagleW/Multimedia-Generative-Script-Learning

arXiv:2208.05035 [pdf, ps, other]

Adaptive Target-Condition Neural Network: DNN-Aided Load Balancing for Hybrid LiFi and WiFi Networks

Authors: Han Ji, Qiang Wang, Stephen J. Redmond, Iman Tavakkolnia, Xiping Wu

Abstract: Load balancing (LB) is a challenging issue in the hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets), due to the nature of heterogeneous access points (APs). Machine learning has the potential to provide a complexity-friendly LB solution with near-optimal network performance, at the cost of a training process. The state-of-the-art (SOTA) learning-aided LB methods, however, need retraining when the network environment (especially the number of users) changes, significantly limiting its practicability. In this paper, a novel deep neural network (DNN) structure named adaptive target-condition neural network (A-TCNN) is proposed, which conducts AP selection for one target user upon the condition of other users. Also, an adaptive mechanism is developed to map a smaller number of users to a larger number through splitting their data rate requirements, without affecting the AP selection result for the target user. This enables the proposed method to handle different numbers of users without the need for retraining. Results show that A-TCNN achieves a network throughput very close to that of the testing dataset, with a gap less than 3%. It is also proven that A-TCNN can obtain a network throughput comparable to two SOTA benchmarks, while reducing the runtime by up to three orders of magnitude.

Submitted 9 August, 2022; originally announced August 2022.

Comments: 13 pages, 9 figures, and 4 tables, submitted to IEEE JSAC SI-BeyondShannon

arXiv:2207.08808 [pdf, other]

Global-Local Stepwise Generative Network for Ultra High-Resolution Image Restoration

Authors: Xin Feng, Haobo Ji, Wenjie Pei, Fanglin Chen, Guangming Lu

Abstract: While the research on image background restoration from regular size of degraded images has achieved remarkable progress, restoring ultra high-resolution (e.g., 4K) images remains an extremely challenging task due to the explosion of computational complexity and memory usage, as well as the deficiency of annotated data. In this paper we present a novel model for ultra high-resolution image restoration, referred to as the Global-Local Stepwise Generative Network (GLSGN), which employs a stepwise restoring strategy involving four restoring pathways: three local pathways and one global pathway. The local pathways focus on conducting image restoration in a fine-grained manner over local but high-resolution image patches, while the global pathway performs image restoration coarsely on the scale-down but intact image to provide cues for the local pathways in a global view including semantics and noise patterns. To smooth the mutual collaboration between these four pathways, our GLSGN is designed to ensure the inter-pathway consistency in four aspects in terms of low-level content, perceptual attention, restoring intensity and high-level semantics, respectively. As another major contribution of this work, we also introduce the first ultra high-resolution dataset to date for both reflection removal and rain streak removal, comprising 4,670 real-world and synthetic images.
Extensive experiments across three typical tasks for image background restoration, including image reflection removal, image rain streak removal and image dehazing, show that our GLSGN consistently outperforms state-of-the-art methods.

Submitted 17 May, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2206.12489 [pdf, other]

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

Authors: Hang Ji, Tanvina Patel, Odette Scharenborg

Abstract: In this work, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) speech pre-trained models on their ability to capture articulatory features (AF) information and their subsequent prediction of phone recognition performance for within and across language scenarios. Specifically, we compared CPC, wav2vec 2.0, and HuBert. First, frame-level AF probing tasks were implemented. Subsequently, phone-level end-to-end ASR systems for phoneme recognition tasks were implemented, and the performance on the frame-level AF probing task and the phone accuracy were correlated. Compared to the conventional speech representation MFCC, all SSL pre-trained speech representations captured more AF information, and achieved better phoneme recognition performance within and across languages, with HuBert performing best. The frame-level AF probing task is a good predictor of phoneme recognition performance, showing the importance of capturing AF information in the speech representations. Compared with MFCC, in the within-language scenario, the performance of these SSL speech pre-trained models on AF probing tasks achieved a maximum relative increase of 34.4%, and it resulted in the lowest PER of 10.2%. In the cross-language scenario, the maximum relative increase of 26.7% also resulted in the lowest PER of 23.0%.

Submitted 24 June, 2022; originally announced June 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2206.09156 [pdf, other]

doi 10.3847/2041-8213/ac79b7

Sunspot shearing and sudden retraction motion associated with the 2013 August 17 M3.3 Flare

Authors: Yanjie Zhang, Zhe Xu, Qingmin Zhang, Jun Dai, Haisheng Ji

Abstract: In this Letter, we give a detailed analysis to the M3.3 class flare that occurred on August 17, 2013 (SOL2013-08-17T18:16). It presents a clear picture of mutual magnetic interaction initially from the photosphere to the corona via the abrupt rapid shearing motion of a small sunspot before the flare, and then suddenly from the corona back to the photosphere via the sudden retraction motion of the same sunspot during the flare impulsive phase. About 10 hours before the flare, a small sunspot in the active region NOAA 11818 started to move northeast along a magnetic polarity inversion line (PIL), creating a shearing motion that changed the quasi-static state of the active region. A filament right above the PIL was activated following the movement of the sunspot and then got partially erupted. The eruption eventually led to the M3.3 flare. The sunspot was then suddenly pulled back to the opposite direction upon the flare onset. During the backward motion, the Lorentz force underwent a simultaneous impulsive change both in magnitude and direction. Its directional change is found to be conformable with the retraction motion. The observation provides direct evidence for the role of the shearing motion of the sunspot in powering and triggering the flare. It especially confirms that the abrupt motion of a sunspot during a solar flare is the result of a back reaction caused by the reconfiguration of the coronal magnetic field.

Submitted 18 June, 2022; originally announced June 2022.

arXiv:2206.07296 [pdf, other]

Enhanced Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

Authors: Sha Li, Mahdi Namazifar, Di Jin, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur

Abstract: Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging. Existing models treat knowledge selection as a sentence ranking or classification problem where each sentence is handled individually, ignoring the internal semantic connection among sentences in the background document. In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs. Our document semantic graphs preserve sentence-level information through the use of sentence nodes and provide concept connections between sentences. We jointly apply multi-task learning for sentence-level and concept-level knowledge selection and show that it improves sentence-level selection. Our experiments show that our semantic graph-based knowledge selection improves over sentence selection baselines for both the knowledge selection task and the end-to-end response generation task on HollE and improves generalization on unseen topics in WoW.

Submitted 30 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: NAACL 2022. Please refer to https://www.amazon.science/publications/enhanced-knowledge-selection-for-grounded-dialogues-via-document-semantic-graphs for code and resources

arXiv:2206.02921 [pdf, other]

Schema-Guided Event Graph Completion

Authors: Hongwei Wang, Zixuan Zhang, Sha Li, Jiawei Han, Yizhou Sun, Hanghang Tong, Joseph P. Olive, Heng Ji

Abstract: We tackle a new task, event graph completion, which aims to predict missing event nodes for event graphs. Existing link prediction or graph completion methods have difficulty dealing with event graphs because they are usually designed for a single large graph such as a social network or a knowledge graph, rather than multiple small dynamic event graphs. Moreover, they can only predict missing edges rather than missing nodes. In this work, we propose to utilize event schema, a template that describes the stereotypical structure of event graphs, to address the above issues. Our schema-guided event graph completion approach first maps an instance event graph to a subgraph of the schema graph by a heuristic subgraph matching algorithm. Then it predicts whether a candidate event node in the schema graph should be added to the instantiated schema subgraph by characterizing two types of local topology of the schema graph: neighbors of the candidate node and the subgraph, and paths that connect the candidate node and the subgraph. These two modules are later combined together for the final prediction. We also propose a self-supervised strategy to construct training samples, as well as an inference algorithm that is specifically designed to complete event graphs. Extensive experimental results on four datasets demonstrate that our proposed method achieves state-of-the-art performance, with 4.3% to 19.4% absolute F1 gains over the best baseline method on the four datasets.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2206.02712 [pdf, other]

Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

Authors: Pei Ke, Haozhe Ji, Zhenyu Yang, Yi Huang, Junlan Feng, Xiaoyan Zhu, Minlie Huang

Abstract: Despite the success of text-to-text pre-trained models in various natural language generation (NLG) tasks, the generation performance is largely restricted by the number of labeled data in downstream tasks, particularly in data-to-text generation tasks. Existing works mostly utilize abundant unlabeled structured data to conduct unsupervised pre-training for task adaption, which fail to model the complex relationship between source structured data and target texts. Thus, we introduce self-training as a better few-shot learner than task-adaptive pre-training, which explicitly captures this relationship via pseudo-labeled data generated by the pre-trained model. To alleviate the side-effect of low-quality pseudo-labeled data during self-training, we propose a novel method called Curriculum-Based Self-Training (CBST) to effectively leverage unlabeled data in a rearranged order determined by the difficulty of text generation. Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.

Submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted by IJCAI 2022

arXiv:2206.02082 [pdf, other]

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

Authors: Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang

Abstract: Multi-channel video-language retrieval requires models to understand information from different channels (e.g., video$+$question, video$+$speech) to correctly link a video with a textual response or query. Fortunately, contrastive multimodal models are shown to be highly effective at aligning entities in images/videos and text, e.g., CLIP; text contrastive models have been extensively studied recently for their strong ability to produce discriminative sentence embeddings, e.g., SimCSE. However, there is not a clear way to quickly adapt these two lines to multi-channel video-language retrieval with limited data and resources. In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information. Based on categorization of recent methods, we investigate the options of representing videos using continuous feature vectors or discrete text tokens; for the fusion method, we explore the use of a multimodal transformer or a pretrained contrastive text model. We extensively evaluate the four combinations on five video-language datasets. We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yield the best performance, which can even outperform state-of-the-art on the iVQA and How2QA datasets without additional training on millions of video-text data. Further analysis shows that this is because representing videos as text tokens captures the key visual information and text tokens are naturally aligned with text models that are strong retrievers after the contrastive pretraining process. All the empirical analysis establishes a solid foundation for future research on affordable and upgradable multimodal intelligence.

Submitted 10 April, 2023; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: To appear in CVPR 2023; The code will be released at https://github.com/XudongLinthu/upgradable-multimodal-intelligence

arXiv:2205.14847 [pdf, other]

EA$^2$E: Improving Consistency with Event Awareness for Document-Level Argument Extraction

Authors: Qi Zeng, Qiusi Zhan, Heng Ji

Abstract: Events are inter-related in documents. Motivated by the one-sense-per-discourse theory, we hypothesize that a participant tends to play consistent roles across multiple events in the same document. However, recent work on document-level event argument extraction models each individual event in isolation and therefore causes inconsistency among extracted arguments across events, which will further cause discrepancy for downstream applications such as event knowledge base population, question answering, and hypothesis generation. In this work, we formulate event argument consistency as the constraints from event-event relations under the document-level setting. To improve consistency we introduce the Event-Aware Argument Extraction (EA$^2$E) model with augmented context for training and inference. Experimental results on WIKIEVENTS and ACE2005 datasets demonstrate the effectiveness of EA$^2$E compared to baseline methods.

Submitted 30 May, 2022; originally announced May 2022.

Comments: NAACL 2022 Findings

arXiv:2205.13294 [pdf, other]

Analytical Interpretation of Latent Codes in InfoGAN with SAR Images

Authors: Zhenpeng Feng, Milos Dakovic, Hongbing Ji, Mingzhe Zhu, Ljubisa Stankovic

Abstract: Generative Adversarial Networks (GANs) can synthesize abundant photo-realistic synthetic aperture radar (SAR) images. Some recent GANs (e.g., InfoGAN) are even able to edit specific properties of the synthesized images by introducing latent codes. This is crucial for SAR image synthesis since the targets in real SAR images have different properties due to the imaging mechanism. Despite the success of InfoGAN in manipulating properties, a clear explanation of how these latent codes affect synthesized properties is still lacking, so editing specific properties usually relies on empirical trials, which are unreliable and time-consuming. In this paper, we show that latent codes are disentangled to affect the properties of SAR images in a non-linear manner. By introducing some property estimators for latent codes, we are able to provide a completely analytical nonlinear model to decompose the entangled causality between latent codes and different properties. The qualitative and quantitative experimental results further reveal that the properties can be calculated from latent codes and, inversely, satisfying latent codes can be estimated given desired properties. In this case, properties can be manipulated by latent codes as we expect.

Submitted 26 May, 2022; originally announced May 2022.

Comments: 13 pages, 14 figures

arXiv:2205.11602 [pdf, other]

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies

Authors: Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown

Abstract: Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. In this work, we study Seeded Hierarchical Clustering (SHC): the task of automatically fitting unlabeled data to such taxonomies using only a small set of labeled examples. We propose HierSeed, a novel weakly supervised algorithm for this task that uses only a small set of labeled seed examples. It is both data-efficient and computationally efficient. HierSeed assigns documents to topics by weighing document density against topic hierarchical structure. It outperforms both unsupervised and supervised baselines for the SHC task on three real-world datasets.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.10977 [pdf, other]

What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys

Authors: Yubin Ge, Ziang Xiao, Jana Diesner, Heng Ji, Karrie Karahalios, Hari Sundaram

Abstract: Generating follow-up questions on the fly could significantly improve conversational survey quality and user experiences by enabling a more dynamic and personalized survey structure. In this paper, we proposed a novel task for knowledge-driven follow-up question generation in conversational surveys. We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge in the context of conversational surveys. Along with the dataset, we designed and validated a set of reference-free Gricean-inspired evaluation metrics to systematically evaluate the quality of generated follow-up questions. We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions by using knowledge to steer the generation process. The experiments demonstrate that compared to GPT-based baseline models, our two-staged model generates more informative, coherent, and clear follow-up questions.

Submitted 13 October, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.10747 [pdf, other]

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Authors: Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

Abstract: The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction. Existing few-shot video-language learners focus exclusively on the encoder, resulting in the absence of a video-to-text decoder to handle generative tasks. Video captioners have been pretrained on large-scale video-language datasets, but they rely heavily on finetuning and lack the ability to generate text for unseen tasks in a few-shot setting. We propose VidIL, a few-shot Video-language Learner via Image and Language models, which demonstrates strong performance on few-shot video-to-text tasks without the necessity of pretraining or finetuning on any video datasets. We use the image-language models to translate the video content into frame captions, object, attribute, and event phrases, and compose them into a temporal structure template. We then instruct a language model, with a prompt containing a few in-context examples, to generate a target output from the composed content. The flexibility of prompting allows the model to capture any form of text input, such as automatic speech recognition (ASR) transcripts. Our experiments demonstrate the power of language models in understanding videos on a wide variety of video-language tasks, including video captioning, video question answering, video caption retrieval, and video future event prediction. In particular, on video future event prediction, our few-shot model significantly outperforms state-of-the-art supervised models trained on large-scale video datasets. Code and resources are publicly available for research purposes at https://github.com/MikeWangWZHL/VidIL.

Submitted 13 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.07466 [pdf, other]

Robust Representation via Dynamic Feature Aggregation

Authors: Haozhe Liu, Haoqin Ji, Yuexiang Li, Nanjun He, Haoqian Wu, Feng Liu, Linlin Shen, Yefeng Zheng

Abstract: Deep convolutional neural network (CNN) based models are vulnerable to adversarial attacks. One of the possible reasons is that the embedding space of a CNN-based model is sparse, resulting in a large space for the generation of adversarial samples. In this study, we propose a method, denoted as Dynamic Feature Aggregation, to compress the embedding space with a novel regularization. Particularly, the convex combination between two samples is regarded as the pivot for aggregation. In the embedding space, the selected samples are guided to be similar to the representation of the pivot. On the other side, to mitigate the trivial solution of such regularization, the last fully-connected layer of the model is replaced by an orthogonal classifier, in which the embedding codes for different classes are processed orthogonally and separately. With the regularization and orthogonal classifier, a more compact embedding space can be obtained, which accordingly improves the model robustness against adversarial attacks. An average accuracy of 56.91% is achieved by our method on CIFAR-10 against various attack methods, which significantly surpasses a solid baseline (Mixup) by a margin of 37.31%. More surprisingly, empirical results show that the proposed method can also achieve state-of-the-art performance for out-of-distribution (OOD) detection, due to the learned compact feature space. An F1 score of 0.937 is achieved by the proposed method, when adopting CIFAR-10 as the in-distribution (ID) dataset and LSUN as the OOD dataset. Code is available at https://github.com/HaozheLiu-ST/DynamicFeatureAggregation.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2205.00463 [pdf, other]

doi 10.1088/1361-6420/ac8ac6

A Dataset-free Deep learning Method for Low-Dose CT Image Reconstruction

Authors: Qiaoqiao Ding, Hui Ji, Yuhui Quan, Xiaoqun Zhang

Abstract: Low-dose CT (LDCT) imaging has attracted considerable interest for the reduction of the object's exposure to X-ray radiation. In recent years, supervised deep learning (DL) has been extensively studied for LDCT image reconstruction, which trains a network over a dataset containing many pairs of normal-dose and low-dose images. However, the challenge of collecting many such pairs in the clinical setup limits the application of such supervised-learning-based methods for LDCT image reconstruction in practice. Aiming at addressing the challenges raised by the collection of a training dataset, this paper proposes an unsupervised deep learning method for LDCT image reconstruction, which does not require any external training data. The proposed method is built on a re-parametrization technique for Bayesian inference via a deep network with random weights, combined with additional total variation (TV) regularization. The experiments show that the proposed method noticeably outperforms existing dataset-free image reconstruction methods on the test data.

Submitted 5 October, 2022; v1 submitted 1 May, 2022; originally announced May 2022.

arXiv:2204.11817 [pdf, other]

Translation between Molecules and Natural Language

Authors: Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, Heng Ji

Abstract: We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.

Submitted 3 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted at EMNLP 2022. Data and code can be found on [Github](https://github.com/blender-nlp/MolT5)

arXiv:2204.11373 [pdf, other]

doi 10.1145/3477495.3531878

Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval

Authors: Revanth Gangi Reddy, Md Arafat Sultan, Martin Franz, Avirup Sil, Heng Ji

Abstract: We show that supervised neural information retrieval (IR) models are prone to learning sparse attention patterns over passage tokens, which can result in key phrases including named entities receiving low attention weights, eventually leading to model under-performance. Using a novel targeted synthetic data generation method that identifies poorly attended entities and conditions the generation episodes on those, we teach neural IR to attend more uniformly and robustly to all entities in a given passage. On two public IR benchmarks, we empirically show that the proposed method helps improve both the model's attention patterns and retrieval performance, including in zero-shot settings.

Submitted 24 April, 2022; originally announced April 2022.

Comments: Published at SIGIR 2022

arXiv:2204.10502 [pdf, other]

LiDetector: License Incompatibility Detection for Open Source Software

Authors: Sihan Xu, Ya Gao, Lingling Fan, Zheli Liu, Yang Liu, Hua Ji

Abstract: Open-source software (OSS) licenses dictate the conditions which should be followed to reuse, distribute, and modify the software. Apart from widely-used licenses such as the MIT License, developers are also allowed to customize their own licenses (called custom licenses), whose descriptions are more flexible. The presence of such various licenses imposes challenges to understanding licenses and their compatibility. To avoid financial and legal risks, it is essential to ensure license compatibility when integrating third-party packages or reusing code accompanied with licenses. In this work, we propose LiDetector, an effective tool that extracts and interprets OSS licenses (including both official licenses and custom licenses), and detects license incompatibility among these licenses. Specifically, LiDetector introduces a learning-based method to automatically identify meaningful license terms from an arbitrary license and employs Probabilistic Context-Free Grammar (PCFG) to infer rights and obligations for incompatibility detection. Experiments demonstrate that LiDetector outperforms existing methods with 93.28% precision for term identification, and 91.09% accuracy for right and obligation inference, and can effectively detect incompatibility with a 10.06% FP rate and 2.56% FN rate. Furthermore, with LiDetector, our large-scale empirical study on 1,846 projects reveals that 72.91% of the projects are suffering from license incompatibility, including popular ones such as the MIT License and the Apache License. We highlighted lessons learned from the perspectives of different stakeholders and made all related data and the replication package publicly available to facilitate follow-up research.

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2204.09573 [pdf]

doi 10.1016/j.media.2023.102833

Fetal Brain Tissue Annotation and Segmentation Challenge Results

Authors: Kelly Payette, Hongwei Li, Priscille de Dumast, Roxane Licandro, Hui Ji, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Hao Liu, Yuchen Pei, Lisheng Wang, Ying Peng, Juanying Xie, Huiquan Zhang, Guiming Dong, Hao Fu, Guotai Wang, ZunHyan Rieu, Donghyeon Kim, Hyun Gi Kim, Davood Karimi, Ali Gholipour, Helena R. Torres, Bruno Oliveira, João L. Vilaça, et al. (33 additional authors not shown)

Abstract: In-utero fetal MRI is emerging as an important tool in the diagnosis and analysis of the developing human brain. Automatic segmentation of the developing fetal brain is a vital step in the quantitative analysis of prenatal neurodevelopment both in the research and clinical context. However, manual segmentation of cerebral structures is time-consuming and prone to error and inter-observer variability. Therefore, we organized the Fetal Tissue Annotation (FeTA) Challenge in 2021 in order to encourage the development of automatic segmentation algorithms on an international level. The challenge utilized the FeTA Dataset, an open dataset of fetal brain MRI reconstructions segmented into seven different tissues (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, brainstem, deep grey matter). 20 international teams participated in this challenge, submitting a total of 21 algorithms for evaluation. In this paper, we provide a detailed analysis of the results from both a technical and clinical perspective. All participants relied on deep learning methods, mainly U-Nets, with some variability present in the network architecture, optimization, and image pre- and post-processing. The majority of teams used existing medical imaging deep learning frameworks. The main differences between the submissions were the fine tuning done during training, and the specific pre- and post-processing steps performed. The challenge results showed that almost all submissions performed similarly. Four of the top five teams used ensemble learning methods. However, one team's algorithm performed significantly better than the other submissions, and consisted of an asymmetrical U-Net network architecture. This paper provides a first-of-its-kind benchmark for future automatic multi-tissue segmentation algorithms for the developing human brain in utero.

Submitted 20 April, 2022; originally announced April 2022.

Comments: Results from FeTA Challenge 2021, held at MICCAI; Manuscript submitted

arXiv:2204.07700 [pdf, other]

doi 10.1063/5.0089459

Two-dimensional plasma density evolution local to the inversion layer during sawtooth crash events using Beam Emission Spectroscopy

Authors: Sayak Bose, William Fox, Dingyun Liu, Zheng Yan, George McKee, Aaron Goodman, Hantao Ji

Abstract: We present methods for analyzing Beam Emission Spectroscopy (BES) data to obtain the plasma density evolution associated with rapid sawtooth crash events at the DIII-D tokamak. BES allows coverage over a 2-D spatial plane, inherently local measurements, with fast time responses, and therefore provides a valuable new channel for data during sawtooth events. A method is developed to remove sawtooth-induced edge-light pulses contained in the BES data. The edge light pulses appear to be from the $\rm{D}_α$ emission produced by edge recycling during sawtooth events, and are large enough that traditional spectroscopic filtering and data analysis techniques are insufficient to deduce physically meaningful quantities. A cross-calibration of 64 BES channels is performed using a novel method to ensure accurate measurements. For the large-amplitude density oscillations observed, we discuss and use the non-linear relationship between BES signal $δI/I_{0}$ and plasma density variation $δn_{e}/n_{e0}$. 2-D BES images cover an 8 cm $\times$ 20 cm region around the sawtooth inversion layer and show large-amplitude density oscillations, with additional significant spatial variations across the inversion layer, which grow and peak near the time of the temperature crash. The edge light removal technique and method of converting large-amplitude $δI/I_{0}$ to $δn_{e}/n_{e0}$ presented here may help analyze other impulsive MHD phenomena in tokamaks.

Submitted 15 April, 2022; originally announced April 2022.

arXiv:2204.07341 [pdf, other]

LaMemo: Language Modeling with Look-Ahead Memory

Authors: Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang

Abstract: Although Transformers with fully connected self-attentions are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment that encodes contexts in a uni-directional way. As a result, this prevents the memory from dynamically interacting with the current context that provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens, and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.

Submitted 26 April, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: Accepted by NAACL 2022

arXiv:2204.00961 [pdf]

doi 10.1177/20552076241233247

Enhancing Digital Health Services: A Machine Learning Approach to Personalized Exercise Goal Setting

Authors: Ji Fang, Vincent CS Lee, Hao Ji, Haiyan Wang

Abstract: The utilization of digital health has increased recently, and these services provide extensive guidance to encourage users to exercise frequently by setting daily exercise goals to promote a healthy lifestyle. These comprehensive guides evolved from the consideration of various personalized behavioral factors. Nevertheless, existing approaches frequently neglect users' dynamic behavior and the changes in their health conditions. This study aims to fill this gap by developing a machine learning algorithm that dynamically updates auto-suggestion exercise goals using retrospective data and realistic behavior trajectory. We conducted a methodological study by designing a deep reinforcement learning algorithm to evaluate exercise performance, considering fitness-fatigue effects. The deep reinforcement learning algorithm combines deep learning techniques to analyse time series data and infer user exercise behavior. In addition, we use the asynchronous advantage actor-critic algorithm for reinforcement learning to determine the optimal exercise intensity through exploration and exploitation. The personalized exercise data and biometric data used in this study were collected from publicly available datasets, encompassing walking, sports logs, and running. In our study, we conducted statistical analyses and inferential tests to compare the effectiveness of the machine learning approach across different exercise goal-setting strategies.

Submitted 4 March, 2024; v1 submitted 2 April, 2022; originally announced April 2022.

arXiv:2203.12819 [pdf, ps, other]

doi 10.3847/1538-4365/ac5f4c

Statistical analysis of circular-ribbon flares

Authors: Yanjie Zhang, Qingmin Zhang, Dechao Song, Shuting Li, Jun Dai, Zhe Xu, Haisheng Ji

Abstract: Circular-ribbon flares (CFs) are a special type of solar flare owing to their particular magnetic topology. In this paper, we conducted a comprehensive statistical analysis of 134 CFs from 2011 September to 2017 June, including four B-class, 82 C-class, 40 M-class, and eight X-class flares. The flares were observed by the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory (SDO) spacecraft. The physical properties of CFs are derived, including the location, area ($A_{CF}$), equivalent radius ($r_{CF}$) assuming a semi-spherical fan dome, lifetime ($τ_{CF}$), and peak SXR flux in 1$-$8 Å. It is found that all CFs are located in active regions, with latitudes between $-$30$^\circ$ and 30$^\circ$. The distributions of areas and lifetimes could be fitted with a log-normal function. There is a positive correlation between the lifetime and area. The peak SXR flux in 1$-$8 Å is well in accord with a power-law distribution with an index of $-$1.42. For the 134 CFs, 57% of them are accompanied by remote brightenings or ribbons. A positive correlation exists between the total length ($L_{RB}$) and average distance ($D_{RB}$) of remote brightenings. About 47% and 51% of the 134 CFs are related to type III radio bursts and jets, respectively. The association rates are independent of flare energies. About 38% of CFs are related to mini-filament eruptions, and the association rates increase with flare classes. Only 28% of CFs are related to CMEs, meaning that a majority of them are confined rather than eruptive events. There is a positive correlation between the CME speed and peak SXR flux in 1$-$8 Å, and faster CMEs tend to be wider.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 17 pages, 22 figures, accepted for publication in The Astrophysical Journal Supplement Series, comments are welcome

arXiv:2203.06879 [pdf, ps, other]

doi 10.1007/s11433-022-2017-6

Planckian Dissipation and non-Ginzburg-Landau Type Upper Critical Field in Bi2201

Authors: Qihao Zang, Zhengyan Zhu, Zuyu Xu, Shichao Qi, Haoran Ji, Yiwen Li, Jian Wang, Huiqian Luo, Hua-Bing Wang, Hai-Hu Wen

Abstract: Resistivity and Hall effect measurements have been carried out on a micro-fabricated bridge of Bi2201 single crystal at low temperatures down to 0.4 K under high magnetic fields. When superconductivity is crashed by a high magnetic field, the recovered "normal state" resistivity still shows a linear temperature dependence in low temperature region. Combining with the effective mass and the charge carrier density, we get a linear scattering rate $1/τ= αk_{B} T/\hbar$ with $0.77<α<1.16$, which gives a strong evidence of the Planckian dissipation. Furthermore, our results reveal a new type of temperature dependence of upper critical field, $H_{c2}(T)=H^*\sqrt{(1-t)/(t+0.154)}$, which is totally different from the expectation of the Ginzburg-Landau theory, and suggests uncondensed Cooper pairs above $H_{c2}(T)$ line.

Submitted 22 February, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 8 pages, 4 figures

Journal ref: Sci. China-Phys. Mech. Astron. 66, 237412 (2023)

arXiv:2203.05967 [pdf, other]

A Weibo Dataset for the 2022 Russo-Ukrainian Crisis

Authors: Yi R. Fung, Heng Ji

Abstract: Online social networks such as Twitter and Weibo play an important role in how people stay informed and exchange reactions. Each crisis encompasses a new opportunity to study the portability of models for various tasks (e.g., information extraction, complex event understanding, misinformation detection, etc.), due to differences in domain, entities, and event types. We present the Russia-Ukraine Crisis Weibo (RUW) dataset, with over 3.5M user posts and comments in the first release. Our data is available at https://github.com/yrf1/RussiaUkraine_weibo_dataset.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Russia-Ukraine Crisis, Weibo Dataset

Showing 201–250 of 572 results for author: Ji, H