Low-latency Federated Learning with DNN Partition in Distributed Industrial IoT Networks

Authors: Xiumei Deng, Jun Li, Chuan Ma, Kang Wei, Long Shi, Ming Ding, Wen Chen

Abstract: Federated Learning (FL) empowers Industrial Internet of Things (IIoT) with distributed intelligence of industrial automation thanks to its capability of distributed machine learning without any raw data exchange. However, it is rather challenging for lightweight IIoT devices to perform computation-intensive local model training over large-scale deep neural networks (DNNs). Driven by this issue, we develop a communication-computation efficient FL framework for resource-limited IIoT networks that integrates the DNN partition technique into the standard FL mechanism, wherein IIoT devices perform local model training over the bottom layers of the objective DNN, and offload the top layers to the edge gateway side. Considering imbalanced data distribution, we derive the device-specific participation rate to involve the devices with better data distribution in more communication rounds. Upon deriving the device-specific participation rate, we propose to minimize the training delay under the constraints of device-specific participation rate, energy consumption and memory usage. To this end, we formulate a joint optimization problem of device scheduling and resource allocation (i.e., DNN partition point, channel assignment, transmit power, and computation frequency), and solve the long-term min-max mixed integer non-linear programming problem based on the Lyapunov technique. In particular, the proposed dynamic device scheduling and resource allocation (DDSRA) algorithm can achieve a trade-off to balance the training delay minimization and FL performance. We also provide the FL convergence bound for the DDSRA algorithm in both convex and non-convex settings. Experimental results demonstrate the feasibility of the derived device-specific participation rate, and show that the DDSRA algorithm outperforms baselines in terms of test accuracy and convergence time.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.11740 [pdf, other]

Discrimination of Chiral Molecules through Holonomic Quantum Coherent Control

Authors: Teng Liu, Fa Zhao, Pengfei Lu, Qifeng Lao, Min Ding, Ji Bian, Feng Zhu, Le Luo

Abstract: A novel optical method for distinguishing chiral molecules is proposed and validated within a quantum simulator employing a trapped-ion qudit. This approach correlates the sign disparity of the dipole moment of chiral molecules with distinct cyclic evolution trajectories, yielding the unity population contrast induced by the different non-Abelian holonomies corresponding to the chirality. Harnessing the principles of holonomic quantum computation (HQC), our method achieves highly efficient, non-adiabatic, and robust detection and separation of chiral molecules. Demonstrated in a trapped ion quantum simulator, this scheme achieves nearly 100% contrast between the two enantiomers in the population of a specific state, showcasing its resilience to the noise inherent in the driving field.

Submitted 8 March, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.11049 [pdf, other]

How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers

Authors: Guangsheng Zhang, Bo Liu, Huan Tian, Tianqing Zhu, Ming Ding, Wanlei Zhou

Abstract: As a booming research area in the past decade, deep learning technologies have been driven by big data collected and processed on an unprecedented scale. However, privacy concerns arise due to the potential leakage of sensitive information from the training data. Recent research has revealed that deep learning models are vulnerable to various privacy attacks, including membership inference attacks, attribute inference attacks, and gradient inversion attacks. Notably, the efficacy of these attacks varies from model to model. In this paper, we answer a fundamental question: Does model architecture affect model privacy? By investigating representative model architectures from convolutional neural networks (CNNs) to Transformers, we demonstrate that Transformers generally exhibit higher vulnerability to privacy attacks than CNNs. Additionally, we identify the micro design of activation layers, stem layers, and LN layers, as major factors contributing to the resilience of CNNs against privacy attacks, while the presence of attention modules is another main factor that exacerbates the privacy vulnerability of Transformers. Our discovery reveals valuable insights for deep learning models to defend against privacy attacks and inspires the research community to develop privacy-friendly model architectures.

Submitted 2 February, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: To appear in USENIX Security 2024

arXiv:2210.04461 [pdf, other]

doi 10.1051/0004-6361/202244615

Evaluation of different recipes for chromospheric radiative losses in solar flares

Authors: J. Tian, J. Hong, Y. Li, M. D. Ding

Abstract: Context. Radiative losses are an indispensable part of the numerical simulation of flares. Detailed calculations could be computationally expensive, especially in the chromosphere. There have been some approximate recipes for chromospheric radiative losses in flares, yet their feasibility in flare simulations needs further evaluation. Aims. We aim to evaluate the performance of different recipes for chromospheric radiative losses in flare simulations. Methods. We compare the atmospheric structure and line profiles in beam-heated flares calculated with detailed radiative losses and the approximate recipes. Results. Both GF90 and HCD22 recipes provide acceptable total radiative losses compared with the detailed one, but there are discrepancies in the different atmospheric layers during the different evolutionary phases, which lead to misestimations of temperature and line intensity. The recipe of GF90 greatly overestimates the cooling in the upper chromosphere when the temperature exceeds 10^5 K, which also affects the flare evolution and line asymmetries. Radiative heating in the middle chromosphere only functions in the initial stage and could be safely neglected. However, radiative heating from Lyman continuum could dominate near the transition region.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 16 pages, 11 figures, 3 tables. A&A accepted

Journal ref: A&A 668, A96 (2022)

arXiv:2210.02414 [pdf, other]

GLM-130B: An Open Bilingual Pre-trained Model

Authors: Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang

Abstract: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks while the performance advantage is not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B -- the largest Chinese language model -- across related benchmarks. Finally, we leverage a unique scaling property of GLM-130B to reach INT4 quantization without post training, with almost no performance loss, making it the first among 100B-scale models and more importantly, allowing its effective inference on 4$\times$RTX 3090 (24G) or 8$\times$RTX 2080 Ti (11G) GPUs, the most affordable GPUs required for using 100B-scale models. The GLM-130B model weights are publicly accessible and its code, training logs, related toolkit, and lessons learned are open-sourced at https://github.com/THUDM/GLM-130B/.

Submitted 25 October, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted to ICLR 2023

arXiv:2210.01063

On Stability and Generalization of Bilevel Optimization Problem

Authors: Meng Ding, Mingxi Lei, Yunwen Lei, Di Wang, Jinhui Xu

Abstract: (Stochastic) bilevel optimization is a frequently encountered problem in machine learning with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most of the existing studies on this problem only focused on analyzing the convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behaviors. In this paper, we conduct a thorough analysis on the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms and give a high probability generalization bound which improves the previous best one from $O(\sqrt{n})$ to $O(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case where both inner and outer level parameters are subject to continuous update, while existing work allows only the outer level parameter to be updated. Our analysis can be applied in various standard settings such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate how iterations can affect the generalization error by experiments on meta-learning and hyper-parameter optimization.

Submitted 15 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: This paper currently contains unresolved technical flaws that have the potential to mislead readers. However, we are committed to addressing these issues and improving the quality of the paper in the future

arXiv:2209.14637 [pdf, other]

Tensor-Based Sketching Method for the Low-Rank Approximation of Data Streams

Authors: Cuiyu Liu, Chuanfu Xiao, Mingshuo Ding, Chao Yang

Abstract: Low-rank approximation in data streams is a fundamental and significant task in computing science, machine learning and statistics. Multiple streaming algorithms have emerged over the years and most of them are inspired by randomized algorithms, more specifically, sketching methods. However, many algorithms are not able to leverage information of data streams and consequently suffer from low accuracy. Existing data-driven methods improve accuracy but the training cost is expensive in practice. In this paper, from a subspace perspective, we propose a tensor-based sketching method for low-rank approximation of data streams. The proposed algorithm fully exploits the structure of data streams and obtains quasi-optimal sketching matrices by performing tensor decomposition on training data. A series of experiments are carried out and show that the proposed tensor-based method can be more accurate and much faster than the previous work.

Submitted 29 September, 2022; originally announced September 2022.

Comments: 12 pages, 2 figures

arXiv:2209.14495 [pdf, other]

doi 10.1051/0004-6361/202244427

Statistical analysis of the Si I 6560.58 Å line observed by CHASE

Authors: Jie Hong, Ye Qiu, Qi Hao, Zhi Xu, Chuan Li, Mingde Ding, Cheng Fang

Abstract: The Si I 6560.58 Å line in the H$α$ blue wing is blended with a telluric absorption line from water vapor in ground-based observations. Recent observations with the space-based telescope CHASE provide a new window to study this line. We aim to study the Si I line statistically and to explore possible diagnostics. We select three scans in the CHASE observations, and measure the equivalent width (EW) and the full width at half maximum (FWHM) for each pixel on the solar disk. We then calculate the theoretical EW and FWHM from the VALC model. An active region is also studied in particular for differences between the quiet Sun and the sunspots. The Si I line is formed at the bottom of the photosphere. The EW of this line increases from the disk center to $μ$ = 0.2, and then decreases toward the solar limb, while the FWHM shows a monotonically increasing trend. Theoretically predicted EW agrees well with observations, while the predicted FWHM is far smaller due to the absence of unresolved turbulence in models. The macroturbulent velocity is estimated to be 2.80 km s$^{-1}$ at the disk center, and increases to 3.52 km s$^{-1}$ at $μ$ = 0.2. We do not find any response to flare heating in current observations. Doppler shifts and line widths of the Si I 6560.58 Å and Fe I 6569.21 Å lines can be used to study the mass flows and turbulence of the different photospheric layers. The Si I line has good potential to diagnose the dynamics and energy transport in the photosphere.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 7 pages, 10 figures, 2 tables. Accepted for publication in A&A. CHASE data are available at https://ssdc.nju.edu.cn

Journal ref: A&A 668, A9 (2022)

arXiv:2209.12068 [pdf, other]

doi 10.1109/LRA.2023.3293308

NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

Authors: Jiankai Sun, Yan Xu, Mingyu Ding, Hongwei Yi, Chen Wang, Jingdong Wang, Liangjun Zhang, Mac Schwager

Abstract: Neural Radiance Fields (NeRFs) have become a widely-applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real-time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and details of target objects. The encoded features are then fused together with attention layers to alleviate ambiguities for accurate object localization. We have compared our method with conventional RGB(-D) based methods that take rendered RGB images and depths from NeRFs as inputs. Our method is better than the baselines.

Submitted 15 July, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 8, August 2023)

arXiv:2209.11388 [pdf, other]

LGDN: Language-Guided Denoising Network for Video-Language Modeling

Authors: Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

Abstract: Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at video level. However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e.g., scenery shot, transition or teaser). Although a number of recent works deploy attention mechanisms to alleviate this problem, the irrelevant/noisy information still makes it very difficult to address. To overcome this challenge, we propose an efficient and effective model, termed Language-Guided Denoising Network (LGDN), for video-language modeling. Different from most existing methods that utilize all extracted video frames, LGDN dynamically filters out the misaligned or redundant frames under the language supervision and obtains only 2--4 salient frames per video for cross-modal token-level alignment. Extensive experiments on five public datasets show that our LGDN outperforms state-of-the-art methods by large margins. We also provide a detailed ablation study to reveal the critical importance of solving the noise issue, in hope of inspiring future video-language work.

Submitted 5 December, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted by NeurIPS 2022

arXiv:2209.10382 [pdf, other]

Robust Information Bottleneck for Task-Oriented Communication with Digital Modulation

Authors: Songjie Xie, Shuai Ma, Ming Ding, Yuanming Shi, Mingjian Tang, Youlong Wu

Abstract: Task-oriented communications, mostly using learning-based joint source-channel coding (JSCC), aim to design a communication-efficient edge inference system by transmitting task-relevant information to the receiver. However, only transmitting task-relevant information without introducing any redundancy may cause robustness issues in learning due to the channel variations, and the JSCC which directly maps the source data into continuous channel input symbols poses compatibility issues on existing digital communication systems. In this paper, we address these two issues by first investigating the inherent tradeoff between the informativeness of the encoded representations and the robustness to information distortion in the received representations, and then proposing a task-oriented communication scheme with digital modulation, named discrete task-oriented JSCC (DT-JSCC), where the transmitter encodes the features into a discrete representation and transmits it to the receiver with the digital modulation scheme. In the DT-JSCC scheme, we develop a robust encoding framework, named robust information bottleneck (RIB), to improve the communication robustness to the channel variations, and derive a tractable variational upper bound of the RIB objective function using the variational approximation to overcome the computational intractability of mutual information. The experimental results demonstrate that the proposed DT-JSCC achieves better inference performance than the baseline methods with low communication latency, and exhibits robustness to channel variations due to the applied RIB framework.

Submitted 9 May, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

arXiv:2209.09958 [pdf, other]

Systematization of Knowledge: Synthetic Assets, Derivatives, and On-Chain Portfolio Management

Authors: Abrar Rahman, Victor Shi, Matthew Ding, Elliot Choi

Abstract: Synthetic assets are decentralized finance (DeFi) analogues of derivatives in the traditional finance (TradFi) world - financial arrangements which derive value from and are directly pegged to fluctuations in the value of an underlying asset (e.g., futures and options). Synthetic assets occupy a unique niche, serving to facilitate currency exchange, giving traders a means to speculate on the value of crypto assets without directly holding them, and powering more complex financial tools such as yield optimizers and portfolio management suites. Unfortunately, the academic literature on this topic is highly disparate and struggles to keep up with rapid changes in the space. We present the first Systematization of Knowledge (SoK) in this area, focusing on presenting the key mechanisms, protocols, and issues in an accessible fashion to highlight risks for participants as well as areas of research interest. This paper takes a broad perspective in establishing a general framework for synthetic assets, from the ideological origins of crypto to legal barriers for firms in this space, encapsulating the basic mechanisms underpinning derivatives markets as well as presenting data-driven analyses of major protocols.

Submitted 20 September, 2022; originally announced September 2022.

Comments: 45 pages, 4 figures

arXiv:2209.06508 [pdf, other]

doi 10.1051/0004-6361/202244275

Overexpansion-dominated Coronal Mass Ejection Formation and Induced Radio Bursts

Authors: B. T. Wang, X. Cheng, H. Q. Song, M. D. Ding

Abstract: Aims. Coronal Mass Ejections (CMEs) are the most fascinating explosions in the solar system; however, their formation is still not fully understood. Methods. Here, we investigate a well-observed CME on 2021 May 07 that showed a typical three-component structure and was continuously observed from 0 to 3 Rsun by a combination of SDO/AIA (0--1.3 Rsun), PROBA2/SWAP (0--1.7 Rsun) and MLSO/K-Cor (1.05--3 Rsun). Furthermore, we compare the morphological discrepancy between the CME white-light bright core and EUV blob. Finally, we explore the origin of various radio bursts closely related to the interaction of the CME overexpansion with a nearby streamer. Results. An interesting finding is that the height increases of both the CME leading front and bright core are dominated by the overexpansion during the CME formation. The aspect ratios of the CME bubble and bright core, quantifying the overexpansion, are found to decrease as the SO/STIX 4--10 keV and GOES 1--8 Å soft X-ray flux of the associated flare increases near the peaks, indicating an important role of the flare reconnection in the first overexpansion. The CME bubble even undergoes a second overexpansion, although relatively weak, which is closely related to the compression with a nearby streamer and likely arises from an ideal MHD process. Moreover, the CME EUV blob is found to be relatively lower and wider than the CME white-light bright core, and may correspond to the bottom part of the growing CME flux rope. The interaction between the CME and the streamer leads to two type II radio bursts, one normally drifting and one stationary, which are speculated to be induced at two different sources of the CME-driven shock front. The bidirectional electrons evidenced by a series of "C-shaped" type III bursts suggest that interchange reconnection is also involved during the interaction of the CME and streamer.

Submitted 14 September, 2022; originally announced September 2022.

Journal ref: A&A 666, A166 (2022)

arXiv:2208.08263 [pdf, other]

Multimodal foundation models are better simulators of the human brain

Authors: Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen

Abstract: Multimodal learning, especially large-scale multimodal pre-training, has developed rapidly over the past few years and led to the greatest advances in artificial intelligence (AI). Despite its effectiveness, understanding the underlying mechanism of multimodal pre-training models still remains a grand challenge. Revealing the explainability of such models is likely to enable breakthroughs of novel learning paradigms in the AI field. To this end, given the multimodal nature of the human brain, we propose to explore the explainability of multimodal learning models with the aid of non-invasive brain imaging technologies such as functional magnetic resonance imaging (fMRI). Concretely, we first present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs, which has shown strong multimodal understanding and generalization abilities in a variety of cognitive downstream tasks. Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones. Particularly, we identify a number of brain regions where multimodally-trained encoders demonstrate better neural encoding performance. This is consistent with the findings in existing studies on exploring brain multi-sensory integration. Therefore, we believe that multimodal foundation models are more suitable tools for neuroscientists to study the multimodal signal processing mechanisms in the human brain. Our findings also demonstrate the potential of multimodal foundation models as ideal computational simulators to promote both AI-for-brain and brain-for-AI research.

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.06182 [pdf, other]

doi 10.3847/1538-4357/ac897c

The Lyman-$α$ Emission in a C1.4 Solar Flare Observed by the Extreme Ultraviolet Imager aboard Solar Orbiter

Authors: Ying Li, Qiao Li, De-Chao Song, Andrea Francesco Battaglia, Hualin Xiao, Säm Krucker, Udo Schühle, Hui Li, Weiqun Gan, M. D. Ding

Abstract: The hydrogen Lyman-$α$ (H I Ly$α$) emission during solar flares has rarely been studied in spatially resolved images and its physical origin has not been fully understood. In this paper, we present novel Ly$α$ images for a C1.4 solar flare (SOL2021-08-20T22:00) from the Extreme Ultraviolet Imager aboard Solar Orbiter, together with multi-waveband and multi-perspective observations from the Solar Terrestrial Relations Observatory Ahead and the Solar Dynamics Observatory spacecraft. It is found that the Ly$α$ emission has a good temporal correlation with the thermal emissions at 1--8 Å and 5--7 keV, indicating that the flaring Ly$α$ is mainly produced by a thermal process in this small event. However, nonthermal electrons play a minor role in generating Ly$α$ at flare ribbons during the rise phase of the flare, as revealed by the hard X-ray imaging and spectral fitting. Besides originating from flare ribbons, the Ly$α$ emission can come from flare loops, likely caused by plasma heating and also cooling that happen in different flare phases. It is also found that the Ly$α$ emission shows fairly similar features with the He II 304 Å emission in light curve and spatio-temporal variation along with small differences. These observational results improve our understanding of the Ly$α$ emission in solar flares and also provide some insights for investigating the Ly$α$ emission in stellar flares.

Submitted 12 August, 2022; originally announced August 2022.

Comments: 19 pages, 7 figures, and 2 tables. ApJ accepted. Comments are welcome

arXiv:2207.05925 [pdf, ps, other]

doi 10.3847/1538-4357/ac7e46

Imaging and Spectroscopic Observations of the Dynamic Processes in Limb Solar Flares

Authors: Ke Yu, Y. Li, Jie Hong, De-Chao Song, M. D. Ding

Abstract: We investigate various dynamic processes including magnetic reconnection, chromospheric evaporation, and coronal rain draining in two limb solar flares through imaging and spectroscopic observations from the Interface Region Imaging Spectrograph (IRIS) and the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory. In the early phase of the flares, a bright and dense loop-top structure with a cusp-like shape can be seen in multi-wavelength images, which is co-spatial with the hard X-ray 25--50 keV emission. In particular, intermittent magnetic reconnection downflows are detected in the time-space maps of AIA 304 Å. The reconnection downflows are manifested as redshifts on one half of the loops and blueshifts on the other half in the IRIS Si IV 1393.76 Å line due to a projection effect. The Si IV profiles exhibit complex features (say, multi-peak) with a relatively larger width at the loop-top region. During the impulsive phase, chromospheric evaporation is observed in both AIA images and the IRIS Fe XXI 1354.08 Å line. Upward motions can be seen from AIA 131 Å images. The Fe XXI line is significantly enhanced and shows a good Gaussian shape. In the gradual phase, warm rains are observed as downward moving plasmas in AIA 304 Å images. Both the Si IV and Fe XXI lines show a relatively symmetric shape with a larger width around the loop top. These results provide observational evidence for various dynamic processes involved in and are crucial to understand the energy release process of solar flares.

Submitted 12 July, 2022; originally announced July 2022.

Comments: 24 pages, 13 figures, Accepted for publication in ApJ

arXiv:2206.14188 [pdf, other]

doi 10.3847/2041-8213/ac7c6f

Radiative Magnetohydrodynamic Simulation of the Confined Eruption of a Magnetic Flux Rope: Magnetic Structure and Plasma Thermodynamics

Authors: Can Wang, Feng Chen, Mingde Ding, Zekun Lu

Abstract: It is widely believed that magnetic flux ropes are the key structure of solar eruptions; however, their observable counterparts are not clear yet. We study a flare associated with flux rope eruption in a comprehensive radiative magnetohydrodynamic simulation of flare-productive active regions, especially focusing on the thermodynamic properties of the plasma involved in the eruption and their relation to the magnetic flux rope. The pre-existing flux rope, which carries cold and dense plasma, rises quasi-statically before the eruption onsets. During this stage, the flux rope does not show obvious signatures in extreme ultraviolet (EUV) emission. After the flare onset, a thin 'current shell' is generated around the erupting flux rope. Moreover, a current sheet is formed under the flux rope, where two groups of magnetic arcades reconnect and create a group of post-flare loops. The plasma within the 'current shell', current sheet, and post-flare loops are heated to more than 10 MK. The post-flare loops give rise to abundant soft X-ray emission. Meanwhile a majority of the plasma hosted in the flux rope is heated to around 1 MK, and the main body of the flux rope is manifested as a bright arch in cooler EUV passbands such as AIA 171 Å channel.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted for publication in ApJ Letters

arXiv:2206.12796 [pdf, other]

Transferring Fairness under Distribution Shifts via Fair Consistency Regularization

Authors: Bang An, Zora Che, Mucong Ding, Furong Huang

Abstract: The increasing reliance on ML models in high-stakes tasks has raised a major concern on fairness violations. Although there has been a surge of work that improves algorithmic fairness, most of them are under the assumption of an identical training and test distribution. In many real-world applications, however, such an assumption is often violated as previously trained fair models are often deployed in a different environment, and the fairness of such models has been observed to collapse. In this paper, we study how to transfer model fairness under distribution shifts, a widespread issue in practice. We conduct a fine-grained analysis of how the fair model is affected under different types of distribution shifts and find that domain shifts are more challenging than subpopulation shifts. Inspired by the success of self-training in transferring accuracy under domain shifts, we derive a sufficient condition for transferring group fairness. Guided by it, we propose a practical algorithm with a fair consistency regularization as the key component. A synthetic dataset benchmark, which covers all types of distribution shifts, is deployed for experimental verification of the theoretical findings. Experiments on synthetic and real datasets including image and tabular data demonstrate that our approach effectively transfers fairness and accuracy under various distribution shifts.

Submitted 14 January, 2023; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2206.09009 [pdf, other]

Intelligent Blockchain-based Edge Computing via Deep Reinforcement Learning: Solutions and Challenges

Authors: Dinh C. Nguyen, Van-Dinh Nguyen, Ming Ding, Symeon Chatzinotas, Pubudu N. Pathirana, Aruna Seneviratne, Octavia Dobre, Albert Y. Zomaya

Abstract: The convergence of mobile edge computing (MEC) and blockchain is transforming the current computing services in wireless Internet-of-Things networks, by enabling task offloading with security enhancement based on blockchain mining. Yet the existing approaches for these enabling technologies are isolated, providing only tailored solutions for specific services and scenarios. To fill this gap, we propose a novel cooperative task offloading and blockchain mining (TOBM) scheme for a blockchain-based MEC system, where each edge device not only handles computation tasks but also deals with block mining for improving system utility. To address the latency issues caused by the blockchain operation in MEC, we develop a new Proof-of-Reputation consensus mechanism based on a lightweight block verification strategy. To accommodate the highly dynamic environment and high-dimensional system state space, we apply a novel distributed deep reinforcement learning-based approach by using a multi-agent deep deterministic policy gradient algorithm. Experimental results demonstrate the superior performance of the proposed TOBM scheme in terms of enhanced system reward, improved offloading utility with lower blockchain mining latency, and better system utility, compared to the existing cooperative and non-cooperative schemes. The paper concludes with key technical challenges and possible directions for future blockchain-based MEC research.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted at IEEE Network Magazine, 8 pages. arXiv admin note: substantial text overlap with arXiv:2109.14263

arXiv:2206.08883 [pdf, other]

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Authors: Yao Mu, Shoufa Chen, Mingyu Ding, Jianyu Chen, Runjian Chen, Ping Luo

Abstract: Transformer has achieved great successes in learning vision and language representation, which is general across various downstream tasks. In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. Firstly, CtrlFormer jointly learns self-attention mechanisms between visual tokens and policy tokens among different control tasks, where multitask representation can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve high sample efficiency, which is important in control problems. For example, in the DMControl benchmark, unlike recent advanced methods that failed by producing a zero score in the "Cartpole" task after transfer learning with 100k samples, CtrlFormer can achieve a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released in our project homepage.

Submitted 17 June, 2022; originally announced June 2022.

Comments: ICML 2022

arXiv:2205.15868 [pdf, other]

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

Authors: Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang

Abstract: Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation. Its application to video generation is still facing many challenges: The potential huge computation cost makes the training from scratch unaffordable; The scarcity and weak relevance of text-video datasets hinder the model understanding complex movement semantics. In this work, we present 9B-parameter transformer CogVideo, trained by inheriting a pretrained text-to-image model, CogView2. We also propose multi-frame-rate hierarchical training strategy to better align text and video clips. As (probably) the first open-source large-scale pretrained text-to-video model, CogVideo outperforms all publicly available models at a large margin in machine and human evaluations.

Submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.14569 [pdf, ps, other]

doi 10.1364/JOSAB.465554

Magnon squeezing enhanced entanglement in a cavity magnomechanical system

Authors: Ming-Song Ding, Ying Shi, Yu-jie Liu, Li Zheng

Abstract: We investigate the generation of the entanglement in a cavity magnomechanical system, which consists of three modes: a magnon mode, a microwave cavity mode and a mechanical vibration mode. The couplings of the magnon-photon and the magnon-phonon are achieved by the magnetic dipole interaction and the magnetostrictive interaction, respectively. By introducing a squeezing of the magnon mode, the magnon-photon and the magnon-phonon entanglements are significantly enhanced compared with the case without inserting the magnon squeezing. We find that an optimal parameter of the squeezing exists, which yields the maximum entanglement. This study provides a new idea for exploring the properties of quantum entanglement in the cavity magnomechanical systems, and may have some potential applications in the quantum state engineering.

Submitted 3 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.14403 [pdf, other]

Rethinking the Setting of Semi-supervised Learning on Graphs

Authors: Ziang Li, Ming Ding, Weikai Li, Zihan Wang, Ziyu Zeng, Yukuo Cen, Jie Tang

Abstract: We argue that the present setting of semisupervised learning on graphs may result in unfair comparisons, due to its potential risk of over-tuning hyper-parameters for models. In this paper, we highlight the significant influence of tuning hyper-parameters, which leverages the label information in the validation set to improve the performance. To explore the limit of over-tuning hyperparameters, we propose ValidUtil, an approach to fully utilize the label information in the validation set through an extra group of hyper-parameters. With ValidUtil, even GCN can easily get high accuracy of 85.8% on Cora. To avoid over-tuning, we merge the training set and the validation set and construct an i.i.d. graph benchmark (IGB) consisting of 4 datasets. Each dataset contains 100 i.i.d. graphs sampled from a large graph to reduce the evaluation variance. Our experiments suggest that IGB is a more stable benchmark than previous datasets for semisupervised learning on graphs.

Submitted 28 May, 2022; originally announced May 2022.

Comments: To appear in IJCAI 2022

arXiv:2205.10361 [pdf, other]

doi 10.3847/2041-8213/ac715a

Current-sheet Oscillations Caused by Kelvin-Helmholtz Instability at the Loop Top of Solar Flares

Authors: Yulei Wang, Xin Cheng, Zining Ren, Mingde Ding

Abstract: Current sheets (CSs), long stretching structures of magnetic reconnection above solar flare loops, are usually observed to oscillate; their origins, however, are still puzzling at present. Based on a high-resolution 2.5-dimensional MHD simulation of magnetic reconnection, we explore the formation mechanism of the CS oscillations. We find that large-amplitude transverse waves are excited by the Kelvin-Helmholtz instability (KHI) at the highly turbulent cusp-shaped region. The perturbations propagate upward along the CS with a phase speed close to local Alfvén speed thus resulting in the CS oscillations we observe. Though the perturbations damp after propagating for a long distance, the CS oscillations are still detectable. In terms of detected CS oscillations, with a combination of differential emission measure technique, we propose a new method for measuring the magnetic field strength of the CSs and its distribution in height.

Submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted for publication in The Astrophysical Journal Letters

arXiv:2205.06075 [pdf, ps, other]

doi 10.1007/s11433-022-1900-5

Calibration procedures for the CHASE/HIS science data

Authors: Ye Qiu, ShiHao Rao, Chuan Li, Cheng Fang, MingDe Ding, Zhen Li, YiWei Ni, WenBo Wang, Jie Hong, Qi Hao, Yu Dai, PengFei Chen, XiaoSheng Wan, Zhi Xu, Wei You, Yuan Yuan, HongJiang Tao, XianSheng Li, YuKun He, Qiang Liu

Abstract: The Hα line is an important optical line in solar observations containing the information from the photosphere to the chromosphere. To study the mechanisms of solar eruptions and the plasma dynamics in the lower atmosphere, the Chinese Hα Solar Explorer (CHASE) was launched into a Sun-synchronous orbit on October 14, 2021. The scientific payload of the CHASE satellite is the Hα Imaging Spectrograph (HIS). The CHASE/HIS acquires, for the first time, seeing-free Hα spectroscopic observations with high spectral and temporal resolutions. It consists of two observational modes. The raster scanning mode provides full-Sun or region-of-interest spectra at Hα (6559.7-6565.9 Å) and Fe I (6567.8-6570.6 Å) wavebands. The continuum imaging mode obtains full-Sun photospheric images at around 6689 Å. In this paper, we present detailed calibration procedures for the CHASE/HIS science data, including the dark-field and flat-field correction, slit image curvature correction, wavelength and intensity calibration, and coordinate transformation. The higher-level data products can be directly used for scientific research.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 9 pages, 7 figures

Journal ref: Sci. China-Phys. Mech. Astron. 65, 289603 (2022)

arXiv:2205.05962 [pdf, ps, other]

doi 10.1007/s11433-022-1893-3

The Chinese Hα Solar Explorer (CHASE) mission: An overview

Authors: Chuan Li, Cheng Fang, Zhen Li, MingDe Ding, PengFei Chen, Ye Qiu, Wei You, Yuan Yuan, MinJie An, HongJiang Tao, XianSheng Li, Zhe Chen, Qiang Liu, Gui Mei, Liang Yang, Wei Zhang, WeiQiang Cheng, JianXin Chen, ChangYa Chen, Qiang Gu, QingLong Huang, MingXing Liu, ChengShan Han, HongWei Xin, ChangZheng Chen , et al. (10 additional authors not shown)

Abstract: The Chinese Hα Solar Explorer (CHASE), dubbed "Xihe" - Goddess of the Sun, was launched on October 14, 2021 as the first solar space mission of China National Space Administration (CNSA). The CHASE mission is designed to test a newly developed satellite platform and to acquire the spectroscopic observations in the Hα waveband. The Hα Imaging Spectrograph (HIS) is the scientific payload of the CHASE satellite. It consists of two observational modes: raster scanning mode and continuum imaging mode. The raster scanning mode obtains full-Sun or region-of-interest spectral images from 6559.7 to 6565.9 Å and from 6567.8 to 6570.6 Å with 0.024 Å pixel spectral resolution and 1 minute temporal resolution. The continuum imaging mode obtains photospheric images in continuum around 6689 Å with the full width at half maximum of 13.4 Å. The CHASE mission will advance our understanding of the dynamics of solar activity in the photosphere and chromosphere. In this paper, we present an overview of the CHASE mission including the scientific objectives, HIS instrument overview, data calibration flow, and first results of on-orbit observations.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 9 pages, 6 figures

Journal ref: Sci. China-Phys. Mech. Astron. 65, 289602 (2022)

arXiv:2205.05025 [pdf, other]

doi 10.1016/j.nima.2022.167697

Design and testing of LGAD sensor with shallow carbon implantation

Authors: Kewei Wu, Xuewei Jia, Tao Yang, Mengzhao Li, Wei Wang, Mei Zhao, Zhijun Liang, Joao Guimaraes da Costa, Yunyun Fan, Han Cui, Alissa Howard, Gregor Kramberger, Xin Shi, Yuekun Heng, Yuhang Tan, Bo Liu, Yuan Feng, Shuqi Li, Mengran Li, Chengjun Yu, Xuan Yang, Mingjie Zhai, Gaobo Xu, Gangping Yan, Qionghua Zhai , et al. (4 additional authors not shown)

Abstract: The low gain avalanche detectors (LGADs) are thin sensors with fast charge collection which in combination with internal gain deliver an outstanding time resolution of about 30 ps. High collision rates and consequent large particle rates crossing the detectors at the upgraded Large Hadron Collider (LHC) in 2028 will lead to radiation damage and deteriorated performance of the LGADs. The main consequence of radiation damage is loss of gain layer doping (acceptor removal) which requires an increase of bias voltage to compensate for the loss of charge collection efficiency and consequently time resolution. The Institute of High Energy Physics (IHEP), Chinese Academy of Sciences (CAS) has developed a process based on the Institute of Microelectronics (IME), CAS capability to enrich the gain layer with carbon to reduce the acceptor removal effect by radiation. After 1 MeV neutron equivalent fluence of 2.5$\times$10$^{15}$ n$_{eq}$/cm$^{2}$, which is the maximum fluence to which sensors will be exposed at ATLAS High Granularity Timing Detector (HGTD), the IHEP-IME second version (IHEP-IMEv2) 50 $μ$m LGAD sensors already deliver adequate charge collection > 4 fC and time resolution < 50 ps at voltages < 400 V. The operation voltages of these 50 $μ$m devices are well below those at which single event burnout may occur.

Submitted 31 May, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

arXiv:2205.04692 [pdf, other]

Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting

Authors: Mingyang Chen, Wen Zhang, Zhen Yao, Xiangnan Chen, Mengxiao Ding, Fei Huang, Huajun Chen

Abstract: We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve this problem, we introduce the meta-learning setting, where a set of tasks are sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them. Experimental results show that our proposed method can effectively embed unseen components and outperforms models that consider inductive settings for KGs and baselines that directly use conventional KG embedding methods.

Submitted 10 May, 2022; originally announced May 2022.

Comments: IJCAI 2022

arXiv:2205.04232 [pdf, other]

doi 10.1093/mnras/stac1305

Arecibo and FAST Timing Follow-up of twelve Millisecond Pulsars Discovered in Commensal Radio Astronomy FAST Survey

Authors: C. C. Miao, W. W. Zhu, D. Li, P. C. C. Freire, J. R. Niu, P. Wang, J. P. Yuan, M. Y. Xue, A. D. Cameron, D. J. Champion, M. Cruces, Y. T. Chen, M. M. Chi, X. F. Cheng, S. J. Dang, M. F. Ding, Y. Feng, Z. Y. Gan, G. Hobbs, M. Kramer, Z. J. Liu, Y. X. Li, Z. K. Luo, X. L. Miao, L. Q. Meng , et al. (24 additional authors not shown)

Abstract: We report the phase-connected timing ephemeris, polarization pulse profiles, Faraday rotation measurements, and Rotating-Vector-Model (RVM) fitting results of twelve millisecond pulsars (MSPs) discovered with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) in the Commensal radio Astronomy FAST survey (CRAFTS). The timing campaigns were carried out with FAST and Arecibo over three years. Eleven of the twelve pulsars are in neutron star - white dwarf binary systems, with orbital periods between 2.4 and 100 d. Ten of them have spin periods, companion masses, and orbital eccentricities that are consistent with the theoretical expectations for MSP - Helium white dwarf (He WD) systems. The last binary pulsar (PSR J1912$-$0952) has a significantly smaller spin frequency and a smaller companion mass; the latter could be caused by a low orbital inclination for the system. Its orbital period of 29 days is well within the range of orbital periods where some MSP - He WD systems have shown anomalous eccentricities; however, the eccentricity of PSR J1912$-$0952 is typical of what one finds for the remaining MSP - He WD systems.

Submitted 9 May, 2022; originally announced May 2022.

Comments: 11 pages, 5 figures, MNRAS accepted

arXiv:2205.04042 [pdf, other]

Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning

Authors: Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee

Abstract: Incremental few-shot object detection aims at detecting novel classes without forgetting knowledge of the base classes with only a few labeled training data from the novel classes. Most related prior works are on incremental object detection that rely on the availability of abundant training samples per novel class that substantially limits the scalability to real-world setting where novel data can be scarce. In this paper, we propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector. To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision from additional object proposals generated using Selective Search as pseudo labels. We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network in detecting novel classes without forgetting the base classes. Extensive experiments conducted on standard incremental object detection and incremental few-shot object detection settings show that our approach significantly outperforms state-of-the-art methods by a large margin.

Submitted 27 February, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: Accepted by AAAI2023

arXiv:2205.03798 [pdf, other]

Fast and Structured Block-Term Tensor Decomposition For Hyperspectral Unmixing

Authors: Meng Ding, Xiao Fu, Xi-Le Zhao

Abstract: The block-term tensor decomposition model with multilinear rank-$(L_r,L_r,1)$ terms (or, the "LL1 tensor decomposition" in short) offers a valuable alternative for hyperspectral unmixing (HU) under the linear mixture model. Particularly, the LL1 decomposition ensures the endmember/abundance identifiability in scenarios where such guarantees are not supported by the classic matrix factorization (MF) approaches. However, existing LL1-based HU algorithms use a three-factor parameterization of the tensor (i.e., the hyperspectral image cube), which leads to a number of challenges including high per-iteration complexity, slow convergence, and difficulties in incorporating structural prior information. This work puts forth an LL1 tensor decomposition-based HU algorithm that uses a constrained two-factor re-parameterization of the tensor data. As a consequence, a two-block alternating gradient projection (GP)-based LL1 algorithm is proposed for HU. With carefully designed projection solvers, the GP algorithm enjoys a relatively low per-iteration complexity. Like in MF-based HU, the factors under our parameterization correspond to the endmembers and abundances. Thus, the proposed framework is natural to incorporate physics-motivated priors that arise in HU. The proposed algorithm often attains orders-of-magnitude speedup and substantial HU performance gains compared to the existing three-factor parameterization-based HU algorithms.

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2205.03282 [pdf, other]

doi 10.1016/j.jqsrt.2022.108240

Comment on: "Hyperfine structure measurements of Co I and Co II with Fourier transform spectroscopy" by Fu et al. [JQSRT 2021, 107590]

Authors: Milan Ding, Juliet C. Pickering

Abstract: This comment points out errors in the analysis of 61 magnetic hyperfine structure ($A$) constants of Co II energy levels by Fu et al. [JQSRT 2021, 107590]. The paper was published without full awareness of the extensive literature already available for Co II hyperfine $A$ constants at the time; 57 of 58 $A$ constants that were claimed to have been measured for the first time had already been measured by the prior work of Ding \& Pickering [ApJS 2020, 251:24], who had published $A$ constants for 292 levels of Co II. The $A$ constant of 3d$^6$4s$^2$ a$^5$D$_4$ has been determined by Fu et al. [JQSRT 2021, 107590] for the first time to be $12.0\pm1.8$ mK (1 mK $=$ 0.001 cm$^{-1}$), which was found to agree with line profiles observed by Ding \& Pickering [ApJS 2020, 251:24]. Discrepancies in 17 $A$ constants of Fu et al. [JQSRT 2021, 107590] were found, which are likely due to the analysis of weak, experimentally unclassified transitions with Ritz wavenumbers 25453.966 cm$^{-1}$ and 25149.948 cm$^{-1}$ by Fu et al. [JQSRT 2021, 107590] for the $A$ constants of the energy levels 3d$^7$($^2$G)4s a$^3$G$_5$ and 3d$^7$($^2$P)4s c$^3$P$_2$ respectively. Fewer transitions and poorer quality spectra analysed by Fu et al. [JQSRT 2021, 107590] are also concluded to have contributed to disagreements in the 17 $A$ constants.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2205.01258 [pdf, other]

Universal Optimality and Robust Utility Bounds for Metric Differential Privacy

Authors: Natasha Fernandes, Annabelle McIver, Catuscia Palamidessi, Ming Ding

Abstract: We study the privacy-utility trade-off in the context of metric differential privacy. Ghosh et al. introduced the idea of universal optimality to characterise the best mechanism for a certain query that simultaneously satisfies (a fixed) $ε$-differential privacy constraint whilst at the same time providing better utility compared to any other $ε$-differentially private mechanism for the same query. They showed that the Geometric mechanism is "universally optimal" for the class of counting queries. On the other hand, Brenner and Nissim showed that outside the space of counting queries, and for the Bayes risk loss function, no such universally optimal mechanisms exist. In this paper we use metric differential privacy and quantitative information flow as the fundamental principle for studying universal optimality. Metric differential privacy is a generalisation of both standard (i.e., central) differential privacy and local differential privacy, and it is increasingly being used in various application domains, for instance in location privacy and in privacy preserving machine learning. Using this framework we are able to clarify Nissim and Brenner's negative results, showing (a) that in fact all privacy types contain optimal mechanisms relative to certain kinds of non-trivial loss functions, and (b) extending and generalising their negative results beyond Bayes risk specifically to a wide class of non-trivial loss functions. We also propose weaker universal benchmarks of utility called "privacy type capacities". We show that such capacities always exist and can be computed using a convex optimisation algorithm.

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2205.01089 [pdf, other]

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

Authors: Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Abstract: Objects' motions in nature are governed by complex interactions and their properties. While some properties, such as shape and material, can be identified via the object's visual appearances, others like mass and electric charge are not directly visible. The compositionality between the visible and hidden properties poses unique challenges for AI models to reason from the physical world, whereas humans can effortlessly infer them with limited observations. Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction. In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes few videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions posted on one of the videos. Evaluation results of several state-of-the-art video reasoning models on ComPhy show unsatisfactory performance as they fail to capture these hidden properties. We further propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution into a unified framework. CPL can effectively identify objects' physical properties from their interactions and predict their dynamics to answer questions.

Submitted 2 May, 2022; originally announced May 2022.

Comments: ICLR 2022. Project page: https://comphyreasoning.github.io/

arXiv:2204.14217 [pdf, other]

CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Authors: Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang

Abstract: The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel auto-regressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, Cross-modal general language model (CogLM), and finetune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.

Submitted 27 May, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.09200 [pdf, other]

doi 10.3847/1538-4365/ac6754

Stellar Atmospheric Parameters of M-type Stars from LAMOST DR8

Authors: Ming-Yi Ding, Jian-Rong Shi, Yue Wu, Hugh R. A. Jones, Hong-liang Yan, Chun-Qian Li, Qi Gao, Tian-Yi Chen, Jing-Hua Zhang, Shuai Liu, Tai-Sheng Yan, Xiao-Jin Xie

Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) Low Resolution Spectroscopic Survey (LRS) provides massive spectroscopic data of M-type stars, and the derived stellar parameters could bring vital help to various studies. We adopt the ULySS package to perform $χ^2$ minimization with model spectra generated from the MILES interpolator, and determine the stellar atmospheric parameters for the M-type stars from LAMOST LRS Data Release (DR) 8. Comparison with the stellar parameters from APOGEE Stellar Parameter and Chemical Abundance Pipeline (ASPCAP) suggests that most of our results have good consistency. For M dwarfs, we achieve dispersions better than 74 K, 0.19 dex and 0.16 dex for $T_{\rm eff}$, $\log{g}$ and [Fe/H], while for M giants, the internal uncertainties are 58 K, 0.32 dex and 0.26 dex, respectively. Compared to ASPCAP we also find a systematic underestimation of $Δ{T_{\rm eff}} =$ $-$176 K for M dwarfs, and a systematic overestimation of $Δ{\log{g}} =$ 0.30 dex for M giants. However, such differences are less significant when we make comparison with common stars from other literature, which indicates that systematic biases exist in the difference of ASPCAP and other measurements. A catalog of 763,136 spectra corresponding to 616,314 M-type stars with derived stellar parameters is presented. We determine the stellar parameters for stars with $T_{\rm eff}$ higher than 2,900 K, with $\log{g}$ from -0.24 dex to 5.9 dex. The typical precisions are 45 K, 0.25 dex and 0.22 dex, for $T_{\rm eff}$, $\log{g}$ and [Fe/H], respectively, which are estimated from the duplicate observations of the same stars.

Submitted 19 April, 2022; originally announced April 2022.

Comments: 21 pages, 20 figures, 3 tables, accepted for publication in The Astrophysical Journal Supplement Series

arXiv:2204.07762 [pdf, other]

doi 10.3847/2041-8213/ac67aa

Three-dimensional Magnetic and Thermodynamic Structures of Solar Microflares

Authors: Z. F. Li, X. Cheng, F. Chen, J. Chen, M. D. Ding

Abstract: Microflares, one of small-scale solar activities, are believed to be caused by magnetic reconnection. Nevertheless, their three-dimensional (3D) magnetic structures, thermodynamic structures, and physical links to the reconnection have been unclear. In this Letter, based on high-resolution 3D radiative magnetohydrodynamic simulation of the quiet Sun spanning from the upper convection zone to the corona, we investigate 3D magnetic and thermodynamic structures of three homologous microflares. It is found that they originate from localized hot plasma embedded in the chromospheric environment at the height of 2--10 Mm above the photosphere and last for 3--10 minutes with released magnetic energy in the range of $10^{27}-10^{28}$ erg. The heated plasma is almost co-spatial with the regions where the heating rate per particle is maximal. The 3D velocity field reveals a pair of converging flows with velocities of tens of km s$^{-1}$ toward and outflows with velocities of about 100 km s$^{-1}$ moving away from the hot plasma. These features support that magnetic reconnection plays a critical role in heating the localized chromospheric plasma to coronal temperature, giving rise to observed microflares. The magnetic topology analysis further discloses that the reconnection region is located near quasi-separators where both current density and squashing factors are maximal although the specific topology may vary from tether-cutting to fan-spine-like structure.

Submitted 16 April, 2022; originally announced April 2022.

arXiv:2204.04363 [pdf, other]

doi 10.1109/TIP.2022.3166673

Attention guided global enhancement and local refinement network for semantic segmentation

Authors: Jiangyun Li, Sen Zha, Chen Chen, Meng Ding, Tianxiang Zhang, Hong Yu

Abstract: The encoder-decoder architecture is widely used as a lightweight semantic segmentation network. However, it struggles with a limited performance compared to a well-designed Dilated-FCN model for two major problems. First, commonly used upsampling methods in the decoder such as interpolation and deconvolution suffer from a local receptive field, unable to encode global contexts. Second, low-level features may bring noises to the network decoder through skip connections for the inadequacy of semantic concepts in early encoder layers. To tackle these challenges, a Global Enhancement Method is proposed to aggregate global information from high-level feature maps and adaptively distribute them to different decoder layers, alleviating the shortage of global contexts in the upsampling process. Besides, a Local Refinement Module is developed by utilizing the decoder features as the semantic guidance to refine the noisy encoder features before the fusion of these two (the decoder features and the encoder features). Then, the two methods are integrated into a Context Fusion Block, and based on that, a novel Attention guided Global enhancement and Local refinement Network (AGLN) is elaborately designed. Extensive experiments on PASCAL Context, ADE20K, and PASCAL VOC 2012 datasets have demonstrated the effectiveness of the proposed approach. In particular, with a vanilla ResNet-101 backbone, AGLN achieves the state-of-the-art result (56.23% mean IoU) on the PASCAL Context dataset. The code is available at https://github.com/zhasen1996/AGLN.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 12 pages, 6 figures

ACM Class: I.4.6

arXiv:2204.03645 [pdf, other]

DaViT: Dual Attention Vision Transformers

Authors: Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

Abstract: In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. We propose approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both "spatial tokens" and "channel tokens". With spatial tokens, the spatial dimension defines the token scope, and the channel dimension defines the token feature dimension. With channel tokens, we have the inverse: the channel dimension defines the token scope, and the spatial dimension defines the token feature dimension. We further group tokens along the sequence direction for both spatial and channel tokens to maintain the linear complexity of the entire model. We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention. Extensive experiments show our DaViT achieves state-of-the-art performance on four different tasks with efficient computations. Without extra data, DaViT-Tiny, DaViT-Small, and DaViT-Base achieve 82.8%, 84.2%, and 84.6% top-1 accuracy on ImageNet-1K with 28.3M, 49.7M, and 87.9M parameters, respectively. When we further scale up DaViT with 1.5B weakly supervised image and text pairs, DaViT-Giant reaches 90.4% top-1 accuracy on ImageNet-1K. Code is available at https://github.com/dingmyu/davit.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2203.15383 [pdf, other]

doi 10.1088/1361-6560/ac628a

Category Guided Attention Network for Brain Tumor Segmentation in MRI

Authors: Jiangyun Li, Hong Yu, Chen Chen, Meng Ding, Sen Zha

Abstract: Objective: Magnetic resonance imaging (MRI) has been widely used for the analysis and diagnosis of brain diseases. Accurate and automatic brain tumor segmentation is of paramount importance for radiation treatment. However, low tissue contrast in tumor regions makes it a challenging task. Approach: We propose a novel segmentation network named Category Guided Attention U-Net (CGA U-Net). In this model, we design a Supervised Attention Module (SAM) based on the attention mechanism, which can capture more accurate and stable long-range dependency in feature maps without introducing much computational cost. Moreover, we propose an intra-class update approach to reconstruct feature maps by aggregating pixels of the same category. Main results: Experimental results on the BraTS 2019 datasets show that the proposed method outperforms the state-of-the-art algorithms in both segmentation performance and computational complexity. Significance: The CGA U-Net can effectively capture the global semantic information in the MRI image by using the SAM module, while significantly reducing the computational cost. Code is available at https://github.com/delugewalker/CGA-U-Net.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.14101

A Roadmap for Big Model

Authors: Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, et al. (75 additional authors not shown)

Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

Submitted 20 April, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

Comments: This report has been withdrawn by the authors due to critical issues in Section 2.3.1 of Article 2

arXiv:2203.14098 [pdf, other]

doi 10.1109/TPAMI.2022.3163806

Uncertainty-aware Contrastive Distillation for Incremental Semantic Segmentation

Authors: Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci

Abstract: A fundamental and challenging problem in deep learning is catastrophic forgetting, i.e. the tendency of neural networks to fail to preserve the knowledge acquired from old tasks when learning new tasks. This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years. While earlier works in computer vision have mostly focused on image classification and object detection, more recently some IL approaches for semantic segmentation have been introduced. These previous works showed that, despite its simplicity, knowledge distillation can be effectively employed to alleviate catastrophic forgetting. In this paper, we follow this research direction and, inspired by recent literature on contrastive learning, we propose a novel distillation framework, Uncertainty-aware Contrastive Distillation (UCD). In a nutshell, UCD is operated by introducing a novel distillation loss that takes into account all the images in a mini-batch, enforcing similarity between features associated to all the pixels from the same classes, and pulling apart those corresponding to pixels from different classes. In order to mitigate catastrophic forgetting, we contrast features of the new model with features extracted by a frozen model learned at the previous incremental step. Our experimental results demonstrate the advantage of the proposed distillation technique, which can be used in synergy with previous IL approaches, and leads to state-of-the-art performance on three commonly adopted benchmarks for incremental semantic segmentation. The code is available at https://github.com/ygjwd12345/UCD.

Submitted 20 May, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

Comments: TPAMI

arXiv:2203.11042 [pdf, other]

doi 10.3847/1538-4357/ac5fac

Quantifying the Magnetic Structure of a Coronal Shock Producing a Type II Radio Burst

Authors: W. Su, T. M. Li, X. Cheng, L. Feng, P. J. Zhang, P. F. Chen, M. D. Ding, L. J. Chen, Y. Guo, Y. Wang, D. Li, L. Y. Zhang

Abstract: Type II radio bursts are thought to be produced by shock waves in the solar atmosphere. However, what magnetic conditions are needed for the generation of type II radio bursts is still a puzzling issue. Here, we quantify the magnetic structure of a coronal shock associated with a type II radio burst. Based on the multi-perspective extreme-ultraviolet observations, we reconstruct the three-dimensional (3D) shock surface. By using a magnetic field extrapolation model, we then derive the orientation of the magnetic field relative to the normal of the shock front ($θ_{\rm Bn}$) and Alfvén Mach number ($M_A$) on the shock front. Combining the radio observations from Nancay Radio Heliograph, we obtain the source region of the type II radio burst on the shock front. It is found that the radio burst is generated by a shock with $M_A \gtrsim 1.5$ and a bimodal distribution of $θ_{\rm Bn}$. We also use the Rankine-Hugoniot relations to quantify the properties of the shock downstream. Our results provide a quantitative 3D magnetic structure condition of a coronal shock that produces a type II radio burst.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 18 Pages, 10 figures, accepted for publication in ApJ

arXiv:2203.10673 [pdf]

5G-Enabled Pseudonymity for Cooperative Intelligent Transportation System

Authors: Nardine Basta, Ming Ding, Muhammad Ikram, Mohamed Ali Kaafar

Abstract: Cooperative Intelligent Transportation Systems (C-ITS) enable communications between vehicles, road-side infrastructures, and road-users to improve users' safety and to efficiently manage traffic. Most, if not all, of the intelligent vehicles-to-everything (V2X) applications often rely on continuous collection and sharing of sensitive information, such as detailed location information, which raises privacy concerns. In this light, a common approach to concealing the long-term identity of C-ITS vehicles is using multiple temporary identifiers, called pseudonyms. However, the legacy pseudonyms management approach is prone to linking attacks. The introduction of 5G network to V2X offers enhanced location accuracy, better clock synchronisation, improved modular service-based architecture, and enhanced security and privacy preservation controls. Motivated by the above enhancements, we study 5G-enabled pseudonyms for protecting vehicle identity privacy in C-ITS. We highlight the gaps in the current standards of pseudonyms management. We further provide recommendations regarding the pseudonyms management life-cycle.

Submitted 20 March, 2022; originally announced March 2022.

arXiv:2203.09110 [pdf, other]

doi 10.1051/0004-6361/202243115

Grow-up of a Filament Channel by Intermittent Small-scale Magnetic Reconnection

Authors: H. T. Li, X. Cheng, J. H. Guo, X. L. Yan, L. F. Wang, Z. Zhong, C. Li, M. D. Ding

Abstract: Filament channel (FC), a plasma volume where the magnetic field is primarily aligned with the polarity inversion line, is believed to be the pre-eruptive configuration of coronal mass ejections. Nevertheless, evidence for how the FC is formed is still elusive. In this paper, we present a detailed study on the build-up of a FC to understand its formation mechanism. The New Vacuum Solar Telescope of Yunnan Observatories and Optical and Near-Infrared Solar Eruption Tracer of Nanjing University, as well as the AIA and HMI on board Solar Dynamics Observatory are used to study the grow-up process of the FC. Furthermore, we reconstruct the non-linear force-free field (NLFFF) of the active region using the regularized Biot-Savart laws (RBSL) and magnetofrictional method to reveal three-dimensional (3D) magnetic field properties of the FC. We find that partial filament materials are quickly transferred to longer magnetic field lines formed by small-scale magnetic reconnection, as evidenced by dot-like Hα/EUV brightenings and subsequent bidirectional outflow jets, as well as untwisting motions. The Hα/EUV bursts appear repeatedly at the same location and are closely associated with flux cancellation, which occurs between two small-scale opposite polarities and is driven by shearing and converging motions. The 3D NLFFF model reveals that the reconnection takes place in a hyperbolic flux tube that is located above the flux cancellation site and below the FC. The FC is gradually built up toward a twisted flux rope via a series of small-scale reconnection events that occur intermittently prior to the eruption.

Submitted 17 March, 2022; originally announced March 2022.

Comments: 10 pages, 7 figures, 1 table. A&A accepted

Journal ref: A&A 663, A127 (2022)

arXiv:2203.07630 [pdf, other]

doi 10.1051/0004-6361/202142839

An approximate recipe of chromospheric radiative losses for solar flares

Authors: Jie Hong, Mats Carlsson, M. D. Ding

Abstract: Radiative losses in the chromosphere are very important in the energy balance. There have been efforts to make simple lookup tables for chromospheric radiative losses in the quiet Sun. During solar flares, the atmospheric conditions are quite different, and the currently available recipe of Gan & Fang (1990) is constructed from semi-empirical models. It remains to be evaluated how these recipes work in flare conditions. We aim to construct an approximate recipe of chromospheric radiative losses for solar flares. We follow the method of Carlsson & Leenaarts (2012) to tabulate the optically thin radiative loss, escape probability, and ionization fraction, while using a grid of flare models from radiative hydrodynamic simulations as our dataset. We provide new lookup tables to calculate chromospheric radiative losses for flares. Compared with previous recipes, our recipe provides a better approximation to the detailed radiative losses for flares.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 7 pages, 6 figures, 2 tables. A&A accepted. The lookup tables of the fitted curves are available in http://sdc.nju.edu.cn/d/8f8ef25c91684926ae3c

Journal ref: A&A 661, A77 (2022)

arXiv:2203.06928 [pdf, ps, other]

Generalized quantum cluster algebras: the Laurent phenomenon and upper bounds

Authors: Liqian Bai, Xueqing Chen, Ming Ding, Fan Xu

Abstract: Generalized quantum cluster algebras introduced in [1] are quantum deformation of generalized cluster algebras of geometric types. In this paper, we prove that the Laurent phenomenon holds in these generalized quantum cluster algebras. We also show that upper bounds coincide with the corresponding generalized quantum upper cluster algebras under the "coprimality" condition.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 22 pages. Comments are welcome

arXiv:2203.02761 [pdf, other]

A randomized singular value decomposition for third-order oriented tensors

Authors: Minghui Ding, Yimin Wei, Pengpeng Xie

Abstract: The oriented singular value decomposition (O-SVD) proposed by Zeng and Ng provides a hybrid approach to the t-product based third-order tensor singular value decomposition with the transform matrix being a factor matrix of the higher order singular value decomposition. Continuing along this vein, this paper explores realizing the O-SVD more efficiently by the tensor-train rank-1 decomposition and gives a truncated O-SVD. Motivated by the success of probabilistic algorithms, we develop a randomized version of the O-SVD and present its detailed error analysis. The new algorithm has advantages in efficiency while keeping good accuracy compared with the current tensor decompositions. Our claims are supported by numerical experiments on several oriented tensors from real applications.

Submitted 26 February, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

arXiv:2202.09027 [pdf, other]

Trusted AI in Multi-agent Systems: An Overview of Privacy and Security for Distributed Learning

Authors: Chuan Ma, Jun Li, Kang Wei, Bo Liu, Ming Ding, Long Yuan, Zhu Han, H. Vincent Poor

Abstract: Motivated by the advancing computational capacity of distributed end-user equipments (UEs), as well as the increasing concerns about sharing private data, there has been considerable recent interest in machine learning (ML) and artificial intelligence (AI) that can be processed on distributed UEs. Specifically, in this paradigm, parts of an ML process are outsourced to multiple distributed UEs, and then the processed ML information is aggregated on a certain level at a central server, which turns a centralized ML process into a distributed one, and brings about significant benefits. However, this new distributed ML paradigm raises new risks of privacy and security issues. In this paper, we provide a survey of the emerging security and privacy risks of distributed ML from a unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e.: i) the level of preprocessed data, ii) the level of learning models, iii) the level of extracted knowledge and, iv) the level of intermediate results. We explore and analyze the potential of threats for each information exchange level based on an overview of the current state-of-the-art attack mechanisms, and then discuss the possible defense methods against such threats. Finally, we complete the survey by providing an outlook on the challenges and possible directions for future research in this critical area.

Submitted 9 August, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:1907.09470, arXiv:2003.02133, arXiv:1606.05053, arXiv:1812.06415 by other authors

arXiv:2202.05011 [pdf, other]

Hardness Results for Laplacians of Simplicial Complexes via Sparse-Linear Equation Complete Gadgets

Authors: Ming Ding, Rasmus Kyng, Maximilian Probst Gutenberg, Peng Zhang

Abstract: We study linear equations in combinatorial Laplacians of $k$-dimensional simplicial complexes ($k$-complexes), a natural generalization of graph Laplacians. Combinatorial Laplacians play a crucial role in homology and are a central tool in topology. Beyond this, they have various applications in data analysis and physical modeling problems. It is known that nearly-linear time solvers exist for graph Laplacians. However, nearly-linear time solvers for combinatorial Laplacians are only known for restricted classes of complexes. This paper shows that linear equations in combinatorial Laplacians of 2-complexes are as hard to solve as general linear equations. More precisely, for any constant $c \geq 1$, if we can solve linear equations in combinatorial Laplacians of 2-complexes up to high accuracy in time $\tilde{O}((\# \text{ of nonzero coefficients})^c)$, then we can solve general linear equations with polynomially bounded integer coefficients and condition numbers up to high accuracy in time $\tilde{O}((\# \text{ of nonzero coefficients})^c)$. We prove this by a nearly-linear time reduction from general linear equations to combinatorial Laplacians of 2-complexes. Our reduction preserves the sparsity of the problem instances up to poly-logarithmic factors.

Submitted 10 February, 2022; originally announced February 2022.

Showing 201–250 of 733 results for author: Ding, M