Search | arXiv e-print repository

Jump-Start Reinforcement Learning

Authors: Ikechukwu Uchendu, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman

Abstract: Reinforcement learning (RL) provides a theoretical framework for continuously improving an agent's behavior via trial and error. However, efficiently learning policies from scratch can be very difficult, particularly for tasks with exploration challenges. In such settings, it might be desirable to initialize RL with an existing policy, offline data, or demonstrations. However, naively performing such initialization in RL often works poorly, especially for value-based methods. In this paper, we present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy, and is compatible with any RL approach. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide-policy, and an exploration-policy. By using the guide-policy to form a curriculum of starting states for the exploration-policy, we are able to efficiently improve performance on a set of simulated robotic tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms, particularly in the small-data regime. In addition, we provide an upper bound on the sample complexity of JSRL and show that with the help of a guide-policy, one can improve the sample complexity for non-optimism exploration methods from exponential in horizon to polynomial.

Submitted 7 July, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: 20 pages, 10 figures
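
The two-policy idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy chain environment, the fixed schedule of guide steps, and every name below (ChainEnv, jsrl_rollout, guide, make_random_policy) are assumptions made for illustration. The guide-policy acts for the first few steps of an episode, the exploration-policy takes over for the rest, and shrinking the number of guide steps yields a curriculum of starting states for the exploration-policy.

```python
import random


class ChainEnv:
    """Hypothetical toy task: walk right from state 0 until state `length`."""

    def __init__(self, length=10):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves one step right; any other action stays in place.
        if action == 1:
            self.state += 1
        done = self.state >= self.length
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def jsrl_rollout(env, guide_policy, exploration_policy, guide_steps, horizon):
    """Run one episode: guide-policy first, exploration-policy afterwards."""
    state = env.reset()
    trajectory = []
    for t in range(horizon):
        use_guide = t < guide_steps
        policy = guide_policy if use_guide else exploration_policy
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, "guide" if use_guide else "explore"))
        state = next_state
        if done:
            break
    return trajectory


def guide(state):
    """Stand-in for a pre-existing policy: always move right."""
    return 1


def make_random_policy(seed):
    """Stand-in for an untrained exploration-policy: uniform random actions."""
    rng = random.Random(seed)
    return lambda state: rng.choice([0, 1])


# Assumed curriculum: hand the exploration-policy progressively earlier states.
explore = make_random_policy(0)
for guide_steps in (9, 6, 3, 0):
    episode = jsrl_rollout(ChainEnv(length=10), guide, explore, guide_steps, horizon=20)
    handover_state = min(guide_steps, 10)
    print(f"guide_steps={guide_steps}: exploration starts at state {handover_state}, "
          f"episode length {len(episode)}")
```

In a full system the exploration-policy would be updated by an RL algorithm between rollouts, and the number of guide steps would be reduced based on its measured performance rather than on a fixed schedule.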

arXiv:2204.01691 [pdf, other]

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, et al. (20 additional authors not shown)

Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.
We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.

Submitted 16 August, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: See website at https://say-can.github.io/. V1: Initial upload. V2: Added PaLM results. Added study about new capabilities (drawer manipulation, chain of thought prompting, multilingual instructions). Added an ablation study of language model size. Added an open-source version of SayCan on a simulated tabletop environment. Improved readability.
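
A minimal sketch of how a language-model preference and a skill's value function might be combined when choosing the next skill. The abstract does not spell out the combination rule, so the product used here is an assumption, and the skill names, scores, and the function select_skill are made up for illustration.

```python
def select_skill(llm_scores, value_scores):
    """Return the skill with the highest (language score * value) product.

    llm_scores:   how useful the language model thinks each skill is for the
                  instruction (hypothetical numbers in [0, 1]).
    value_scores: how likely each skill is to succeed from the current state,
                  as estimated by that skill's value function (hypothetical).
    """
    combined = {
        skill: llm_scores[skill] * value_scores.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Instruction: "clean up the spill". The language model ranks the skills the
# same way in both scenes; only the grounding from the value functions changes.
llm_scores = {"find a sponge": 0.5, "pick up the sponge": 0.4, "go to the table": 0.1}

# Scene 1: no sponge in view, so picking one up is unlikely to succeed.
no_sponge = {"find a sponge": 0.9, "pick up the sponge": 0.1, "go to the table": 0.8}
# Scene 2: a sponge is right in front of the robot.
sponge_visible = {"find a sponge": 0.2, "pick up the sponge": 0.9, "go to the table": 0.8}

print(select_skill(llm_scores, no_sponge)[0])
print(select_skill(llm_scores, sponge_visible)[0])
```

The point of the sketch is that the same language-model ranking leads to different choices once the current physical state is taken into account.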

arXiv:2203.15442 [pdf, other]

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

Authors: Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin

Abstract: Visual grounding focuses on establishing fine-grained alignment between vision and natural language, which has essential applications in multimodal reasoning systems. Existing methods use pre-trained query-agnostic visual backbones to extract visual feature maps independently without considering the query information. We argue that the visual features extracted from the visual backbones and the features really needed for multimodal reasoning are inconsistent. One reason is that there are differences between pre-training tasks and visual grounding. Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework. In this paper, we propose a Query-modulated Refinement Network (QRNet) to address the inconsistency issue by adjusting intermediate features in the visual backbone with a novel Query-aware Dynamic Attention (QD-ATT) mechanism and query-aware multiscale fusion. The QD-ATT can dynamically compute query-dependent visual attention at the spatial and channel levels of the feature maps produced by the visual backbone. We apply the QRNet to an end-to-end visual grounding framework. Extensive experiments show that the proposed method outperforms state-of-the-art methods on five widely used datasets.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.14600 [pdf]

Observation of quadruple Weyl point in hybrid-Weyl phononic crystals

Authors: Li Luo, Weiyin Deng, Yating Yang, Mou Yan, Jiuyang Lu, Xueqin Huang, Zhengyou Liu

Abstract: The discovery of Weyl semimetals opened the door to the search for topological semimetals in physical science. The Weyl points are generally recognized as conventional, quadratic, spin-1, and those of high topological charges. Here we report the observation of the quadruple Weyl point of charge 4, the highest topological charge a twofold degenerate node can carry. Besides the quadruple Weyl point, the phononic semimetal also hosts conventional, quadratic, and spin-1 Weyl points, making it the system with the richest variety of Weyl points to date. The quadruple-helicoid surface states, specific to the quadruple Weyl point, are demonstrated. The finding of the high-charge Weyl point enriches the knowledge of Weyl semimetals and may stimulate related research in other systems, such as photonic, mechanical and cold atom systems.

Submitted 28 March, 2022; originally announced March 2022.

Comments: 15 pages, 3 figures, and 1 table

arXiv:2202.11343 [pdf, other]

Alleviating Datapath Conflicts and Design Centralization in Graph Analytics Acceleration

Authors: Haiyang Lin, Mingyu Yan, Duo Wang, Mo Zou, Fengbin Tu, Xiaochun Ye, Dongrui Fan, Yuan Xie

Abstract: Previous graph analytics accelerators have achieved great improvement on throughput by alleviating irregular off-chip memory accesses. However, on-chip side datapath conflicts and design centralization have become the critical issues hindering further throughput improvement. In this paper, a general solution, Multiple-stage Decentralized Propagation network (MDP-network), is proposed to address these issues, inspired by the key idea of trading latency for throughput. Besides, a novel High throughput Graph analytics accelerator, HiGraph, is proposed by deploying MDP-network to address each issue in practice. The experiment shows that compared with state-of-the-art accelerator, HiGraph achieves up to 2.2x speedup (1.5x on average) as well as better scalability.

Submitted 23 February, 2022; originally announced February 2022.

Comments: To Appear in 59th Design Automation Conference (DAC 2022)

arXiv:2202.04822 [pdf, other]

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective

Authors: Xin Liu, Mingyu Yan, Lei Deng, Guoqi Li, Xiaochun Ye, Dongrui Fan, Shirui Pan, Yuan Xie

Abstract: Graph neural networks (GNNs) have been a hot spot of recent research and are widely utilized in diverse applications. However, with the use of larger data and deeper models, there is an urgent demand to accelerate GNNs for more efficient execution. In this paper, we provide a comprehensive survey on acceleration methods for GNNs from an algorithmic perspective. We first present a new taxonomy to classify existing acceleration methods into five categories. Based on the classification, we systematically discuss these methods and highlight their correlations. Next, we provide comparisons from aspects of the efficiency and characteristics of these methods. Finally, we suggest some promising prospects for future research.

Submitted 24 April, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted by International Joint Conference on Artificial Intelligence (IJCAI-22)

arXiv:2202.01382 [pdf, other]

doi 10.1017/jfm.2022.848

Asymptotic behaviour of rotating convection-driven dynamos in the plane layer geometry

Authors: Ming Yan, Michael A. Calkins

Abstract: Dynamos driven by rotating convection in the plane layer geometry are investigated numerically for a range of Ekman number ($E$), magnetic Prandtl number ($Pm$) and Rayleigh number ($Ra$). The primary purpose of the investigation is to compare results of the simulations with previously developed asymptotic theory that is applicable in the limit of rapid rotation. We find that all of the simulations are in the quasi-geostrophic regime in which the Coriolis and pressure gradient forces are approximately balanced at leading order, whereas all other forces, including the Lorentz force, act as perturbations. Agreement between simulation output and asymptotic scalings for the energetics, flow speeds, magnetic field amplitude and length scales is found. The transition from large scale dynamos to small scale dynamos is well described by the magnetic Reynolds number based on the small convective length scale, $\widetilde{Rm}$, with large scale dynamos preferred when $\widetilde{Rm} \lesssim O(1)$. The magnitude of the large scale magnetic field is observed to saturate and become approximately constant with increasing Rayleigh number. Energy spectra show that all length scales present in the flow field and the small-scale magnetic field are consistent with a scaling of $E^{1/3}$, even in the turbulent regime.
For a fixed value of $E$, we find that the viscous dissipation length scale is approximately constant over a broad range of $Ra$; the ohmic dissipation length scale is approximately constant within the large scale dynamo regime, but transitions to a $\widetilde{Rm}^{-1/2}$ scaling in the small scale dynamo regime.

Submitted 2 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: 38 pages, 13 figures

arXiv:2201.12667 [pdf, other]

Distributed SLIDE: Enabling Training Large Neural Networks on Low Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity

Authors: Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava

Abstract: More than 70% of cloud computing is paid for but sits idle. A large fraction of this idle compute consists of cheap CPUs with few cores that are not utilized during the less busy hours. This paper aims to enable those CPU cycles to train heavyweight AI models. Our goal runs counter to mainstream frameworks, which focus on leveraging expensive specialized ultra-high bandwidth interconnect to address the communication bottleneck in distributed neural network training. This paper presents a distributed model-parallel training framework that enables training large neural networks on small CPU clusters with low Internet bandwidth. We build upon the adaptive sparse training framework introduced by the SLIDE algorithm. By carefully deploying sparsity over distributed nodes, we demonstrate several orders of magnitude faster model parallel training than Horovod, the main engine behind most commercial software. We show that with reduced communication, due to sparsity, we can train close to a billion parameter model on simple 4-16 core CPU nodes connected by basic low bandwidth interconnect. Moreover, the training time is on par with some of the best hardware accelerators.

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.11313 [pdf, other]

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Authors: Chen Wu, Ming Yan

Abstract: Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Unlike typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language, to better describe intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods for neural code search first represent the code snippet and query text as separate embeddings, and then use vector distance (e.g. dot-product or cosine) to calculate the semantic similarity between them. There exist many different ways of aggregating the variable-length sequence of code or query tokens into a learnable embedding, including bi-encoder, cross-encoder, and poly-encoder. The goal of the query encoder and code encoder is to produce embeddings that are close to each other for a related pair of query and the corresponding desired code snippet, so the choice and design of the encoder is very significant. In this paper, we propose a novel deep semantic model which makes use of not only the multi-modal sources, but also feature extractors such as self-attention, the aggregated vectors, and a combination of the intermediate representations. We apply the proposed model to tackle the CodeSearchNet challenge about semantic code search. We align cross-lingual embedding for multi-modality learning with large batches and hard example mining, and combine different learned representations to enhance the representation learning.
Our model is trained on the CodeSearchNet corpus and evaluated on the held-out data; the final model achieves 0.384 NDCG and won first place in this benchmark. Models and code are available at https://github.com/overwindows/SemanticCodeSearch.git.

Submitted 26 January, 2022; originally announced January 2022.
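
The retrieval step this abstract describes as typical (separate embeddings for query and code, then a vector similarity such as cosine) can be sketched as follows. This illustrates only that generic step, not the proposed model; the embedding vectors and snippet names are made-up placeholders, and a real system would obtain them from trained encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs):
    """Sort snippet names by descending cosine similarity to the query embedding."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in snippet_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional embeddings, chosen only to make the ranking obvious.
query_embedding = [1.0, 0.0]
snippet_embeddings = {
    "read_file": [2.0, 0.0],    # same direction as the query
    "sort_list": [0.0, 3.0],    # orthogonal to the query
    "parse_json": [1.0, 1.0],   # 45 degrees from the query
}

for name, score in rank_snippets(query_embedding, snippet_embeddings):
    print(f"{name}: {score:.3f}")
```

Because cosine similarity ignores vector length, "read_file" scores highest even though its embedding is twice as long as the query's; a dot-product score would additionally reward larger norms.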

arXiv:2201.00139 [pdf, other]

On the improved conditions for some primal-dual algorithms

Authors: Yao Li, Ming Yan

Abstract: The convex minimization of $f(\mathbf{x})+g(\mathbf{x})+h(\mathbf{A}\mathbf{x})$ over $\mathbb{R}^n$ with differentiable $f$ and linear operator $\mathbf{A}: \mathbb{R}^n\rightarrow \mathbb{R}^m$, has been well-studied in the literature. By considering the primal-dual optimality of the problem, many algorithms are proposed from different perspectives such as monotone operator scheme and fixed point theory. In this paper, we start with a base algorithm to reveal the connection between several algorithms such as AFBA, PD3O and Chambolle-Pock. Then, we prove its convergence under a relaxed assumption associated with the linear operator and characterize the general constraint on primal and dual stepsizes. The result improves the upper bound of stepsizes of AFBA and indicates that Chambolle-Pock, as the special case of the base algorithm when $f=0$, can take the stepsize of the dual iteration up to $4/3$ of the previously proven one.

Submitted 1 January, 2022; originally announced January 2022.
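
For orientation, the special case $f=0$ mentioned in this abstract can be written out. The iteration below is the standard textbook form of Chambolle-Pock, with $\tau$ and $\sigma$ denoting the primal and dual stepsizes (symbols assumed here, since the abstract does not name them). The second condition is only a reading of the abstract's "$4/3$" statement, not a quotation of the paper's theorem, and whether the inequalities are strict depends on the analysis.

```latex
% Chambolle-Pock for  min_x  g(x) + h(Ax)   (the case f = 0)
\[
\begin{aligned}
\mathbf{x}^{k+1} &= \operatorname{prox}_{\tau g}\!\left(\mathbf{x}^{k} - \tau \mathbf{A}^{\top}\mathbf{y}^{k}\right),\\
\mathbf{y}^{k+1} &= \operatorname{prox}_{\sigma h^{*}}\!\left(\mathbf{y}^{k} + \sigma \mathbf{A}\left(2\mathbf{x}^{k+1} - \mathbf{x}^{k}\right)\right).
\end{aligned}
\]
% Classical sufficient stepsize condition:
\[
\tau\sigma\,\|\mathbf{A}\|^{2} < 1 .
\]
% Relaxed condition, as read from the abstract (dual stepsize up to 4/3 of the above for fixed tau):
\[
\tau\sigma\,\|\mathbf{A}\|^{2} < \tfrac{4}{3} .
\]
```

Here $h^{*}$ is the convex conjugate of $h$ and $\|\mathbf{A}\|$ is the operator norm of $\mathbf{A}$.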

arXiv:2112.13732 [pdf, other]

doi 10.1021/acs.jpclett.1c00394

Correlations in the Electronic Structure of van der Waals NiPS$_3$ Crystals: An X-Ray Absorption and Resonant Photoelectron Spectroscopy Study

Authors: Mouhui Yan, Yichen Jin, Zhicheng Wu, Arshak Tsaturyan, Anna Makarova, Dmitry Smirnov, Elena Voloshina, Yuriy Dedkov

Abstract: The electronic structure of high-quality van der Waals NiPS$_3$ crystals was studied using near-edge x-ray absorption spectroscopy (NEXAFS) and resonant photoelectron spectroscopy (ResPES) in combination with a density functional theory (DFT) approach. The experimental spectroscopic methods, being element specific, make it possible to discriminate between atomic contributions in the valence and conduction band density of states and give a direct comparison with the results of DFT calculations. Analysis of the NEXAFS and ResPES data makes it possible to identify the NiPS$_3$ material as a charge-transfer insulator. The obtained spectroscopic and theoretical data are very important for the consideration of possible correlated-electron phenomena in such transition-metal layered materials, where the interplay between different degrees of freedom for electrons defines their electronic properties, helping to understand their optical and transport properties and to propose further possible applications in electronics, spintronics and catalysis.

Submitted 22 December, 2021; originally announced December 2021.

Journal ref: J. Phys. Chem. Lett. 12, 2400 (2021)

arXiv:2112.12529 [pdf, other]

Mott-Hubbard Insulating State for the Layered van der Waals FePX$_3$ (X:S, Se) As Revealed by NEXAFS and Resonant Photoelectron Spectroscopy

Authors: Yichen Jin, Mouhui Yan, Tomislav Kremer, Elena Voloshina, Yuriy Dedkov

Abstract: A broad family of the low-dimensional systems studied nowadays, including 2D materials, demonstrates many fascinating properties, which however depend on the atomic composition as well as on the system dimensionality. Therefore, the study of the electronic correlation effects in the new 2D materials is of paramount importance for the understanding of their transport, optical and catalytic properties. Here, by means of electron spectroscopy methods in combination with density functional theory calculations, we investigate the electronic structure of the new layered van der Waals FePX$_3$ (X: S, Se) materials. Using systematic resonant photoelectron spectroscopy studies we observed strong resonant behavior for the peaks associated with the $3d^{n-1}$ final state at low binding energies for these materials. Such observations clearly assign FePX$_3$ to the class of Mott-Hubbard type insulators for which the top of the valence band is formed by the hybrid Fe-S/Se electronic states. These observations are important for the deep understanding of this new class of materials and open perspectives for their further use in different application areas, like (opto)spintronics and catalysis.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted for publication in Sci. Rep. (22.12.2021)

arXiv:2112.12527 [pdf, other]

doi 10.1021/acs.jpclett.1c02790

Topological Quasi-2D Semimetal Co$_3$Sn$_2$S$_2$: Insights To Electronic Structure From NEXAFS and Resonant Photoelectron Spectroscopy

Authors: Mouhui Yan, Yichen Jin, Xiaofei Hou, Yanfeng Guo, Arshak Tsaturyan, Anna Makarova, Dmitry Smirnov, Yuriy Dedkov, Elena Voloshina

Abstract: The electronic structure of the natural topological semimetal Co$_3$Sn$_2$S$_2$ crystals was studied using near-edge x-ray absorption spectroscopy (NEXAFS) and resonant photoelectron spectroscopy (ResPES). Although a significant increase of the Co\,$3d$ valence band emission is observed at the Co\,$2p$ absorption edge in the ResPES experiments, the spectral weight at these photon energies is dominated by the normal Auger contribution. This observation indicates the delocalised character of photoexcited Co\,$3d$ electrons and is supported by the first-principles calculations. Our results on the element- and orbital-specific electronic states near the Fermi level of Co$_3$Sn$_2$S$_2$ are of importance for the comprehensive description of the electronic structure of this material, which is significant for its future applications in different areas of science and technology, including catalysis and water splitting.

Submitted 22 December, 2021; originally announced December 2021.

Journal ref: J. Phys. Chem. Lett. 12, 9807 (2021)

arXiv:2112.11821 [pdf, other]

doi 10.1016/j.cplett.2020.137627

To the synthesis and characterization of layered metal phosphorus triselenides proposed for electrochemical sensing and energy applications

Authors: Yuriy Dedkov, Mouhui Yan, Elena Voloshina

Abstract: Recent studies reported on the synthesis and characterization of several bulk crystals of layered metal triselenophosphites MPSe$_3$ (M = transition metals). In these works characterization was performed via a combination of different bulk- and surface-sensitive experimental methods accompanied by DFT calculations. However, the critical examination of the available experimental and theoretical data demonstrates that these results do not support the conclusions on the electrochemical sensing and energy applications of studied triselenophosphites. These conclusions are made without any relation to the age of discussed data and possible recent progress in experimental and theoretical approaches.

Submitted 22 December, 2021; originally announced December 2021.

Journal ref: Chem. Phys. Lett. 754, 137627 (2020)

arXiv:2112.09343 [pdf, other]

doi 10.1109/CVPR52688.2022.00708

Domain Adaptation on Point Clouds via Geometry-Aware Implicits

Authors: Yuefan Shen, Yanchao Yang, Mi Yan, He Wang, Youyi Zheng, Leonidas Guibas

Abstract: As a popular geometric representation, point clouds have attracted much attention in 3D vision, leading to many applications in autonomous driving and robotics. One important yet unsolved issue for learning on point clouds is that point clouds of the same object can have significant geometric variations if generated using different procedures or captured using different sensors. These inconsistencies induce domain gaps such that neural networks trained on one domain may fail to generalize to others. A typical technique to reduce the domain gap is to perform adversarial training so that point clouds in the feature space can align. However, adversarial training can easily fall into degenerate local minima, resulting in negative adaptation gains. Here we propose a simple yet effective method for unsupervised domain adaptation on point clouds by employing a self-supervised task of learning geometry-aware implicits, which plays two critical roles in one shot. First, the geometric information in the point clouds is preserved through the implicit representations for downstream tasks. More importantly, the domain-specific variations can be effectively learned away in the implicit space. We also propose an adaptive strategy to compute unsigned distance fields for arbitrary point clouds due to the lack of shape models in practice. When combined with a task loss, the proposed method outperforms state-of-the-art unsupervised domain adaptation methods that rely on adversarial domain alignment and more complicated self-supervised tasks.
Our method is evaluated on both PointDA-10 and GraspNet datasets. The code and trained models will be publicly available.

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2111.08896 [pdf, other]

Achieving Human Parity on Visual Question Answering

Authors: Ming Yan, Haiyang Xu, Chenliang Li, Junfeng Tian, Bin Bi, Wei Wang, Weihua Chen, Xianzhe Xu, Fan Wang, Zheng Cao, Zhicheng Zhang, Qiyu Zhang, Ji Zhang, Songfang Huang, Fei Huang, Luo Si, Rong Jin

Abstract: The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image. It has been a popular research topic with an increasing number of real-world applications in the last decade. This paper describes our recent research on AliceMind-MMU (ALIbaba's Collection of Encoder-decoders from Machine IntelligeNce lab of Damo academy - MultiMedia Understanding), which obtains similar or even slightly better results than humans do on VQA. This is achieved by systematically improving the VQA pipeline, including: (1) pre-training with comprehensive visual and textual feature representation; (2) effective cross-modal interaction with learning to attend; and (3) a novel knowledge mining framework with specialized expert modules for the complex VQA task. Treating different types of visual questions with the corresponding expertise plays an important role in boosting the performance of our VQA architecture up to the human level. An extensive set of experiments and analyses is conducted to demonstrate the effectiveness of the new research work.

Submitted 19 November, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

arXiv:2111.07549 [pdf, other]

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

Authors: Zhu Li, Yuqing Zhang, Mengxi Nie, Ming Yan, Mengnan He, Ruixiong Zhang, Caixia Gong

Abstract: Recent advancements in end-to-end speech synthesis have made it possible to generate highly natural speech. However, training these models typically requires a large amount of high-fidelity speech data, and for unseen texts, the prosody of synthesized speech is relatively unnatural. To address these issues, we propose to combine a fine-tuned BERT-based front-end with a pre-trained FastSpeech2-based acoustic model to improve prosody modeling. The pre-trained BERT is fine-tuned on the polyphone disambiguation task, the joint Chinese word segmentation (CWS) and part-of-speech (POS) tagging task, and the prosody structure prediction (PSP) task in a multi-task learning framework. FastSpeech 2 is pre-trained on large-scale external data that are noisy but easier to obtain. Experimental results show that both the fine-tuned BERT model and the pre-trained FastSpeech 2 can improve prosody, especially for those structurally complex sentences.

Submitted 15 November, 2021; originally announced November 2021.

arXiv:2111.05424 [pdf, other]

AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Authors: Yao Lu, Karol Hausman, Yevgen Chebotar, Mengyuan Yan, Eric Jang, Alexander Herzog, Ted Xiao, Alex Irpan, Mohi Khansari, Dmitry Kalashnikov, Sergey Levine

Abstract: Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation. Please see https://awopt.github.io for more details.

Submitted 11 November, 2021; v1 submitted 9 November, 2021; originally announced November 2021.

arXiv:2110.14721 [pdf, other]

Quasi-static magnetoconvection with a tilted magnetic field

Authors: Justin A. Nicoski, Ming Yan, Michael A. Calkins

Abstract: A numerical study of convection with stress-free boundary conditions in the presence of an imposed magnetic field that is tilted with respect to the direction of gravity is carried out in the limit of small magnetic Reynolds number. The dynamics are investigated over a range of Rayleigh number $Ra$ and Chandrasekhar numbers up to $Q = 2\times10^6$, with the tilt angle between the gravity vector and imposed magnetic field vector fixed at $45^{\circ}$. For a fixed value of $Q$ and increasing $Ra$, the convection dynamics can be broadly characterized by three primary flow regimes: (1) quasi-two-dimensional convection rolls near the onset of convection; (2) isolated convection columns aligned with the imposed magnetic field; and (3) unconstrained convection reminiscent of non-magnetic convection. The influence of varying $Q$ and $Ra$ on the various fields is analyzed. Heat and momentum transport, as characterized by the Nusselt and Reynolds numbers, are quantified and compared with the vertical field case. Ohmic dissipation dominates over viscous dissipation in all cases investigated. Various mean fields are investigated and their scaling behavior is analyzed. Provided $Ra$ is sufficiently large, all investigated values of $Q$ exhibit an inverse kinetic energy cascade that yields strong `zonal' flows. Relaxation oscillations, as characterized by a quasi-periodic shift in the predominance of either the zonal or non-zonal component of the mean flow, appear for sufficiently large $Ra$ and $Q$.

Submitted 2 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 30 pages, 16 figures

arXiv:2110.12478 [pdf, other]

Deep Asymmetric Hashing with Dual Semantic Regression and Class Structure Quantization

Authors: Jianglin Lu, Hailing Wang, Jie Zhou, Mengfan Yan, Jiajun Wen

Abstract: Recently, deep hashing methods have been widely used in image retrieval task. Most existing deep hashing approaches adopt one-to-one quantization to reduce information loss. However, such class-unrelated quantization cannot give discriminative feedback for network training. In addition, these methods only utilize single label to integrate supervision information of data for hashing function learning, which may result in inferior network generalization performance and relatively low-quality hash codes since the inter-class information of data is totally ignored. In this paper, we propose a dual semantic asymmetric hashing (DSAH) method, which generates discriminative hash codes under three-fold constraints. Firstly, DSAH utilizes class prior to conduct class structure quantization so as to transmit class information during the quantization process. Secondly, a simple yet effective label mechanism is designed to characterize both the intra-class compactness and inter-class separability of data, thereby achieving semantic-sensitive binary code learning. Finally, a meaningful pairwise similarity preserving loss is devised to minimize the distances between class-related network outputs based on an affinity graph. With these three main components, high-quality hash codes can be generated through network. Extensive experiments conducted on various datasets demonstrate the superiority of DSAH in comparison with state-of-the-art deep hashing methods.

Submitted 23 December, 2021; v1 submitted 24 October, 2021; originally announced October 2021.

arXiv:2110.11495 [pdf, other]

CTEQ-TEA group updates: Photon PDF and Impact from heavy flavors in the CT18 global analysis

Authors: Marco Guzzi, Keping Xie, Tie-Jiun Hou, Pavel Nadolsky, Carl Schmidt, Mengshi Yan, C. -P. Yuan

Abstract: We discuss recent CTEQ-TEA group activities after the publication of the CT18 global analysis of parton distribution functions (PDFs) in the proton. In particular, we discuss a new calculation for the photon content in the proton, termed as CT18lux and CT18qed PDFs, and the impact of novel charm- and bottom-quark production cross section measurements at HERA on the CT18 global analysis.

Submitted 21 October, 2021; originally announced October 2021.

Comments: 6 pages, 4 figures, EPS-HEP2021 Conference Proceedings. Contribution to the European Physical Society Conference on High Energy Physics 2021

arXiv:2110.10903 [pdf, ps, other]

doi 10.1093/mnras/stab3063

Nulling and subpulse drifting in PSR J1727-2739

Authors: Rukiye Rejep, N. Wang, W. M. Yan, Z. G. Wen

Abstract: In this paper, we investigate the emission properties of PSR J1727-2739, whose mean pulse profile has two main components, by analysing five single-pulse observations made using the Parkes 64-m radio telescope with a central frequency of 1369 MHz between 2014 April and October. The total observation time is about 6.1 hours which contains 16718 pulses after removal of radio frequency interference (RFI). Previous studies reveal that PSR J1727-2739 exhibits both nulling and subpulse drifting. We estimate the nulling fraction to be 66%, which is consistent with previously published results. In addition to the previously known subpulse drifting in the leading component, we also explore the drifting properties for the trailing component. We observe two distinct drift modes whose vertical drift band separations ($P_{3}$) are consistent with earlier studies. We find that both profile components share the same drift periodicity $P_{3}$ in a certain drift mode, but the measured horizontal separations ($P_{2}$) are quite different for them. That is, PSR J1727-2739 is a pulsar showing both changes of drift periodicity $P_{3}$ between different drift modes and drift rate variations between components in a given drift mode. Pulsars exhibiting nulling along with drift mode changing, such as PSR J1727-2739, give a unique opportunity to investigate the physical mechanism of these phenomena.

Submitted 21 October, 2021; originally announced October 2021.

Comments: 10 pages, 15 figures

arXiv:2110.07058 [pdf, other]

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do , et al. (60 additional authors not shown)

Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/

Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

arXiv:2110.05282 [pdf, ps, other]

Optimal Gradient Tracking for Decentralized Optimization

Authors: Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

Abstract: In this paper, we focus on solving the decentralized optimization problem of minimizing the sum of $n$ objective functions over a multi-agent network. The agents are embedded in an undirected graph where they can only send/receive information directly to/from their immediate neighbors. Assuming smooth and strongly convex objective functions, we propose an Optimal Gradient Tracking (OGT) method that achieves the optimal gradient computation complexity $O\left(\sqrt{κ}\log\frac{1}{ε}\right)$ and the optimal communication complexity $O\left(\sqrt{\frac{κ}{θ}}\log\frac{1}{ε}\right)$ simultaneously, where $κ$ and $\frac{1}{θ}$ denote the condition numbers related to the objective functions and the communication graph, respectively. To our knowledge, OGT is the first single-loop decentralized gradient-type method that is optimal in both gradient computation and communication complexities. The development of OGT involves two building blocks which are also of independent interest. The first one is another new decentralized gradient tracking method termed "Snapshot" Gradient Tracking (SS-GT), which achieves the gradient computation and communication complexities of $O\left(\sqrt{κ}\log\frac{1}{ε}\right)$ and $O\left(\frac{\sqrt{κ}}{θ}\log\frac{1}{ε}\right)$, respectively. SS-GT can be potentially extended to more general settings compared to OGT. The second one is a technique termed Loopless Chebyshev Acceleration (LCA) which can be implemented "looplessly" but achieve similar effect with adding multiple inner loops of Chebyshev acceleration in the algorithms. In addition to SS-GT, this LCA technique can accelerate many other gradient tracking based methods with respect to the graph condition number $\frac{1}{θ}$.

Submitted 20 April, 2024; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: Mathematical Programming, in press

arXiv:2110.03086 [pdf, other]

doi 10.1103/PhysRevResearch.4.L012026

Strong large scale magnetic fields in rotating convection-driven dynamos: the important role of magnetic diffusion

Authors: Ming Yan, Michael A. Calkins

Abstract: Natural dynamos such as planets and stars generate global scale magnetic field despite the inferred presence of small scale turbulence. Such systems are known as large scale dynamos and are typically driven by convection and influenced by rotation. Previous numerical studies of rotating dynamos generally find that the large scale magnetic field becomes weaker as the flow becomes more turbulent. The underlying physical processes necessary for sustaining so-called large scale dynamos is therefore still debated. Here we use a suite of numerical simulations to show that strong large scale magnetic fields can be generated in rotating convective turbulence provided that two conditions are satisfied: (1) the flow remains rotationally constrained; and (2) magnetic diffusion is important on the small convective length scale. These findings are in agreement with previous asymptotic predictions and suggest that natural dynamos might satisfy these two conditions.

Submitted 9 February, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 6 pages, 4 figures

arXiv:2109.00879 [pdf]

Observation of Square-Root Higher-Order Topological States in Photonic Waveguide Arrays

Authors: Juan Kang, Tao Liu, Mou Yan, Dandan Yang, Xiongjian Huang, Ruishan Wei, Jianrong Qiu, Guoping Dong, Zhongmin Yang, Franco Nori

Abstract: Recently, high-order topological insulators (HOTIs), accompanied by topologically nontrivial boundary states with codimension larger than one, have been extensively explored because of unconventional bulk-boundary correspondences. As a novel type of HOTIs, very recent works have explored the square-root HOTIs, where the topological nontrivial nature of bulk bands stems from the square of the Hamiltonian. In this paper, we experimentally demonstrate 2D square-root HOTIs in photonic waveguide arrays written in glass using femtosecond laser direct-write techniques. Edge and corner states are clearly observed through visible light spectra. The dynamical evolutions of topological boundary states are experimentally demonstrated, which further verify the existence of in-gap edge and corner states. The robustness of these edge and corner states is revealed by introducing defects and disorders into the bulk structures. Our studies provide an extended platform for realizing light manipulation and stable photonic devices.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.11571 [pdf, other]

GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware

Authors: Xin Liu, Mingyu Yan, Shuhan Song, Zhengyang Lv, Wenming Li, Guangyu Sun, Xiaochun Ye, Dongrui Fan

Abstract: Sampling is a critical operation in Graph Neural Network (GNN) training that helps reduce the cost. Previous literature has explored improving sampling algorithms via mathematical and statistical methods. However, there is a gap between sampling algorithms and hardware. Without consideration of hardware, algorithm designers merely optimize sampling at the algorithm level, missing the great potential of promoting the efficiency of existing sampling algorithms by leveraging hardware features. In this paper, we pioneer to propose a unified programming model for mainstream sampling algorithms, termed GNNSampler, covering the critical processes of sampling algorithms in various categories. Second, to leverage the hardware feature, we choose the data locality as a case study, and explore the data locality among nodes and their neighbors in a graph to alleviate irregular memory access in sampling. Third, we implement locality-aware optimizations in GNNSampler for various sampling algorithms to optimize the general sampling process. Finally, we emphatically conduct experiments on large graph datasets to analyze the relevance among training time, accuracy, and hardware-level metrics. Extensive experiments show that our method is universal to mainstream sampling algorithms and helps significantly reduce the training time, especially in large-scale graphs.

Submitted 24 June, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: Accepted by ECML-PKDD 2022

arXiv:2108.09479 [pdf, other]

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

Authors: Ming Yan, Haiyang Xu, Chenliang Li, Bin Bi, Junfeng Tian, Min Gui, Wei Wang

Abstract: Existing approaches to vision-language pre-training (VLP) heavily rely on an object detector based on bounding boxes (regions), where salient objects are first detected from images and then a Transformer-based model is used for cross-modal fusion. Despite their superior performance, these approaches are bounded by the capability of the object detector in terms of both effectiveness and efficiency. Besides, the presence of object detection imposes unnecessary constraints on model designs and makes it difficult to support end-to-end training. In this paper, we revisit grid-based convolutional features for vision-language pre-training, skipping the expensive region-related steps. We propose a simple yet effective grid-based VLP method that works surprisingly well with the grid features. By pre-training only with in-domain datasets, the proposed Grid-VLP method can outperform most competitive region-based VLP methods on three examined vision-language understanding tasks. We hope that our findings help to further advance the state of the art of vision-language pre-training, and provide a new direction towards effective and efficient VLP.

Submitted 21 August, 2021; originally announced August 2021.

arXiv:2108.06768 [pdf, other]

Connected and Disconnected Sea Partons from CT18 Parametrization of PDFs

Authors: Tie-Jiun Hou, Jian Liang, Keh-Fei Liu, Mengshi Yan, C. -P. Yuan

Abstract: The separation of the connected and disconnected sea partons, which were uncovered in the Euclidean path-integral formulation of the hadronic tensor, is accommodated with the CT18 parametrization of the global analysis of the parton distribution functions (PDFs). This is achieved with the help of the distinct small $x$ behaviors of these two sea parton components and the constraint from the lattice calculation of the ratio of the strange momentum fraction to that of the ${\bar u}$ or ${\bar d}$ in the disconnected insertion. This allows lattice calculations of separate flavors in both the connected and disconnected insertions to be directly compared with the global analysis results term by term.

Submitted 15 August, 2021; originally announced August 2021.

Report number: MSUHEP-21-017

arXiv:2108.06596 [pdf, other]

NNLO constraints on proton PDFs from the SeaQuest and STAR experiments and other developments in the CTEQ-TEA global analysis

Authors: Marco Guzzi, T. J. Hobbs, Tie-Jiun Hou, Xiaoxian Jing, Keping Xie, Aurore Courtoy, Sayipjamal Dulat, Jun Gao, Joey Huston, Pavel M. Nadolsky, Carl Schmidt, Ibrahim Sitiwaldi, Mengshi Yan, C. -P. Yuan

Abstract: We review progress in the global QCD analysis by the CTEQ-TEA group since the publication of CT18 parton distribution functions (PDFs) in the proton. Specifically, we discuss comparisons of CT18 NNLO predictions with the LHC 13 TeV measurements as well as with the FNAL SeaQuest and BNL STAR data on lepton pair production. The specialized CT18X PDFs approximating saturation effects are compared with the CT18sx PDFs obtained using NLL/NLO small-$x$ resummation. Short summaries are presented for the special CT18 parton distributions with fitted charm and with lattice QCD inputs. A recent comparative analysis of the impact of deuteron nuclear effects on the parton distributions by the CTEQ-JLab and CTEQ-TEA groups is summarized.

Submitted 11 February, 2022; v1 submitted 14 August, 2021; originally announced August 2021.

Comments: 16 pages, 7 figures

Report number: FERMILAB-CONF-21-361-QIS-SCD-T, MSUHEP-21-023, PITT-PACC-2117, SMU-HEP-21-09

arXiv:2108.05306 [pdf, ps, other]

doi 10.1140/epjc/s10052-022-10522-7

Interpretations of the new LHCb $P_c(4337)^+$ pentaquark state

Authors: Mao-Jun Yan, Fang-Zheng Peng, Mario Sánchez Sánchez, Manuel Pavon Valderrama

Abstract: Recently the LHCb collaboration has observed a new pentaquark state, the $P_c(4337)^+$. Owing to its proximity to the $χ_{c0}(1S) p$, $\bar{D}^* Λ_c$, $\bar{D} Σ_c$ and $\bar{D} Σ_c^*$ thresholds, this new pentaquark might very well be a meson-baryon bound state. However its spin and parity have not been determined yet and none of the previous possibilities can be ruled out. We briefly explore a few of these options and the consequences they entail in the present manuscript: (i) the $P_c(4337)^+$ might be a $χ_{c0}(1S) p$ bound state, (ii) the $P_c(4312)^+$ and $P_c(4337)^+$ might be $\bar{D}^* Λ_c$ and $\bar{D} Σ_c$ states close to threshold, respectively, where the Breit-Wigner mass might not correspond to the location of the poles, (iii) the locations of the $P_c(4312)^+$ and $P_c(4337)^+$ might be explained in terms of the $\bar{D}^* Λ_c$-$\bar{D} Σ_c$ and $\bar{D}^* Λ_c$-$\bar{D} Σ_c^*$ coupled channel dynamics. This last option, though not the most probable explanation, is still potentially compatible with the double peak solution of the $P_{cs}(4459)^0$ and with what we know of the $P_c(4312)^+$. As a byproduct of the previous explorations, we conjecture the existence of a series of anticharmed meson - antitriplet charmed baryon bound states and calculate their masses.

Submitted 4 July, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

Comments: 19 pages, 5 tables, 1 figure, corresponds with published version

Journal ref: EPJC 82, 574 (2022)

arXiv:2108.04785 [pdf, ps, other]

doi 10.1103/PhysRevD.105.014007

Subleading contributions to the decay width of the $T_{cc}^+$ tetraquark

Authors: Mao-Jun Yan, Manuel Pavon Valderrama

Abstract: Recently the LHCb collaboration has announced the discovery of the $T_{cc}^+$ tetraquark. Being merely a few hundred ${\rm keV}$ below the $D^{*+} D^0$ threshold, the $T_{cc}^+$ is expected to have a molecular component, for which there is a good separation of scales that can be exploited to make reasonably accurate theoretical predictions about this tetraquark. Independently of its nature, the most important decay channels will be $D^+ D^0 π^0$, $D^0 D^0 π^+$ and $D^+ D^0 γ$. Its closeness to threshold suggests that the mass and particularly the width of the $T_{cc}^+$ tetraquark depend on the resonance profile. While the standard Breit-Wigner parametrization generates a $T_{cc}^+$ that is too broad for current theoretical calculations to reproduce, a three-body unitarized Breit-Wigner shape reveals instead a decay width ($Γ_{\rm pole} = 48\pm 2\,{}^{+0}_{-12}\,{\rm keV}$) consistent with theoretical expectations. Here we consider subleading order contributions to the decay amplitude, which though having at most a moderate impact in the width still indicate potentially significant differences with the experimental width that can be exploited to disentangle the nature of the $T_{cc}^+$. Concrete calculations yield $Γ^{\rm LO} = 49 \pm 16\,{\rm keV}$ and $Γ^{\rm NLO} = 58^{+7}_{-6}\,{\rm keV}$, though we expect further corrections to the ${\rm NLO}$ decay widths from asymptotic normalization effects. We find that a detailed comparison of the ${\rm NLO}$ total and partial decay widths with experiment suggests the existence of a small (but distinguishable from zero) non-molecular component of the $T_{cc}^+$.

Submitted 8 January, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 13 pages, 1 figure; corresponds with the published version

Journal ref: Phys. Rev. D 105, 014007(2022)

arXiv:2108.04448 [pdf, other]

Decentralized Composite Optimization with Compression

Authors: Yao Li, Xiaorui Liu, Jiliang Tang, Ming Yan, Kun Yuan

Abstract: Decentralized optimization and communication compression have exhibited their great potential in accelerating distributed machine learning by mitigating the communication bottleneck in practice. While existing decentralized algorithms with communication compression mostly focus on the problems with only smooth components, we study the decentralized stochastic composite optimization problem with a potentially non-smooth component. A Proximal gradient LinEAr convergent Decentralized algorithm with compression, Prox-LEAD, is proposed with rigorous theoretical analyses in the general stochastic setting and the finite-sum setting. Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free. The superiorities of the proposed algorithms are demonstrated through the comparison with state-of-the-art algorithms in terms of convergence complexities and numerical experiments. Our algorithmic framework also generally enlightens the compressed communication on other primal-dual algorithms by reducing the impact of inexact iterations, which might be of independent interest.

Submitted 12 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

arXiv:2108.02337 [pdf]

doi 10.1103/PhysRevLett.127.255501

3D hinge transport in acoustic higher-order topological insulators

Authors: Qiang Wei, Xuewei Zhang, Weiyin Deng, Jiuyang Lu, Xueqin Huang, Mou Yan, Gang Chen, Zhengyou Liu, Suotang Jia

Abstract: The discovery of topologically protected boundary states in topological insulators opens a new avenue toward exploring novel transport phenomena. The one-way feature of boundary states, robust against disorder and impurities, shows great potential for applications in electronic and classical wave devices. In particular, 3D higher-order topological insulators can host hinge states, which allow energy to be transported along the hinge channels. However, the hinge states have only been observed along a single hinge, and a natural question arises: can the hinge states exist simultaneously along all three independent directions of one sample? Here we theoretically predict and experimentally observe the hinge states along three different directions of a higher-order topological phononic crystal, and demonstrate their robust one-way transport from hinge to hinge. Therefore, 3D topological hinge transport is successfully achieved. The novel sound transport may serve as the basis for acoustic devices with unconventional functions.

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2108.02102 [pdf, other]

ErrorCompensatedX: error compensation for variance reduced algorithms

Authors: Hanlin Tang, Yao Li, Ji Liu, Ming Yan

Abstract: Communication cost is one major bottleneck for the scalability of distributed learning. One approach to reducing the communication cost is to compress the gradient during communication. However, directly compressing the gradient slows convergence, and the resulting algorithm may diverge for biased compression. Recent work addressed this problem for stochastic gradient descent by adding back the compression error from the previous step. This idea was further extended to one class of variance reduced algorithms, where the variance of the stochastic gradient is reduced by taking a moving average over all history gradients. However, our analysis shows that just adding the previous step's compression error, as done in existing work, does not fully compensate for the compression error. So, we propose ErrorCompensatedX, which uses the compression error from the previous two steps. We show that ErrorCompensatedX can achieve the same asymptotic convergence rate as training without compression. Moreover, we provide a unified theoretical analysis framework for this class of variance reduced algorithms, with or without error compensation.

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2107.13580 [pdf, other]

The photon content of the proton in the CT18 global analysis

Authors: Keping Xie, T. J. Hobbs, Tie-Jiun Hou, Carl Schmidt, Mengshi Yan, C. -P. Yuan

Abstract: Recently, two photon PDF sets based on implementations of the LUX ansatz into the CT18 global analysis were released. In CT18lux, the photon PDF is calculated directly using the LUX master formula for all scales, $μ$. In an alternative realization, CT18qed, the photon PDF is initialized at the starting scale, $μ_0$, using the LUX formulation and evolved to higher scales $μ(>μ_0)$ with a combined QED+QCD kernel at $\mathcal{O}(α),~\mathcal{O}(αα_s)$ and $\mathcal{O}(α^2)$. In the small-$x$ region, the photon PDF uncertainty is mainly induced by the quark and gluon PDFs, through the perturbative DIS structure functions. In comparison, the large-$x$ photon uncertainty comes from various low-energy, nonperturbative contributions, including variations of the inelastic structure functions in the resonance and continuum regions, higher-twist and target-mass corrections, and elastic electromagnetic form factors of the proton. We take the production of doubly-charged Higgs pairs, $(H^{++}H^{--})$, as an example of scenarios beyond the Standard Model to illustrate the phenomenological implications of these photon PDFs at the LHC.

Submitted 28 July, 2021; originally announced July 2021.

Comments: Submission to SciPost

Report number: MSUHEP-21-015, PITT-PACC-2116, SMU-HEP-21-11

arXiv:2107.12065 [pdf, other]

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs

Authors: Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

Abstract: We consider the decentralized optimization problem, where a network of $n$ agents aims to collaboratively minimize the average of their individual smooth and convex objective functions through peer-to-peer communication in a directed graph. To tackle this problem, we propose two accelerated gradient tracking methods, namely APD and APD-SC, for non-strongly convex and strongly convex objective functions, respectively. We show that APD and APD-SC converge at the rates $O\left(\frac{1}{k^2}\right)$ and $O\left(\left(1 - C\sqrt{\frac{μ}{L}}\right)^k\right)$, respectively, up to constant factors depending only on the mixing matrix. APD and APD-SC are the first decentralized methods over unbalanced directed graphs that achieve the same provable acceleration as centralized methods. Numerical experiments demonstrate the effectiveness of both methods.

Submitted 6 December, 2023; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: SIAM Journal on Optimization, in press

arXiv:2107.06996 [pdf, other]

Elastic Graph Neural Networks

Authors: Xiaorui Liu, Wei Jin, Yao Ma, Yaxin Li, Hua Liu, Yiqi Wang, Ming Yan, Jiliang Tang

Abstract: While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$ and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme into GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly robust to graph adversarial attacks. The implementation of Elastic GNNs is available at \url{https://github.com/lxiaorui/ElasticGNN}.

Submitted 4 July, 2021; originally announced July 2021.

Comments: ICML 2021 (International Conference on Machine Learning)

arXiv:2106.13477 [pdf, ps, other]

Hessian informed mirror descent

Authors: Li Wang, Ming Yan

Abstract: Inspired by the recent paper (L. Ying, Mirror descent algorithms for minimizing interacting free energy, Journal of Scientific Computing, 84 (2020), pp. 1-14), we explore the relationship between the mirror descent and the variable metric method. When the metric in the mirror descent is induced by a convex function whose Hessian is close to the Hessian of the objective function, this method enjoys both the robustness of mirror descent and the superlinear convergence of Newton-type methods. When applied to a linearly constrained minimization problem, we prove the global and local convergence, both in the continuous and discrete settings. As applications, we compute the Wasserstein gradient flows and the Cahn-Hilliard equation with degenerate mobility. When formulating these problems using a minimizing movement scheme with respect to a variable metric, our mirror descent algorithm offers fast convergence for the underlying optimization problem while maintaining the total mass and bounds of the solution.

Submitted 25 June, 2021; originally announced June 2021.

arXiv:2106.10299 [pdf, other]

doi 10.1103/PhysRevD.105.054006

The photon PDF within the CT18 global analysis

Authors: Keping Xie, T. J. Hobbs, Tie-Jiun Hou, Carl Schmidt, Mengshi Yan, C. -P. Yuan

Abstract: Building upon the most recent CT18 global fit, we present a new calculation of the photon content of the proton based on an application of the LUX formalism. In this work, we explore two principal variations of the LUX ansatz. In one approach, which we designate "CT18lux," the photon PDF is calculated directly using the LUX formula for all scales, $μ$. In an alternative realization, "CT18qed," we instead initialize the photon PDF in terms of the LUX formulation at a lower scale, $μ\! \sim\! μ_0$, and evolve to higher scales with a combined QED+QCD kernel at $\mathcal{O}(α),~\mathcal{O}(αα_s)$ and $\mathcal{O}(α^2)$. While we find these two approaches generally agree, especially at intermediate $x$ ($10^{-3}\lesssim x\lesssim0.3$), we discuss some moderate discrepancies that can occur toward the end-point regions at very high or low $x$. We also study effects that follow from variations of the inputs to the LUX calculation originating outside the pure deeply-inelastic scattering (DIS) region, including from elastic form factors and other contributions to the photon PDF. Finally, we investigate the phenomenological implications of these photon PDFs for the LHC, including high-mass Drell-Yan, vector-boson pair, top-quark pair, and Higgs associated with vector-boson production.

Submitted 12 February, 2023; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 45 pages, 27 figures, and 5 tables

Report number: MSUHEP-21-013, PITT-PACC-2112, SMU-HEP-21-06, FERMILAB-PUB-21-370-QIS-SCD-T

arXiv:2106.08235 [pdf, other]

PairConnect: A Compute-Efficient MLP Alternative to Attention

Authors: Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava

Abstract: Transformer models have demonstrated superior performance in natural language processing. The dot product self-attention in Transformer allows us to model interactions between words. However, this modeling comes with significant computational overhead. In this work, we revisit the memory-compute trade-off associated with Transformer, particularly multi-head attention, and show a memory-heavy but significantly more compute-efficient alternative to Transformer. Our proposal, denoted as PairConnect, a multilayer perceptron (MLP), models the pairwise interaction between words by explicit pairwise word embeddings. As a result, PairConnect substitutes self dot product with a simple embedding lookup. We show mathematically that despite being an MLP, our compute-efficient PairConnect is strictly more expressive than Transformer. Our experiment on language modeling tasks suggests that PairConnect could achieve comparable results with Transformer while reducing the computational cost associated with inference significantly.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.07243 [pdf, ps, other]

doi 10.1109/TSP.2022.3160238

Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks

Authors: Zhuoqing Song, Lei Shi, Shi Pu, Ming Yan

Abstract: In this paper, we propose two communication efficient decentralized optimization algorithms over a general directed multi-agent network. The first algorithm, termed Compressed Push-Pull (CPP), combines the gradient tracking Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves linear convergence rate for strongly convex and smooth objective functions. The second algorithm is a broadcast-like version of CPP (B-CPP), and it also achieves linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduce communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.

Submitted 9 April, 2024; v1 submitted 14 June, 2021; originally announced June 2021.

Journal ref: IEEE Transactions on Signal Processing, 70(2022), 1775-1787

arXiv:2106.01804 [pdf, other]

E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning

Authors: Haiyang Xu, Ming Yan, Chenliang Li, Bin Bi, Songfang Huang, Wenming Xiao, Fei Huang

Abstract: Vision-language pre-training (VLP) on large-scale image-text pairs has achieved huge success for cross-modal downstream tasks. Most existing pre-training methods adopt a two-step training procedure, which first employs a pre-trained object detector to extract region-based visual features, then concatenates the image representation and text embedding as the input of a Transformer for training. However, these methods face two problems: they rely on the task-specific visual representation of a specific object detector for generic cross-modal understanding, and the two-stage pipeline is computationally inefficient. In this paper, we propose the first end-to-end vision-language pre-trained model for both V+L understanding and generation, namely E2E-VLP, where we build a unified Transformer framework to jointly learn visual representation and semantic alignments between image and text. We incorporate the tasks of object detection and image captioning into pre-training with a unified Transformer encoder-decoder architecture for enhancing visual learning. An extensive set of experiments has been conducted on well-established vision-language downstream tasks to demonstrate the effectiveness of this novel VLP paradigm.

Submitted 4 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: ACL2021 main conference

arXiv:2105.11210 [pdf, other]

StructuralLM: Structural Pre-training for Form Understanding

Authors: Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

Abstract: Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representation, while neglecting cell-level layout information that is important for form image understanding. In this paper, we propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. Specifically, we pre-train StructuralLM with two new designs to make the most of the interactions of cell and layout information: 1) each cell as a semantic unit; 2) classification of cell positions. The pre-trained StructuralLM achieves new state-of-the-art results in different types of downstream tasks, including form understanding (from 78.95 to 85.14), document visual question answering (from 72.59 to 83.94) and document image classification (from 94.43 to 96.08).

Submitted 24 May, 2021; originally announced May 2021.

Comments: Accepted by ACL2021 main conference

arXiv:2104.01768 [pdf, other]

Predicting Crash Fault Residence via Simplified Deep Forest Based on A Reduced Feature Set

Authors: Kunsong Zhao, Jin Liu, Zhou Xu, Li Li, Meng Yan, Jiaojiao Yu, Yuxuan Zhou

Abstract: Software inevitably crashes, and developers must spend a large amount of effort to find the fault causing the crash (the crashing fault for short). Developing automatic methods to identify the residence of the crashing fault is a crucial activity for software quality assurance. Researchers have proposed methods to predict whether the crashing fault resides in the stack trace based on the features collected from the stack trace and faulty code, aiming to save debugging effort for developers. However, previous work usually neglected feature preprocessing of the crash data and only used traditional classification models. In this paper, we propose a novel crashing fault residence prediction framework, called ConDF, which consists of a consistency based feature subset selection method and a state-of-the-art deep forest model. More specifically, first, the feature selection method is used to obtain an optimal feature subset and reduce the feature dimension by retaining the representative features. Then, a simplified deep forest model is employed to build the classification model on the reduced feature set. The experiments on seven open source software projects show that our ConDF method performs significantly better than 17 baseline methods on three performance indicators.

Submitted 5 April, 2021; originally announced April 2021.

arXiv:2104.01032 [pdf, other]

Plot2API: Recommending Graphic API from Plot via Semantic Parsing Guided Neural Network

Authors: Zeyu Wang, Sheng Huang, Zhongxin Liu, Meng Yan, Xin Xia, Bei Wang, Dan Yang

Abstract: Plot-based Graphic API recommendation (Plot2API) is an unstudied but meaningful issue, which has several important applications in the context of software engineering and data visualization, such as plotting guidance for beginners, graphic API correlation analysis, and code conversion for plotting. Plot2API is a very challenging task, since each plot is often associated with multiple APIs and the appearances of the graphics drawn by the same API can be extremely varied due to the different settings of the parameters. Additionally, the samples of different APIs are extremely imbalanced. Considering the lack of technologies for Plot2API, we present a novel deep multi-task learning approach named Semantic Parsing Guided Neural Network (SPGNN), which casts the Plot2API issue as a multi-label image classification task and an image semantic parsing task. In SPGNN, the recently advanced Convolutional Neural Network (CNN) named EfficientNet is employed as the backbone network for API recommendation. Meanwhile, a semantic parsing module is added to exploit the semantic-relevant visual information in feature learning and eliminate the appearance-relevant visual information which may confuse the visual-information-based API recommendation. Moreover, the recent data augmentation technique named random erasing is also applied to alleviate the imbalance of API categories.
We collect plots with the graphic APIs used to draw them from Stack Overflow, and release three new Plot2API datasets corresponding to the graphic APIs of the R and Python programming languages for evaluating the effectiveness of Plot2API techniques. Extensive experimental results not only demonstrate the superiority of our method over the recent deep learning baselines but also show the practicability of our method in the recommendation of graphic APIs.

Submitted 2 April, 2021; originally announced April 2021.

Comments: Accepted by SANER2021

arXiv:2103.14493 [pdf, other]

RCT: Resource Constrained Training for Edge AI

Authors: Tian Huang, Tao Luo, Ming Yan, Joey Tianyi Zhou, Rick Goh

Abstract: Neural network training on edge terminals is essential for edge AI computing, which needs to adapt to an evolving environment. Quantised models can efficiently run on edge devices, but existing training methods for these compact models are designed to run on powerful servers with abundant memory and energy budget. For example, the quantisation-aware training (QAT) method involves two copies of model parameters, which is usually beyond the capacity of on-chip memory in edge devices. Data movement between off-chip and on-chip memory is energy demanding as well. The resource requirements are trivial for powerful servers, but critical for edge devices. To mitigate these issues, we propose Resource Constrained Training (RCT). RCT only keeps a quantised model throughout the training, so that the memory requirement for model parameters in training is reduced. It adjusts per-layer bitwidth dynamically in order to save energy when a model can learn effectively with lower precision. We carry out experiments with representative models and tasks in image applications and natural language processing. Experiments show that RCT saves more than 86\% energy for General Matrix Multiply (GEMM) and saves more than 46\% memory for model parameters, with limited accuracy loss. Compared with the QAT-based method, RCT saves about half of the energy spent on moving model parameters.

Submitted 26 March, 2021; originally announced March 2021.

Comments: 14 pages

MSC Class: 68T07 (Primary) 68T05 (Secondary) ACM Class: I.5.1; I.2.6

arXiv:2103.12393 [pdf, other]

RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure

Authors: Taoran Xiang, Lunkai Zhang, Shuqian An, Xiaochun Ye, Mingzhe Zhang, Yanhuan Liu, Mingyu Yan, Da Wang, Hao Zhang, Wenming Li, Ninghui Sun, Dongrui Fan

Abstract: Neural Networks (NN) have been proven to be powerful tools to analyze Big Data. However, traditional CPUs cannot achieve the desired performance and/or energy efficiency for NN applications. Therefore, numerous NN accelerators have been used or designed to meet these goals. These accelerators all fall into three categories: GPGPUs, ASIC NN Accelerators and CISC NN Accelerators. Though CISC NN Accelerators can achieve a considerably smaller memory footprint than GPGPUs and thus improve energy efficiency, they still fail to provide the same level of data reuse optimization achieved by ASIC NN Accelerators because of the inherently poor programmability of their CISC architecture. We argue that, for NN Accelerators, RISC is a better design choice than CISC, as is the case with general purpose processors. We propose RISC-NN, a novel many-core RISC-based NN accelerator that achieves high expressiveness and high parallelism and features strong programmability and low control-hardware costs. We show that RISC-NN can implement all the necessary instructions of state-of-the-art CISC NN Accelerators; in the meantime, RISC-NN manages to achieve advanced optimization such as multiple-level data reuse and support for Sparse NN applications which previously only existed in ASIC NN Accelerators. Experiment results show that RISC-NN achieves on average 11.88X performance efficiency compared with the state-of-the-art Nvidia TITAN Xp GPGPU for various NN applications.
RISC-NN also achieves on average 1.29X, 8.37X and 21.71X performance efficiency over CISC-based TPU in CNN, MLP and LSTM applications, respectively. Finally, RISC-NN can achieve additional 26.05% performance improvement and 33.13% energy reduction after applying pruning for Sparse NN applications.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.10276 [pdf, other]

doi 10.1007/JHEP08(2021)034

Determining the helicity structure of the nucleon at the Electron Ion Collider in China

Authors: Daniele Paolo Anderle, Tie-Jiun Hou, Hongxi Xing, Mengshi Yan, C. -P. Yuan, Yuxiang Zhao

Abstract: Understanding how sea quarks behave inside a nucleon is one of the most important physics goals of the proposed Electron-Ion Collider in China (EicC), which is designed to have 3.5 GeV polarized electron beam (80% polarization) colliding with 20 GeV polarized proton beam (70% polarization) at instantaneous luminosity of $2 \times 10^{33} {\rm cm}^{-2} {\rm s}^{-1}$. A specific topic at EicC is to understand the polarization of individual quarks inside a longitudinally polarized nucleon. The potential of various future EicC data, including the inclusive and semi-inclusive deep inelastic scattering data from both doubly polarized electron-proton and electron-$^3{\rm He}$ collisions, to reduce the uncertainties of parton helicity distributions is explored at the next-to-leading order in QCD, using the Error PDF Updating Method Package ({\sc ePump}) which is based on the Hessian profiling method. We show that the semi-inclusive data are well able to provide good separation between flavour distributions, and to constrain their uncertainties in the $x>0.005$ region, especially when electron-$^3{\rm He}$ collisions, acting as effective electron-neutron collisions, are taken into account. To enable this study, we have generated a Hessian representation of the DSSV14 set of PDF replicas, named DSSV14H PDFs.

Submitted 13 July, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: 40 pages, 12 figures

Report number: JHEP08(2021)034

Journal ref: https://doi.org/10.1007/JHEP08(2021)034

arXiv:2103.07829 [pdf, other]

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels

Authors: Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang, Bin Bi, Songfang Huang

Abstract: Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations. Existing pre-training methods either directly concatenate image representation and text representation at a feature level as input to a single-stream Transformer, or use a two-stream cross-modal Transformer to align the image-text representation at a high-level semantic space. In real-world image-text data, we observe that it is easy for some of the image-text pairs to align simple semantics on both modalities, while others may be related after higher-level abstraction. Therefore, in this paper, we propose a new pre-training method SemVLP, which jointly aligns both the low-level and high-level semantics between image and text representations. The model is pre-trained iteratively with two prevalent fashions: single-stream pre-training to align at a fine-grained feature level and two-stream pre-training to align high-level semantics, by employing a shared Transformer network with a pluggable cross-modal attention module. An extensive set of experiments have been conducted on four well-established vision-language understanding tasks to demonstrate the effectiveness of the proposed SemVLP in aligning cross-modal representations towards different semantic granularities.

Submitted 13 March, 2021; originally announced March 2021.

Comments: 10 pages, 4 figures

Showing 201–250 of 551 results for author: Yan, M