
Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

Authors: Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

Abstract: Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we find that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests the match between transformer pre-training corpora and the task domain is of greater significance than generally appreciated. Additionally, we show that just-right language model fine-tuning on the task collection before starting active learning is critical. Too little or too much fine-tuning hinders performance, yielding results worse than those of linear models, even for a favorable corpus such as RCV1-v2.

Submitted 19 January, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: 6 pages, 1 figure, accepted at ECIR 2022

arXiv:2105.00795 [pdf, other]

RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning

Authors: Hankook Lee, Sungsoo Ahn, Seung-Woo Seo, You Young Song, Eunho Yang, Sung-Ju Hwang, Jinwoo Shin

Abstract: Retrosynthesis, whose goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While the existing approaches have shown promising results, they currently lack the ability to consider the availability (e.g., stability or purchasability) of the reactants or to generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates these issues by reformulating retrosynthesis as a problem of selecting reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), for enumerating all of the candidate molecules based on selection scores computed by graph neural networks. For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, our RetCL achieves top-1 exact match accuracy of $71.3\%$ on the USPTO-50k benchmark, while a recent transformer-based approach achieves $59.6\%$. We also demonstrate that RetCL generalizes well to unseen templates in various settings, in contrast to template-based approaches.

Submitted 3 June, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: Accepted to IJCAI 2021. Short version was accepted to Machine Learning for Molecules Workshop at NeurIPS 2020
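For readers unfamiliar with contrastive training with hard negative mining, a minimal sketch of a generic InfoNCE-style loss is given below. It is not RetCL's exact objective: the cosine similarity, the temperature `tau`, the `num_hard` cutoff and the function name are illustrative assumptions, and the embeddings are plain vectors rather than graph neural network outputs over molecules. Among all non-matching candidates, only the highest-scoring (most confusable) ones are kept as negatives.

```python
import numpy as np

def hard_negative_contrastive_loss(query, candidates, pos_idx, num_hard=2, tau=0.1):
    """Hypothetical InfoNCE-style loss: the positive candidate competes only
    against the num_hard highest-scoring negative candidates."""
    q = np.asarray(query, dtype=float)
    c = np.asarray(candidates, dtype=float)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q / tau  # cosine similarity scaled by temperature
    neg = np.delete(np.arange(len(c)), pos_idx)
    # Hard negative mining: keep the negatives the model currently scores highest.
    hard = neg[np.argsort(scores[neg])[::-1][:num_hard]]
    logits = np.concatenate(([scores[pos_idx]], scores[hard]))
    logits = logits - logits.max()  # numerical stability
    # Negative log-probability of the positive under a softmax over the kept logits.
    return float(-(logits[0] - np.log(np.exp(logits).sum())))
```

A candidate that nearly matches the query (here `[0.9, 0.1]`) is picked as a hard negative and keeps the loss well above zero, while an easy negative such as `[-1, 0]` is dropped when `num_hard=2`.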

arXiv:2104.08314 [pdf, other]

High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Authors: Hossam Amer, Ahmed H. Salamah, Ahmad Sajedi, En-hui Yang

Abstract: Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely used convolution algorithms, im2col and MEC, produce a lowered matrix from an activation map by redundantly storing the map's elements included at horizontal and/or vertical kernel overlaps, without considering the sparsity of the map. Using the sparsity of the map, this paper proposes two new convolution algorithms, dubbed Compressed Pattern Overlap (CPO) and Compressed Pattern Sets (CPS), that simultaneously decrease the memory footprint and increase the inference speed while preserving accuracy. CPO recognizes non-zero elements (NZEs) at horizontal and vertical overlaps in the activation maps. CPS further improves the memory savings of CPO by compressing the index positions of neighboring NZEs. In both algorithms, channels/regions of the activation maps with all zeros are skipped. Then, CPO/CPS performs convolution via Sparse Matrix-Vector Multiplication (SpMv) on their sparse representations. Experimental results on CPUs show that average per-layer time savings reach up to 63% and Compression Ratio (CR) up to 26x with respect to im2col. In some layers, our average per-layer CPO/CPS time savings are better by 28% and CR is better by 9.2x than the parallel implementation of MEC. For a given CNN's inference, we select offline, for each convolutional layer, whichever of CPO/CPS or im2col is fastest. Our algorithms were selected for up to 56% of the non-pointwise convolutional layers. Our offline selections yield CNN inference time savings up to 9% and CR up to 10x.

Submitted 16 April, 2021; originally announced April 2021.

Comments: 34 pages
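To make the redundancy that CPO/CPS target concrete, here is a minimal sketch of the im2col baseline, assuming a single-channel map, stride 1 and no padding (simplifications for illustration, not the paper's implementation). Every k×k patch is copied into its own column, so elements shared by overlapping kernel windows are stored several times, and zeros are stored like any other value.

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """Lower a single-channel (H, W) map into a (k*k, out_h*out_w) matrix.

    Assumes stride 1 and no padding (illustrative simplification).
    """
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            # Each patch is copied in full, so overlapping elements are duplicated.
            cols[:, idx] = x[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

def conv2d_via_im2col(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """CNN-style convolution (cross-correlation) as one vector-matrix product."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (kernel.ravel() @ im2col(x, k)).reshape(out_h, out_w)
```

For a 4×4 map and a 3×3 kernel, the lowered matrix is 9×4, i.e. 36 stored entries for only 16 distinct input elements; a sparsity-aware scheme such as the ones described above avoids storing those duplicates and zeros.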

arXiv:2103.14302 [pdf, other]

Mutually-Constrained Monotonic Multihead Attention for Online ASR

Authors: Jaeyun Song, Hajin Shim, Eunho Yang

Abstract: Despite supporting real-time decoding, Monotonic Multihead Attention (MMA) shows performance comparable to state-of-the-art offline methods in machine translation and automatic speech recognition (ASR) tasks. However, the latency of MMA is still a major issue in ASR, and MMA must be combined with a technique that can reduce test latency at inference time, such as head-synchronous beam search decoding, which forces all non-activated heads to activate after a small fixed delay from the first head activation. In this paper, we remove the discrepancy between training and test phases by considering, in the training of MMA, the interactions across multiple heads that will occur at test time. Specifically, we derive the expected alignments from monotonic attention by considering the boundaries of other heads and reflect them in the learning process. We validate our proposed method on two standard ASR benchmark datasets and show that our approach, MMA with mutually-constrained heads from the training stage, provides better performance than baselines.

Submitted 26 March, 2021; originally announced March 2021.

Comments: Accepted at IEEE ICASSP 2021

arXiv:2103.13151 [pdf, other]

Learning Polar Encodings for Arbitrary-Oriented Ship Detection in SAR Images

Authors: Yishan He, Fei Gao, Jun Wang, Amir Hussain, Erfu Yang, Huiyu Zhou

Abstract: Common horizontal bounding box (HBB)-based methods cannot accurately locate slender ship targets with arbitrary orientations in synthetic aperture radar (SAR) images. Therefore, in recent years, methods based on oriented bounding boxes (OBBs) have gradually received attention from researchers. However, most recently proposed deep learning-based methods for OBB detection encounter the boundary discontinuity problem in angle or key point regression. To alleviate this problem, researchers have introduced manually set parameters or extra network branches to distinguish the boundary cases, which makes training more difficult and leads to performance degradation. In this paper, to solve the boundary discontinuity problem in OBB regression, we propose to detect SAR ships by learning polar encodings. The encoding scheme uses a group of vectors pointing from the center of the ship target to the boundary points to represent an OBB. The boundary discontinuity problem is avoided by training and inference directly according to the polar encodings. In addition, we propose an Intersection over Union (IOU)-weighted regression loss, which further guides the training of polar encodings through the IOU metric and improves detection performance. Experiments on the Rotating SAR Ship Detection Dataset (RSSDD) show that the proposed method achieves better detection performance than other compared algorithms and other OBB encoding schemes, demonstrating the effectiveness of our method.

Submitted 24 March, 2021; originally announced March 2021.
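The idea of representing an OBB by vectors from its center to boundary points can be sketched as below. This is an illustrative assumption, not the paper's scheme: rays are cast at `n_rays` evenly spaced angles, and each ray stores the distance from the center to where it exits the box. The function names and parameters are hypothetical.

```python
import math
from typing import List, Tuple

def polar_encode(w: float, h: float, theta: float, n_rays: int = 8) -> List[float]:
    """Distances from an OBB's center to its boundary along n_rays evenly
    spaced rays (ray k points at angle 2*pi*k/n_rays in image coordinates).

    w, h: box width and height; theta: box rotation in radians.
    """
    a, b = w / 2.0, h / 2.0
    dists = []
    for k in range(n_rays):
        t = 2 * math.pi * k / n_rays - theta  # ray angle in the box's own frame
        c, s = abs(math.cos(t)), abs(math.sin(t))
        # The ray exits through whichever pair of sides it reaches first.
        hits = []
        if c > 1e-12:
            hits.append(a / c)
        if s > 1e-12:
            hits.append(b / s)
        dists.append(min(hits))
    return dists

def polar_decode(dists: List[float], cx: float, cy: float) -> List[Tuple[float, float]]:
    """Recover the boundary points that the encoding's vectors point to."""
    n = len(dists)
    return [
        (cx + d * math.cos(2 * math.pi * k / n), cy + d * math.sin(2 * math.pi * k / n))
        for k, d in enumerate(dists)
    ]
```

Only |cos| and |sin| enter the distances, so the same box written as (w, h, theta), (w, h, theta + pi) or (h, w, theta + pi/2) yields the same encoding. An angle-based target would jump at these equivalent parameterizations, which illustrates the kind of boundary discontinuity that regressing such an encoding can sidestep.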

arXiv:2103.01328 [pdf, other]

ToxCCIn: Toxic Content Classification with Interpretability

Authors: Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian

Abstract: Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans. Explanations are particularly important for tasks like offensive language or toxicity detection on social media because a manual appeal process is often in place to dispute automatically flagged content. In this work, we propose a technique to improve the interpretability of these models, based on a simple and powerful assumption: a post is at least as toxic as its most toxic span. We incorporate this assumption into transformer models by scoring a post based on the maximum toxicity of its spans and augmenting the training process to identify correct spans. We find this approach effective: according to a human study, it can produce explanations that exceed the quality of those provided by Logistic Regression analysis (often regarded as a highly interpretable model).

Submitted 1 March, 2021; originally announced March 2021.

Comments: Long paper accepted to WASSA2021@EACL
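The scoring assumption ("a post is at least as toxic as its most toxic span") amounts to max-pooling over span scores, which can be sketched as follows. The span scores are a hypothetical input here (in the paper they come from a transformer), and the function name is illustrative. Because the post score equals the score of a single span, that span serves directly as the explanation.

```python
from typing import Sequence, Tuple

def post_toxicity(span_scores: Sequence[float]) -> Tuple[float, int]:
    """Return (post_score, explaining_span_index).

    The post-level score is the maximum span score, so the argmax span
    is the span responsible for the decision.
    """
    if not span_scores:
        raise ValueError("a post must contain at least one span")
    best = max(range(len(span_scores)), key=lambda i: span_scores[i])
    return span_scores[best], best
```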

arXiv:2102.03866 [pdf, other]

Model-Augmented Q-learning

Authors: Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Abstract: In recent years, $Q$-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect policy learning. To resolve this issue, we propose an MFRL framework that is augmented with components of model-based RL. Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network. We further utilize the estimated reward from the model estimators for $Q$-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution identical to the solution obtained by learning with the true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by utilizing model-estimation errors. We experimentally validate MQL built upon state-of-the-art off-policy MFRL methods, and show that MQL largely improves their performance and convergence. The proposed scheme is simple to implement and does not require additional training cost.

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2102.02386 [pdf, other]

doi 10.1117/12.2562729

Analysis of Temperature-to-Polarization Leakage in BICEP3 and Keck CMB Data from 2016 to 2018

Authors: The BICEP/Keck Collaboration, T. St. Germaine, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, S. Fliescher, et al. (64 additional authors not shown)

Abstract: The BICEP/Keck Array experiment is a series of small-aperture refracting telescopes observing degree-scale Cosmic Microwave Background polarization from the South Pole in search of a primordial $B$-mode signature. As a pair differencing experiment, an important systematic that must be controlled is the differential beam response between the co-located, orthogonally polarized detectors. We use high-fidelity, in-situ measurements of the beam response to estimate the temperature-to-polarization (T $\rightarrow$ P) leakage in our latest data including observations from 2016 through 2018. This includes three years of BICEP3 observing at 95 GHz, and multifrequency data from Keck Array. Here we present band-averaged far-field beam maps, differential beam mismatch, and residual beam power (after filtering out the leading difference modes via deprojection) for these receivers. We show preliminary results of "beam map simulations," which use these beam maps to observe a simulated temperature (no $Q/U$) sky to estimate T $\rightarrow$ P leakage in our real data.

Submitted 3 February, 2021; originally announced February 2021.

Comments: 9 pages, 4 figures

Journal ref: Proc. SPIE 11453, Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy X, 114532E (15 December 2020)

arXiv:2101.12409 [pdf, other]

Few-Shot Domain Adaptation for Grammatical Error Correction via Meta-Learning

Authors: Shengsheng Zhang, Yaping Huang, Yun Chen, Liner Yang, Chencheng Wang, Erhong Yang

Abstract: Most existing sequence-to-sequence Grammatical Error Correction (GEC) methods mainly focus on how to generate more pseudo data to obtain better performance. Little work addresses few-shot GEC domain adaptation. In this paper, we treat different GEC domains as different GEC tasks and propose to extend meta-learning to few-shot GEC domain adaptation without using any pseudo data. We exploit a set of data-rich source domains to learn an initialization of model parameters that facilitates fast adaptation to new resource-poor target domains. We adapt the GEC model to the first language (L1) of the second-language learner. To evaluate the proposed method, we use nine L1s as source domains and five L1s as target domains. Experimental results on the L1 GEC domain adaptation dataset demonstrate that the proposed approach outperforms the multi-task transfer learning baseline by 0.50 $F_{0.5}$ score on average and enables us to effectively adapt to a new L1 domain with only 200 parallel sentences.

Submitted 29 January, 2021; originally announced January 2021.

arXiv:2101.12149 [pdf, other]

doi 10.1016/j.parco.2021.102833

Porting WarpX to GPU-accelerated platforms

Authors: A. Myers, A. Almgren, L. D. Amorim, J. Bell, L. Fedeli, L. Ge, K. Gott, D. P. Grote, M. Hogan, A. Huebl, R. Jambunathan, R. Lehe, C. Ng, M. Rowan, O. Shapoval, M. Thévenet, J. -L. Vay, H. Vincenti, E. Yang, N. Zaïm, W. Zhang, Y. Zhao, E. Zoni

Abstract: WarpX is a general-purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered and lessons learned, and give current performance results on a series of relevant benchmark problems.

Submitted 2 September, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: 11 pages, 5 figures, accepted by Parallel Computing. Minor revisions, results unchanged

Journal ref: Parallel Computing, Volume 108, 2021, 102833

arXiv:2101.09294 [pdf, other]

doi 10.1145/3442188.3445916

Censorship of Online Encyclopedias: Implications for NLP Models

Authors: Eddie Yang, Margaret E. Roberts

Abstract: While artificial intelligence provides the backbone for many tools people use around the world, recent work has drawn attention to the fact that the algorithms powering AI are not free of politics, stereotypes, and bias. While most work in this area has focused on the ways in which AI can exacerbate existing inequalities and discrimination, very little work has studied how governments actively shape training data. We describe how censorship has affected the development of Wikipedia corpora, text data that are regularly used as pre-trained inputs into NLP algorithms. We show that word embeddings trained on Baidu Baike, an online Chinese encyclopedia, have very different associations between adjectives and a range of concepts about democracy, freedom, collective action, equality, and people and historical events in China than its regularly blocked but uncensored counterpart, Chinese-language Wikipedia. We examine the implications of these discrepancies by studying their use in downstream AI applications. Our paper shows how government repression, censorship, and self-censorship may impact training data and the applications that draw from them.

Submitted 22 January, 2021; originally announced January 2021.

Comments: Accepted for publication at ACM FAccT 2021

arXiv:2012.09363 [pdf, other]

doi 10.1117/12.2562066

Observing low elevation sky and the CMB Cold Spot with BICEP3 at the South Pole

Authors: J. Kang, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, S. Fliescher, N. Goeckner-Wald, D. C. Goldfinger, et al. (62 additional authors not shown)

Abstract: BICEP3 is a 520 mm aperture on-axis refracting telescope at the South Pole, which observes the polarization of the cosmic microwave background (CMB) at 95 GHz to search for the B-mode signal from inflationary gravitational waves. In addition to this main target, we have developed a low-elevation observation strategy to extend coverage of the Southern sky at the South Pole, where BICEP3 can quickly achieve degree-scale E-mode measurements over a large area. An interesting E-mode measurement is probing a potential polarization anomaly around the CMB Cold Spot. During the austral summer seasons of 2018-19 and 2019-20, BICEP3 observed the sky with a flat mirror to redirect the beams to various low elevation ranges. The preliminary data analysis shows degree-scale E-modes measured with high signal-to-noise ratio.

Submitted 17 December, 2020; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: 12 pages, 10 figures; Figure 7 shows the correct file

Journal ref: Proc. SPIE 11453, Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy X, 114532D (13 December 2020)

arXiv:2012.05934 [pdf, other]

Polarization Calibration of the BICEP3 CMB polarimeter at the South Pole

Authors: J. Cornelison, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. R. Cheshire, J. Connors, M. Crumrine, A. Cukierman, E. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, S. Fliescher, N. Goeckner-Wald, D. C. Goldfinger, J. A. Grayson, et al. (62 additional authors not shown)

Abstract: The BICEP3 CMB Polarimeter is a small-aperture refracting telescope located at the South Pole and is specifically designed to search for the possible signature of inflationary gravitational waves in the Cosmic Microwave Background (CMB). The experiment measures polarization on the sky by differencing the signal of co-located, orthogonally polarized antennas coupled to Transition Edge Sensor (TES) detectors. We present precise measurements of the absolute polarization response angles and polarization efficiencies for nearly all of BICEP3's $\sim800$ functioning polarization-sensitive detector pairs from calibration data taken in January 2018. Using a Rotating Polarized Source (RPS), we mapped the polarization response of each detector over a full 360 degrees of source rotation and at multiple telescope boresight rotations, from which per-pair polarization properties were estimated. In future work, these results will be used to constrain signals predicted by exotic physical models such as Cosmic Birefringence.

Submitted 10 December, 2020; originally announced December 2020.

Comments: Proceedings submitted to SPIE 2020 (AS111). 12 pages, 5 figures, 2 tables

arXiv:2012.04047 [pdf, other]

Receiver development for BICEP Array, a next-generation CMB polarimeter at the South Pole

Authors: L. Moncelsi, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, S. Fatigoni, J. P. Filippini, N. Goeckner-Wald, D. C. Goldfinger, J. Grayson, P. Grimes, G. Hall, et al. (50 additional authors not shown)

Abstract: A detection of curl-type ($B$-mode) polarization of the primary CMB would be direct evidence for the inflationary paradigm of the origin of the Universe. The BICEP/Keck Array (BK) program targets the degree angular scales, where the power from primordial $B$-mode polarization is expected to peak, with ever-increasing sensitivity and has published the most stringent constraints on inflation to date. BICEP Array (BA) is the Stage-3 instrument of the BK program and will comprise four BICEP3-class receivers observing at 30/40, 95, 150 and 220/270 GHz with a combined 32,000+ detectors; such wide frequency coverage is necessary for control of the Galactic foregrounds, which also produce degree-scale $B$-mode signal. The 30/40 GHz receiver is designed to constrain the synchrotron foreground and began observing at the South Pole in early 2020. By the end of a 3-year observing campaign, the full BICEP Array instrument is projected to reach $\sigma_r$ between 0.002 and 0.004, depending on foreground complexity and the degree of removal of $B$-modes due to gravitational lensing (delensing). This paper presents an overview of the design, measured on-sky performance and calibration of the first BA receiver. We also give a preview of the added complexity in the time-domain multiplexed readout of the 7,776-detector 150 GHz receiver.

Submitted 7 December, 2020; originally announced December 2020.

Comments: Proceedings of SPIE 2020 (AS111). This article supersedes arXiv:1808.00568 and arXiv:2002.05228

arXiv:2011.14203 [pdf, other]

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Authors: Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei

Abstract: Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP. EdgeBERT employs entropy-based early exit predication in order to perform dynamic voltage-frequency scaling (DVFS), at a sentence granularity, for minimal energy consumption while adhering to a prescribed target latency. Computation and memory footprint overheads are further alleviated by employing a calibrated combination of adaptive attention span, selective network pruning, and floating-point quantization. Furthermore, to maximize the synergistic benefits of these algorithms in always-on and intermediate edge computing settings, we specialize a 12nm scalable hardware accelerator system, integrating a fast-switching low-dropout voltage regulator (LDO), an all-digital phase-locked loop (ADPLL), and high-density embedded non-volatile memories (eNVMs) in which the sparse floating-point bit encodings of the shared multi-task parameters are carefully stored. Altogether, latency-aware multi-task NLP inference acceleration on the EdgeBERT hardware system consumes up to 7x, 2.5x, and 53x less energy than conventional inference without early stopping, the latency-unbounded early exit approach, and CUDA adaptations on an Nvidia Jetson Tegra X2 mobile GPU, respectively.

Submitted 5 September, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: 12 pages plus references. Paper to appear at the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)
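The entropy-based early exit that EdgeBERT builds on can be sketched as follows. This is a simplification under stated assumptions: each layer is taken to have a classifier producing logits, inference stops at the first layer whose softmax entropy falls below a threshold, and the last layer is used as a fallback. The threshold and logits are hypothetical inputs. EdgeBERT's use of the predicted exit layer to choose a voltage-frequency setting (DVFS) is not modeled.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit(layer_logits: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class) for the first confident layer.

    Falls back to the last layer if no layer's entropy is below threshold.
    """
    for layer, logits in enumerate(layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold or layer == len(layer_logits) - 1:
            return layer, max(range(len(probs)), key=probs.__getitem__)
    raise ValueError("layer_logits must be non-empty")
```

Lowering the threshold trades energy for confidence: with `threshold=0.0` no layer qualifies and every input runs to the final layer, which corresponds to conventional inference without early stopping.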

arXiv:2011.13720 [pdf, other]

doi 10.1103/PhysRevB.103.115151

Topological entanglement entropy of interacting disordered zigzag graphene ribbons

Authors: Young Heon Kim, Hye Jeong Lee, S. -R. Eric Yang

Abstract: Interacting disordered zigzag graphene nanoribbons have fractional charges, are quasi-one-dimensional, and display an exponentially small gap. Our numerical computations show that the topological entanglement entropy of these systems has a small, finite, but universal value, independent of the strength of the interaction and the disorder. This result for the topological entanglement entropy shows that the disorder-free phase is critical and becomes unstable in the presence of disorder.

Submitted 21 March, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

Comments: 4 pages, 4 figures, new figures added; to be published in PRB

Journal ref: Phys. Rev. B 103, 115151 (2021)

arXiv:2011.00730 [pdf]

Thermal Conductivities and Interfacial Thermal Conductance of 1- to 3-Layer WSe$_2$

Authors: Elham Easy, Yuan Gao, Yingtao Wang, Dingkai Yan, Seyed M. Goushehgir, Eui-Hyeok Yang, Baoxing Xu, Xian Zhang

Abstract: Atomically thin materials such as graphene and semiconducting transition metal dichalcogenides (TMDCs) have attracted extensive interest in recent years, motivating investigation into multiple properties. In this work, we used the optothermal Raman technique to measure the thermal transport properties of a popular TMDC material, WSe$_2$, in single atomic layer, bilayer, and trilayer forms.

Submitted 7 March, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

arXiv:2010.15269 [pdf, other]

GloFlow: Global Image Alignment for Creation of Whole Slide Images for Pathology from Video

Authors: Viswesh Krishna, Anirudh Joshi, Philip L. Bulterys, Eric Yang, Andrew Y. Ng, Pranav Rajpurkar

Abstract: The application of deep learning to pathology assumes the existence of digital whole slide images (WSIs) of pathology slides. However, slide digitization is bottlenecked by the high cost of the precise motor stages in slide scanners that are needed to supply the position information used for slide stitching. We propose GloFlow, a two-stage method for creating a whole slide image using optical flow-based image registration with global alignment using a computationally tractable graph-pruning approach. In the first stage, we train an optical flow predictor to predict pairwise translations between successive video frames to approximate a stitch. In the second stage, this approximate stitch is used to create a neighborhood graph to produce a corrected stitch. On a simulated dataset of video scans of WSIs, we find that our method outperforms known approaches to slide stitching, and stitches WSIs resembling those produced by slide scanners.

Submitted 12 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract
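The first GloFlow stage turns pairwise translations between successive frames into an approximate stitch. A minimal sketch of that chaining step, assuming the translations are already available as (dx, dy) pixel offsets, is plain dead reckoning by cumulative summation. The optical flow predictor itself is not modeled, and the function name `chain_translations` is hypothetical, not taken from the paper's code.

```python
from itertools import accumulate
from typing import List, Tuple

Vec = Tuple[float, float]


def chain_translations(pairwise: List[Vec], origin: Vec = (0.0, 0.0)) -> List[Vec]:
    """Hypothetical helper: convert frame-to-frame offsets into absolute positions.

    pairwise[i] is the (dx, dy) offset of frame i+1 relative to frame i.
    The returned list has one position per frame, starting at `origin`.
    """

    def step(pos: Vec, delta: Vec) -> Vec:
        # Each frame's position is the previous position plus its offset.
        return (pos[0] + delta[0], pos[1] + delta[1])

    return list(accumulate(pairwise, step, initial=origin))


positions = chain_translations([(10.0, 0.0), (10.0, 1.0), (9.0, -1.0)])
print(positions)
```

Because every position is the sum of all earlier offsets, an error in one pairwise estimate shifts every later frame, which is the kind of drift a second, global correction stage is meant to reduce.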

arXiv:2010.15054 [pdf, other]

Attribution Preservation in Network Compression for Reliable Network Interpretation

Authors: Geondo Park, June Yong Yang, Sung Ju Hwang, Eunho Yang

Abstract: Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce network size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other, as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises because conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions. To combat the attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to those of its pre-compression self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods.

Submitted 28 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020. Code: https://github.com/GeondoPark/attribute-preserve

arXiv:2010.08776 [pdf, other]

The NVIDIA PilotNet Experiments

Authors: Mariusz Bojarski, Chenyi Chen, Joyjit Daw, Alperen Değirmenci, Joya Deri, Bernhard Firner, Beat Flepp, Sachin Gogri, Jesse Hong, Lawrence Jackel, Zhenhua Jia, BJ Lee, Bo Liu, Fei Liu, Urs Muller, Samuel Payne, Nischal Kota Nagendra Prasad, Artem Provodin, John Roach, Timur Rvachov, Neha Tadimeti, Jesper van Engelen, Haiguang Wen, Eric Yang, Zongyi Yang

Abstract: Four years ago, an experimental system known as PilotNet became the first NVIDIA system to steer an autonomous car along a roadway. This system represents a departure from the classical approach for self-driving in which the process is manually decomposed into a series of modules, each performing a different task. In PilotNet, on the other hand, a single deep neural network (DNN) takes pixels as input and produces a desired vehicle trajectory as output; there are no distinct internal modules connected by human-designed interfaces. We believe that handcrafted interfaces ultimately limit performance by restricting information flow through the system and that a learned approach, in combination with other artificial intelligence systems that add redundancy, will lead to better-performing systems overall. We continue to conduct research toward that goal. This document describes the PilotNet lane-keeping effort, carried out over the past five years by our NVIDIA PilotNet group in Holmdel, New Jersey. Here we present a snapshot of system status in mid-2020 and highlight some of the work done by the PilotNet group.

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2010.05533 [pdf, other]

Toward Cross-Lingual Definition Generation for Language Learners

Authors: Cunliang Kong, Liner Yang, Tianzuo Zhang, Qinan Fan, Zhenghao Liu, Yun Chen, Erhong Yang

Abstract: Generating dictionary definitions automatically can prove useful for language learners. However, cross-lingual definition generation remains a challenging task. In this work, we propose to generate definitions in English for words in various languages. To achieve this, we present a simple yet effective approach based on publicly available pretrained language models. In this approach, models can be directly applied to other languages after being trained on the English dataset. We demonstrate the effectiveness of this approach on zero-shot definition generation. Experiments and manual analyses on newly constructed datasets show that our models have a strong cross-lingual transfer ability and can generate fluent English definitions for Chinese words. We further measure the lexical complexity of generated and reference definitions. The results show that the generated definitions are much simpler, which makes them more suitable for language learners.

Submitted 12 October, 2020; originally announced October 2020.

arXiv:2010.02727 [pdf]

Symbolic Techniques for Deep Learning: Challenges and Opportunities

Authors: Belinda Fang, Elaine Yang, Fei Xie

Abstract: As the number of deep learning frameworks increases and certain ones gain popularity, discussion has grown about which methodologies these frameworks employ and the reasoning behind them. The goal of this survey is to study how symbolic techniques are utilized in deep learning. To do this, we look at some of the most popular deep learning frameworks being used today, including TensorFlow, Keras, PyTorch, and MXNet. While these frameworks differ greatly from one another, many of them use symbolic techniques, whether symbolic execution, graphs, or programming. We focus this paper on symbolic techniques because they influence not only how neural networks are built but also the way in which they are executed. Limitations of symbolic techniques have led to efforts to integrate symbolic and nonsymbolic aspects in deep learning, opening up new possibilities for symbolic techniques. For example, the Gluon API by Apache MXNet bridges the gap between imperative programming and symbolic execution through hybridization. Frameworks such as JANUS attempt to translate imperative programs into symbolic graphs, while approaches like DeepCheck attempt to use symbolic execution to analyze and validate imperative neural network programs. Symbolic analysis has also been paired with concrete execution in a technique called concolic testing in order to better test deep neural networks.
Our study of these developments exemplifies just a few of the many ways in which the symbolic techniques employed by popular frameworks can be adapted and utilized to achieve better performance.

Submitted 1 October, 2020; originally announced October 2020.

arXiv:2008.12619 [pdf, other]

doi 10.3847/1538-4357/ac1596

CMB-S4: Forecasting Constraints on Primordial Gravitational Waves

Authors: CMB-S4 Collaboration, Kevork Abazajian, Graeme E. Addison, Peter Adshead, Zeeshan Ahmed, Daniel Akerib, Aamir Ali, Steven W. Allen, David Alonso, Marcelo Alvarez, Mustafa A. Amin, Adam Anderson, Kam S. Arnold, Peter Ashton, Carlo Baccigalupi, Debbie Bard, Denis Barkats, Darcy Barron, Peter S. Barry, James G. Bartlett, Ritoban Basu Thakur, Nicholas Battaglia, Rachel Bean, Chris Bebek, et al. (212 additional authors not shown)

Abstract: CMB-S4---the next-generation ground-based cosmic microwave background (CMB) experiment---is set to significantly advance the sensitivity of CMB measurements and enhance our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. Among the science cases pursued with CMB-S4, the quest for detecting primordial gravitational waves is a central driver of the experimental design. This work details the development of a forecasting framework that includes a power-spectrum-based semi-analytic projection tool, targeted explicitly towards optimizing constraints on the tensor-to-scalar ratio, $r$, in the presence of Galactic foregrounds and gravitational lensing of the CMB. This framework is unique in its direct use of information from the achieved performance of current Stage 2--3 CMB experiments to robustly forecast the science reach of upcoming CMB-polarization endeavors. The methodology allows for rapid iteration over experimental configurations and offers a flexible way to optimize the design of future experiments given a desired scientific goal. To form a closed-loop process, we couple this semi-analytic tool with map-based validation studies, which allow for the injection of additional complexity and verification of our forecasts with several independent analysis methods.
We document multiple rounds of forecasts for CMB-S4 using this process and the resulting establishment of the current reference design of the primordial gravitational-wave component of the Stage-4 experiment, optimized to achieve our science goals of detecting primordial gravitational waves for $r > 0.003$ at greater than $5σ$, or, in the absence of a detection, of reaching an upper limit of $r < 0.001$ at $95\%$ CL.

Submitted 27 August, 2020; originally announced August 2020.

Comments: 24 pages, 8 figures, 9 tables, submitted to ApJ. arXiv admin note: text overlap with arXiv:1907.04473

arXiv:2008.02985 [pdf, other]

doi 10.1007/s11207-020-01681-5

Reconstructing Highly-twisted Magnetic Fields

Authors: Victor M. Demcsak, Michael S. Wheatland, Alpha Mastrano, Kai E. Yang

Abstract: We investigate the ability of a nonlinear force-free code to calculate highly-twisted magnetic field configurations using the Titov and Démoulin (1999) equilibrium field as a test case. The code calculates a force-free field using boundary conditions on the normal component of the field in the lower boundary, and the normal component of the current density over one polarity of the field in the lower boundary. The code can also use the current density over both polarities of the field in the lower boundary as a boundary condition. We investigate the accuracy of the reconstructions with increasing flux-rope surface twist number $N_{\textrm{t}}$, achieved by decreasing the sub-surface line current in the model. We find that the code can approximately reconstruct the Titov-Démoulin field for surface twist numbers up to $N_{\textrm{t}} \approx 8.8$. This includes configurations with bald patches. We investigate the ability to recover bald patches, and more generally identify the limitations of our method for highly-twisted fields. The results have implications for our ability to reconstruct coronal magnetic fields from observational data.

Submitted 7 August, 2020; originally announced August 2020.

Comments: 23 pages, 6 figures, accepted by Solar Physics

arXiv:2008.02956 [pdf, other]

Bootstrapping Neural Processes

Authors: Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, Yee Whye Teh

Abstract: Unlike traditional statistical modeling, in which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, an NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven able to handle various types of data, NPs still rely on the assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits their flexibility. To this end, we propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch.

Submitted 27 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: Published in Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Code is available at https://github.com/juho-lee/bnp
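The BNP abstract leans on the classical bootstrap as its uncertainty estimator. As a reminder of what that classical technique does on its own (this is not BNP, which applies the idea inside a Neural Process), here is a minimal sketch that estimates the standard error of a statistic by resampling the data with replacement. The function name, the fixed seed, and the toy data are illustrative assumptions.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_std_error(data: Sequence[float],
                        statistic: Callable[[Sequence[float]], float],
                        num_resamples: int = 1000,
                        seed: int = 0) -> float:
    """Estimate the standard error of `statistic` via the nonparametric bootstrap."""
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    n = len(data)
    estimates = []
    for _ in range(num_resamples):
        # Each bootstrap sample has the same size as the data, drawn with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the statistic's uncertainty.
    return statistics.stdev(estimates)


data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.2, 2.6, 3.0]
print(bootstrap_std_error(data, statistics.mean))
```

No distributional form is assumed for the statistic, which is the property the abstract highlights; for the mean, the result should land near the textbook value s/√n.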

arXiv:2008.02953 [pdf, other]

Neural Complexity Measures

Authors: Yoonho Lee, Juho Lee, Sung Ju Hwang, Eunho Yang, Seungjin Choi

Abstract: While various complexity measures for deep neural networks exist, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven challenging. We propose Neural Complexity (NC), a meta-learning framework for predicting generalization. Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way. The trained NC model can be added to the standard training loss to regularize any task learner in a standard supervised learning scenario. We contrast NC's approach against existing manually-designed complexity measures and other meta-learning models, and we validate NC's performance on multiple regression and classification tasks.

Submitted 23 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: Published in Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Code is available at https://github.com/yoonholee/neural-complexity

arXiv:2007.14477 [pdf, ps, other]

GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection

Authors: Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder

Abstract: Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study which reveals that domain tuning considerably improves the classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines directions for future research.

Submitted 28 July, 2020; originally announced July 2020.

Comments: SemEval 2020

arXiv:2007.12020 [pdf, other]

Few-shot Visual Reasoning with Meta-analogical Contrastive Learning

Authors: Youngsung Kim, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Abstract: While humans can solve a visual puzzle that requires logical reasoning by observing only a few samples, state-of-the-art deep reasoning models would require training over a large amount of data to obtain similar performance on the same task. In this work, we propose to solve such a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning, which is a unique human ability to identify structural or relational similarity between two sets. Specifically, given training and test sets that contain the same type of visual reasoning problems, we extract the structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning. We repeatedly apply this process with slightly modified queries of the same problem under the assumption that it does not affect the relationship between a training and a test sample. This allows the model to learn the relational similarity between the two samples effectively, even with a single pair of samples. We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce. We further meta-learn our analogical contrastive learning model over the same tasks with diverse attributes, and show that it generalizes to the same visual reasoning problem with unseen attributes.

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2007.11362 [pdf, other]

Time-Reversal Symmetric ODE Network

Authors: In Huh, Eunho Yang, Sung Ju Hwang, Jinwoo Shin

Abstract: Time-reversal symmetry, which requires that the dynamics of a system should not change with the reversal of the time axis, is a fundamental property that frequently holds in classical and quantum mechanics. In this paper, we propose a novel loss function that measures how well our ordinary differential equation (ODE) networks comply with this time-reversal symmetry; it is formally defined by the discrepancy in the time evolutions of ODE networks between forward and backward dynamics. Then, we design a new framework, which we name Time-Reversal Symmetric ODE Networks (TRS-ODENs), that can learn the dynamics of physical systems more sample-efficiently by learning with the proposed loss function. We evaluate TRS-ODENs on several classical dynamics, and find they can learn the desired time evolution from observed noisy and complex trajectories. We also show that, even for systems that do not possess the full time-reversal symmetry, TRS-ODENs can achieve better predictive performance than baselines.

Submitted 6 January, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: 15 pages; accepted to NeurIPS 2020; Code is available at https://github.com/inhuh/trs-oden; v3: references added, typo corrected
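To make "discrepancy between forward and backward dynamics" concrete, here is a minimal sketch on hand-written vector fields rather than learned ODE networks. It assumes a reversing operator R that keeps position and flips momentum. For a time-reversal symmetric flow, R·flow_t·R = flow_{-t}, so running forward, reversing, running forward again, and reversing should return to the start. The specific squared-error form below is an illustrative assumption, not the exact TRS-ODEN loss.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (q, p): position and momentum
Field = Callable[[State], State]


def rk4_flow(f: Field, x: State, t: float, steps: int = 1000) -> State:
    """Integrate dx/dt = f(x) for time t with classical fourth-order Runge-Kutta."""
    h = t / steps
    q, p = x
    for _ in range(steps):
        k1 = f((q, p))
        k2 = f((q + h / 2 * k1[0], p + h / 2 * k1[1]))
        k3 = f((q + h / 2 * k2[0], p + h / 2 * k2[1]))
        k4 = f((q + h * k3[0], p + h * k3[1]))
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return (q, p)


def reverse(x: State) -> State:
    """Assumed reversing operator R: keep position, flip momentum."""
    return (x[0], -x[1])


def trs_discrepancy(f: Field, x0: State, t: float) -> float:
    """Squared distance between x0 and R(flow_t(R(flow_t(x0))))."""
    x_t = rk4_flow(f, x0, t)
    back = reverse(rk4_flow(f, reverse(x_t), t))
    return (back[0] - x0[0]) ** 2 + (back[1] - x0[1]) ** 2


def harmonic(x: State) -> State:
    # Frictionless oscillator: time-reversal symmetric under R.
    return (x[1], -x[0])


def damped(x: State) -> State:
    # Friction breaks the symmetry: energy is lost in both passes.
    return (x[1], -x[0] - 0.3 * x[1])


print(trs_discrepancy(harmonic, (1.0, 0.0), 2.0))
print(trs_discrepancy(damped, (1.0, 0.0), 2.0))
```

The frictionless oscillator returns to its start up to integration error, while the damped one does not, so a quantity of this shape can act as a penalty that rewards symmetric dynamics.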

arXiv:2007.08844 [pdf, other]

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

Authors: Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, Jinwoo Shin

Abstract: While semi-supervised learning (SSL) has proven to be a promising way of leveraging unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that training class distributions are balanced. However, SSL algorithms trained under imbalanced class distributions can suffer severely when generalizing to a balanced testing criterion, since they utilize pseudo-labels of unlabeled data that are biased toward majority classes. To alleviate this issue, we formulate a convex optimization problem to softly refine the pseudo-labels generated from the biased model, and develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.

Submitted 13 September, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: 19 pages; NeurIPS 2020
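The abstract does not spell out DARP's convex problem. One natural reading, which is an assumption here, is to adjust soft pseudo-labels as little as possible while forcing each row to stay a distribution and each class total to match a target count. The sketch below solves that kind of constraint by alternating column and row normalization (iterative proportional fitting). It is a swapped-in standard technique, not necessarily DARP's algorithm, and `align_pseudo_labels` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def align_pseudo_labels(probs: Matrix, target_counts: List[float],
                        iters: int = 200) -> Matrix:
    """Rescale soft pseudo-labels so per-class totals match target_counts.

    probs[i][k] is the model's probability that unlabeled example i is class k
    (all entries assumed positive). sum(target_counts) should equal len(probs).
    """
    x = [row[:] for row in probs]
    num_classes = len(target_counts)
    for _ in range(iters):
        # Column step: scale each class so its total matches the target count.
        col_sums = [sum(row[k] for row in x) for k in range(num_classes)]
        x = [[row[k] * target_counts[k] / col_sums[k] for k in range(num_classes)]
             for row in x]
        # Row step: renormalize each example back to a probability distribution.
        new_x = []
        for row in x:
            total = sum(row)
            new_x.append([v / total for v in row])
        x = new_x
    return x


# Pseudo-labels biased toward class 0, although the classes should be balanced.
biased = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
print(align_pseudo_labels(biased, [2.0, 2.0]))
```

Both steps rescale whole columns by a shared factor, so each example's relative preference ordering between classes is preserved; only the overall bias toward the majority class is removed.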

arXiv:2007.07484 [pdf, other]

A General Family of Stochastic Proximal Gradient Methods for Deep Learning

Authors: Jihun Yun, Aurelie C. Lozano, Eunho Yang

Abstract: We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. In addition, we present two important update rules beyond the well-known standard methods as a byproduct of our approach: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole family of ProxGen enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also empirically show the superiority of proximal methods over subgradient-based approaches via extensive experiments. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.

Submitted 15 July, 2020; originally announced July 2020.

Comments: 21 pages
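The abstract names standard stochastic proximal gradient descent without preconditioners as a special case of ProxGen. A minimal sketch of that special case with the convex $\ell_1$ regularizer ($q = 1$) follows. Each step takes a stochastic gradient step on a least-squares loss and then applies the $\ell_1$ proximal map, which is soft-thresholding. The toy data, step size, and function names are illustrative assumptions; the adaptive preconditioners and non-convex $\ell_q$ mappings from the paper are not shown.

```python
import math
import random
from typing import List


def soft_threshold(w: List[float], thresh: float) -> List[float]:
    """Proximal map of thresh * ||w||_1: shrink every weight toward zero by thresh."""
    return [math.copysign(max(abs(v) - thresh, 0.0), v) for v in w]


def prox_sgd_lasso(xs: List[List[float]], ys: List[float], lam: float,
                   lr: float = 0.05, epochs: int = 200, seed: int = 0) -> List[float]:
    """Stochastic proximal gradient for mean(0.5 * (x.w - y)^2) + lam * ||w||_1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
            # Gradient of the smooth per-sample loss 0.5 * residual**2.
            grad = [residual * xj for xj in xs[i]]
            # Gradient step on the smooth part, then the proximal step on the regularizer.
            w = soft_threshold([wj - lr * gj for wj, gj in zip(w, grad)], lr * lam)
    return w


data_rng = random.Random(1)
xs = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
ys = [2.0 * x[0] for x in xs]  # only the first feature matters
print(prox_sgd_lasso(xs, ys, lam=0.05))
```

Unlike a subgradient step, the proximal step can set weights exactly to zero, which is one reason proximal methods are attractive for sparsity-inducing regularizers.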

arXiv:2007.07358 [pdf, other]

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Abstract: Experience replay, which enables agents to remember and reuse past experience, has played a significant role in the success of off-policy reinforcement learning (RL). To utilize experience replay efficiently, existing sampling methods select more meaningful experiences by imposing priorities on them based on certain metrics (e.g., TD-error). However, they may result in sampling highly biased, redundant transitions, since they compute the sampling rate for each transition independently, without considering its importance in relation to other transitions. In this paper, we address the issue by proposing a new learning-based sampling method that can compute the relative importance of transitions. To this end, we design a novel permutation-equivariant neural architecture that takes as input contexts from not only the features of each transition (local) but also those of other transitions (global). We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmarks for both continuous and discrete control tasks and show that it can significantly improve the performance of various off-policy RL methods. Further analysis confirms that the improvements in sample efficiency are indeed due to NERS sampling diverse and meaningful transitions by considering both local and global contexts.

Submitted 7 April, 2021; v1 submitted 14 July, 2020; originally announced July 2020.
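The abstract contrasts NERS with samplers that assign each transition a priority from its own TD-error alone. A minimal sketch of that baseline, in the style of proportional prioritized replay, is below. It is not NERS. The exponent `alpha`, the small `eps` that keeps zero-error transitions sampleable, and the function names are illustrative assumptions.

```python
import random
from typing import List


def priority_probs(td_errors: List[float], alpha: float = 0.6,
                   eps: float = 1e-6) -> List[float]:
    """Sampling probability per transition, computed from its own |TD-error| only."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors: List[float], batch_size: int,
                 alpha: float = 0.6, seed: int = 0) -> List[int]:
    """Draw transition indices with replacement, proportional to their priorities."""
    rng = random.Random(seed)
    probs = priority_probs(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


# Transitions 2 and 3 have identical TD-errors, e.g. near-duplicate experiences.
errors = [0.0, 0.5, 2.0, 2.0]
print(priority_probs(errors))
print(sample_batch(errors, batch_size=8))
```

Because each priority ignores the rest of the buffer, two near-duplicate high-error transitions are both favored, which illustrates the redundancy the abstract attributes to independent per-transition sampling.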

arXiv:2007.00884

A Revision of Neural Tangent Kernel-based Approaches for Neural Networks

Authors: Kyung-Su Kim, Aurélie C. Lozano, Eunho Yang

Abstract: Recent theoretical works based on the neural tangent kernel (NTK) have shed light on the optimization and generalization of over-parameterized networks, and partially bridge the gap between their practical success and classical learning theory. In particular, the NTK-based approach has produced the following three representative results: (1) A training error bound was derived to show that networks can fit any finite training sample perfectly by reflecting a tighter characterization of training speed depending on the data complexity. (2) A generalization error bound invariant of network size was derived by using a data-dependent complexity measure (CMD). It follows from this CMD bound that networks can generalize arbitrary smooth functions. (3) A simple and analytic kernel function was derived and shown to be equivalent to a fully-trained network. This kernel outperforms its corresponding network and the existing gold standard, Random Forests, in few-shot learning. For all of these results to hold, the network scaling factor $κ$ should decrease w.r.t. sample size $n$. In this case of decreasing $κ$, however, we prove that the aforementioned results are surprisingly erroneous. This is because the output value of the trained network decreases to zero when $κ$ decreases w.r.t. $n$. To solve this problem, we tighten key bounds by essentially removing $κ$-affected values. Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.

Submitted 6 August, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: We spotted an error in the proof of Lemma A.4 and are investigating whether this can be corrected. Furthermore, the authors of the original paper have informed us that they are fixing the lemma upon which our theorem 3.2 builds. Therefore, we are removing the current version of our paper.

arXiv:2007.00873 [pdf, other]

Compressed Sensing via Measurement-Conditional Generative Models

Authors: Kyung-Su Kim, Jung Hyun Lee, Eunho Yang

Abstract: A pre-trained generator has been frequently adopted in compressed sensing (CS) due to its ability to effectively estimate signals with the prior of neural networks (NNs). In order to further refine the NN-based prior, we propose a framework that allows the generator to utilize additional information from a given measurement for prior learning, thereby yielding more accurate prediction for signals. As our framework has a simple form, it is easily applied to existing CS methods using pre-trained generators. We demonstrate through extensive experiments that our framework exhibits uniformly superior performance by a large margin and can reduce the reconstruction error by up to an order of magnitude for some applications. We also explain the experimental success theoretically by showing that our framework can slightly relax the stringent signal presence condition, which is required to guarantee the success of signal recovery.

Submitted 2 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

arXiv:2006.14222 [pdf, other]

Set Based Stochastic Subsampling

Authors: Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, Sung Ju Hwang

Abstract: Deep models are designed to operate on huge volumes of high-dimensional data such as images. To reduce the volume of data these models must process, we propose a set-based, two-stage, end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse-grained global information with set encoding functions. In the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pairwise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks, including image classification, image reconstruction, function reconstruction, and few-shot classification. Additionally, for nonparametric models such as Neural Processes, which must use the whole training set at inference time, we show that our method improves their scalability.
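The two-stage sampling scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions: the per-element keep probabilities are passed in directly (in the paper they come from set encoders), and the second-stage logits use a hypothetical pairwise score (distance to already-selected elements) in place of the learned set attention network. No training or gradient estimation is shown.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def two_stage_subsample(x, keep_prob, k, rng):
    """Select at most k indices of the 1-D array x in two stages."""
    # Stage 1: one independent Bernoulli draw per element yields the candidate set.
    remaining = [int(i) for i in np.flatnonzero(rng.random(len(x)) < keep_prob)]
    selected = []
    # Stage 2: autoregressive Categorical draws without replacement; each draw's
    # logits depend on what has already been selected (pairwise interactions).
    while remaining and len(selected) < k:
        # Hypothetical pairwise score: prefer candidates far from those already chosen.
        logits = np.array([sum(abs(x[r] - x[s]) for s in selected) for r in remaining],
                          dtype=float)
        idx = rng.choice(len(remaining), p=softmax(logits))
        selected.append(remaining.pop(int(idx)))
    return selected

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
idx = two_stage_subsample(x, keep_prob=np.full(50, 0.4), k=5, rng=rng)
```

The cheap first stage shrinks the pool before the costlier, interaction-aware second stage runs, which is the efficiency argument the abstract makes.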

Submitted 30 May, 2022; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: 20 pages

arXiv:2006.12777 [pdf, other]

Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning

Authors: A. Tuan Nguyen, Hyewon Jeong, Eunho Yang, Sung Ju Hwang

Abstract: Although recent multi-task learning methods have been shown to improve the generalization of deep neural networks, they should be used with caution in safety-critical applications such as clinical risk prediction. Even if they improve task-average performance, they may still degrade performance on individual tasks, which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by transferring knowledge from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky, since low loss could be a result of overfitting. In time-series prediction, knowledge learned for one task (e.g., predicting sepsis onset) at a specific timestep may be useful for learning another task (e.g., predicting mortality) at a later timestep, but the lack of a loss at each timestep makes it difficult to measure reliability per timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that transfers knowledge from certain tasks/timesteps to relevant uncertain tasks, based on feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction; our model significantly outperforms them without any sign of negative transfer.
Further qualitative analysis of the learned knowledge graphs by clinicians shows that they are helpful in analyzing the model's predictions. Our final code is available at https://github.com/anhtuan5696/TPAMTL.

Submitted 18 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: AAAI 2021. The first two authors contributed equally to this work. 10 pages, 4 figures, 4 tables

arXiv:2006.12139 [pdf, other]

Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning

Authors: Minyoung Song, Jaehong Yoon, Eunho Yang, Sung Ju Hwang

Abstract: As deep neural networks grow in size and are increasingly deployed on resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of a given network. A common limitation of most existing pruning techniques is that they require pre-training the network at least once before pruning, so the reductions in memory and computation benefit only inference. However, reducing the training cost of neural networks with rapid structural pruning may be beneficial, either to minimize monetary cost with cloud computing or to enable on-device learning on a resource-limited device. Recently introduced random-weight pruning approaches can eliminate the need for pretraining, but they often underperform conventional pruning techniques and do not speed up training, since they perform unstructured pruning. To overcome these limitations, we propose Set-based Task-Adaptive Meta Pruning (STAMP), which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask for it as a function of the target dataset. To maximize performance on the target task, we meta-learn the mask generator over different subsets of the reference dataset, so that it generalizes well to unseen datasets within a few gradient steps of training.
We validate STAMP against recent advanced pruning methods on benchmark datasets, on which it obtains not only significantly improved compression rates over the baselines at similar accuracy, but also orders-of-magnitude faster training.

Submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.12097 [pdf, other]

Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning

Authors: Wonyong Jeong, Jaehong Yoon, Eunho Yang, Sung Ju Hwang

Abstract: While existing federated learning approaches mostly require clients to have fully labeled data to train on, in realistic settings the data obtained at the client side often comes without accompanying labels. This lack of labels may result from high labeling cost or from the difficulty of annotation when expert knowledge is required. Thus the private data at each client may be either partly labeled, or completely unlabeled with labeled data available only at the server, which leads to a new practical federated learning problem, namely Federated Semi-Supervised Learning (FSSL). In this work, we study two essential scenarios of FSSL based on the location of the labeled data. The first is a conventional case where clients have both labeled and unlabeled data (labels-at-client); the second is a more challenging case where the labeled data is only available at the server (labels-at-server). We then propose a novel method to tackle these problems, which we refer to as Federated Matching (FedMatch). FedMatch improves upon naive combinations of federated learning and semi-supervised learning approaches with a new inter-client consistency loss and a decomposition of the parameters for disjoint learning on labeled and unlabeled data. Through extensive experimental validation in the two scenarios, we show that our method outperforms both local semi-supervised learning and baselines that naively combine federated learning with semi-supervised learning.
The code is available at https://github.com/wyjeong/FedMatch.
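An inter-client consistency loss of the kind named above can be sketched as a divergence between a client's predictions and those of other clients' ("helper") models on the same unlabeled batch. This is a generic sketch, not FedMatch's exact loss: the use of KL divergence, its direction, and plain averaging over helpers are assumptions, and how helpers are chosen and how pseudo-labels are formed are not shown. The function names are hypothetical.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Row-wise KL(p || q) for batches of class-probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def inter_client_consistency(local_probs, helper_probs):
    """Mean KL between each helper model's predictions and the local model's
    predictions on the same unlabeled batch.
    local_probs: (batch, classes); helper_probs: (helpers, batch, classes)."""
    return float(np.mean([kl(h, local_probs).mean() for h in helper_probs]))

local = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
helpers = np.array([[[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
                    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]])
loss = inter_client_consistency(local, helpers)
```

The term is zero when every helper agrees with the local model and grows as their predictions diverge, which is what lets unlabeled clients learn from one another.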

Submitted 29 March, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Journal ref: International Conference on Learning Representations (ICLR 2021), International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020 (FL-ICML'20)

arXiv:2006.05419 [pdf, other]

Cost-effective Interactive Attention Learning with Neural Attention Processes

Authors: Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang Joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang

Abstract: We propose a novel interactive learning framework, which we refer to as Interactive Attention Learning (IAL), in which human supervisors interactively manipulate the allocated attention to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to the scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for human annotators to examine attention over large numbers of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose the Neural Attention Process (NAP), an attention generator that can update its behavior by incorporating new attention-level supervision without any retraining. Second, we propose an algorithm that prioritizes instances and features by their negative impact, so that the model can improve substantially with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real estate, and computer vision), on which it significantly outperforms baselines with conventional attention mechanisms or without cost-effective reranking, with substantially lower retraining and human-model interaction cost.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2005.00304 [pdf]

doi 10.1103/PhysRevB.102.115114

Thickness dependence of electronic and crystal structures in VO$_2$ ultrathin films: suppression of the collaborative Mott-Peierls transition

Authors: D. Shiga, B. E. Yang, N. Hasegawa, T. Kanda, R. Tokunaga, K. Yoshimatsu, R. Yukawa, M. Kitamura, K. Horiba, H. Kumigashira

Abstract: Through in situ photoemission spectroscopy, we investigated the change in the electronic and crystal structures of dimensionality-controlled VO$_2$ films coherently grown on TiO$_2$(001) substrates. In the nanostructured films, the balance between the instabilities of a bandlike Peierls transition and a Mott transition is controlled as a function of thickness. The characteristic spectral change associated with the temperature-driven metal-insulator transition in VO$_2$ thick films persists down to 1.5 nm (roughly corresponding to five V atoms along the [001] direction), whereas VO$_2$ films thinner than 1.0 nm are insulating without V-V dimerization. These results suggest that the delicate balance between a Mott instability and a bandlike Peierls instability is modulated at a scale of a few nanometers by dimensional crossover and confinement effects, which consequently produce the complicated electronic phase diagram of ultrathin VO$_2$ films.

Submitted 1 May, 2020; originally announced May 2020.

Comments: 30 pages, 4 main figures, 4 supplementary figures

Journal ref: Phys. Rev. B 102, 115114 (2020)

arXiv:2004.14125 [pdf, ps, other]

doi 10.1103/PhysRevResearch.2.033109

Topologically ordered zigzag nanoribbon: $e/2$ fractional edge charge, spin-charge separation, and ground state degeneracy

Authors: S. -R. Eric Yang, Min-Chul Cha, Hye Jeong Lee, Young Heon Kim

Abstract: We numerically compute the density of states (DOS) of an interacting disordered zigzag graphene nanoribbon (ZGNR) with midgap states showing $e/2$ fractional edge charges. The computed Hartree-Fock DOS is linear at the critical disorder strength where the gap vanishes. This implies an $I\mbox{-}V$ curve with $I\propto V^2$. Thus, $I\mbox{-}V$ curve measurements may yield evidence of fractional charges in interacting disordered ZGNRs. We show that even a weak disorder potential acts as a singular perturbation on zigzag edge electronic states, producing drastic changes in the energy spectrum. Spin-charge separation and fractional charges play a key role in the reconstruction of edge antiferromagnetism. Our results show that an interacting disordered ZGNR is a topologically ordered Mott-Anderson insulator.
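The step from a linear DOS to $I\propto V^2$ can be made explicit under textbook tunneling assumptions that the abstract does not state: zero temperature, energies measured from the Fermi level, and an energy-independent probe DOS and transmission, so that $dI/dV \propto \rho(eV)$. With a linear DOS $\rho(E) = c\,|E|$ for some constant $c$:

```latex
\frac{dI}{dV} \propto \rho(eV), \qquad \rho(E) = c\,|E|
\;\Longrightarrow\;
I(V) \propto \int_{0}^{eV} c\,E\,dE = \frac{c\,e^{2}V^{2}}{2} \propto V^{2}
\qquad (V \ge 0).
```

A gapped DOS would instead suppress the current below the gap edge, which is why the quadratic curve is specific to the critical disorder strength.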

Submitted 22 July, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: 10 pages, 18 figures, Published version, Phys. Rev. Research 2, 033109 (2020)

arXiv:2004.12510 [pdf, other]

doi 10.1007/s11207-020-01663-7

Self-consistent Nonlinear Force-free Field Reconstruction from Weighted Boundary Conditions

Authors: Alpha Mastrano, Kai E. Yang, Michael S. Wheatland

Abstract: Vector magnetogram data are often used as photospheric boundary conditions for force-free coronal magnetic field extrapolations. In general, however, vector magnetogram data are not consistent with the force-free assumption. In this article, we demonstrate a way to deal with inconsistent boundary data, by generalizing the "self-consistency procedure" of Wheatland & Regnier (2009). In that procedure, the inconsistency is resolved by an iterative process of constructing two solutions based on the values of the force-free parameter alpha on the two polarities of the field in the boundary (the P and N polarities), and taking uncertainty-weighted averages of the boundary alpha values in the P and N solutions. When the alpha values in the P and N regions are very different, the self-consistent solution may lose high alpha values from the boundary conditions. We show how, by altering the weighting of the uncertainties in the P or N boundary conditions, we can preserve high alpha values in the self-consistent solution. The weighted self-consistent extrapolation method is demonstrated on an analytic bipole field and applied to vector magnetogram data taken by the Helioseismic and Magnetic Imager (HMI) instrument for NOAA active region AR 12017 on 2014 March 29.
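The uncertainty-weighted averaging step described above can be sketched per boundary point. We assume the standard inverse-variance weighting; the abstract does not give the per-point uncertainties or the exact way the paper alters the weighting, so the `n_scale` factor that inflates the N-solution uncertainties is a hypothetical illustration of "altering the weighting", not the paper's scheme.

```python
import numpy as np

def combine_alpha(alpha_p, sigma_p, alpha_n, sigma_n, n_scale=1.0):
    """Inverse-variance weighted average of boundary alpha values from the
    P and N solutions, plus the combined uncertainty.
    n_scale > 1 inflates the N-solution uncertainties (hypothetical knob),
    so the P-solution values dominate the average."""
    w_p = 1.0 / np.square(sigma_p)
    w_n = 1.0 / np.square(n_scale * np.asarray(sigma_n))
    alpha = (w_p * alpha_p + w_n * alpha_n) / (w_p + w_n)
    sigma = 1.0 / np.sqrt(w_p + w_n)
    return alpha, sigma

alpha, sigma = combine_alpha(2.0, 1.0, 0.0, 1.0)             # equal weights: alpha = 1.0
alpha_w, _ = combine_alpha(2.0, 1.0, 0.0, 1.0, n_scale=3.0)  # N down-weighted: alpha ~ 1.8
```

With equal uncertainties a high P value is halved by averaging against a low N value; down-weighting one side is what lets the high value survive, which is the behavior the abstract targets.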

Submitted 26 April, 2020; originally announced April 2020.

Comments: 12 pages, 5 figures, submitted to Solar Physics

arXiv:2004.08590 [pdf, other]

doi 10.3847/1538-4357/ab8810

Relative Magnetic Helicity Based on a Periodic Potential Field

Authors: Kai E. Yang, Michael S. Wheatland, Stuart A. Gilchrist

Abstract: Magnetic helicity is conserved under ideal magnetohydrodynamics (MHD) and quasi-conserved even under resistive processes. The standard definition of magnetic helicity cannot be applied directly to an open magnetic field in a volume, because it is gauge-dependent. Instead, the relative magnetic helicity is widely used. We find that the energy of a potential magnetic field in a rectangular domain with periodic lateral boundary conditions is less than that of the field with a fixed normal component on all six boundaries. To make use of this lower-energy potential field in the analysis of relative magnetic helicity, we introduce a new definition of magnetic helicity that involves the periodic potential field. We apply this definition to a sequence of analytic solutions and a numerical simulation. The results show that our new gauge-invariant helicity is very close to the current-carrying part of the relative magnetic helicity of the original magnetic field. We also find that the ratio of the current-carrying helicity to the relative magnetic helicity behaves differently for the original and for our newly defined relative helicity. The new helicity appears to be more sensitive to the component of the field due to the electric current in the volume, which is the source of instabilities and solar eruptive phenomena.
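For reference, the widely used relative helicity the abstract starts from can be written in the Finn-Antonsen form, with $\mathbf{B}_p$ the potential field sharing the normal component of $\mathbf{B}$ on the boundary of $V$, and $\mathbf{A}$, $\mathbf{A}_p$ the corresponding vector potentials. It splits algebraically into the current-carrying part $H_j$ mentioned in the abstract and a remaining mixed term $H_{pj}$. The paper's new definition replaces $\mathbf{B}_p$ with a periodic potential field; its exact form is given in the paper and is not reproduced here.

```latex
H_R = \int_V (\mathbf{A} + \mathbf{A}_p)\cdot(\mathbf{B} - \mathbf{B}_p)\,dV
    = \underbrace{\int_V (\mathbf{A} - \mathbf{A}_p)\cdot(\mathbf{B} - \mathbf{B}_p)\,dV}_{H_j}
    + \underbrace{2\int_V \mathbf{A}_p\cdot(\mathbf{B} - \mathbf{B}_p)\,dV}_{H_{pj}}
```

Both pieces depend on $\mathbf{B} - \mathbf{B}_p$, the field produced by currents in the volume, so the choice of reference potential field directly affects how much helicity is attributed to currents.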

Submitted 18 April, 2020; originally announced April 2020.

Comments: 33 pages, 7 figures, accepted by ApJ

arXiv:2004.07955 [pdf, other]

Targeted Attack for Deep Hashing based Retrieval

Authors: Jiawang Bai, Bin Chen, Yiming Li, Dongxian Wu, Weiwei Guo, Shu-tao Xia, En-hui Yang

Abstract: Deep hashing based retrieval is widely adopted in large-scale image and video retrieval. However, its security has received little investigation. In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study targeted attacks on such retrieval. Specifically, we first formulate the targeted attack as a point-to-set optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label. We then design a novel component-voting scheme to obtain an anchor code as the representative of the set of hash codes of objects with the target label, and we theoretically derive its optimality guarantee. To balance attack performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under an $\ell^\infty$ restriction on the perturbation. Extensive experiments verify that DHTA is effective in attacking both deep hashing based image retrieval and video retrieval.
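The component-voting idea can be sketched for {-1, +1} hash codes: each bit of the anchor is the majority value of that bit across the target set. Majority voting minimizes the total Hamming distance to the set bit by bit, which is the kind of optimality the abstract refers to; the paper's exact statement (e.g., any weighting) may differ, and the tie-breaking rule below is our assumption. The adversarial perturbation search itself is not shown.

```python
import numpy as np

def anchor_code(codes):
    """Component-wise majority vote over a set of {-1, +1} hash codes (one per row).
    Ties are broken toward +1 here, which is an assumption."""
    votes = codes.sum(axis=0)
    return np.where(votes >= 0, 1, -1)

def hamming(a, b):
    """Hamming distance between two {-1, +1} codes of length K: (K - <a, b>) / 2."""
    return int((len(a) - np.dot(a, b)) // 2)

target_codes = np.array([[ 1,  1, -1,  1],
                         [ 1, -1, -1,  1],
                         [-1,  1, -1, -1]])
anchor = anchor_code(target_codes)                     # [ 1  1 -1  1]
total = sum(hamming(anchor, c) for c in target_codes)  # 0 + 1 + 2 = 3
```

Replacing the whole target set with one anchor turns the point-to-set objective into a single Hamming-distance target, which is cheaper to optimize during the attack.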

Submitted 23 July, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

Comments: Accepted by ECCV 2020 as Oral

arXiv:2003.03196 [pdf, other]

Federated Continual Learning with Weighted Inter-client Transfer

Authors: Jaehong Yoon, Wonyong Jeong, Giwoong Lee, Eunho Yang, Sung Ju Hwang

Abstract: There has been a surge of interest in continual learning and federated learning, both of which are important for deep neural networks in real-world scenarios. Yet little research has been done on the scenario where each client learns a sequence of tasks from a private local data stream. This problem of federated continual learning poses new challenges to continual learning, such as utilizing knowledge from other clients while preventing interference from irrelevant knowledge. To resolve these issues, we propose a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the network weights into global federated parameters and sparse task-specific parameters; each client receives selective knowledge from other clients by taking a weighted combination of their task-specific parameters. FedWeIT minimizes interference between incompatible tasks and also allows positive knowledge transfer across clients during learning. We validate FedWeIT against existing federated learning and continual learning methods under varying degrees of task similarity across clients, and our model significantly outperforms them with a large reduction in communication cost. Code is available at https://github.com/wyjeong/FedWeIT.
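The weight decomposition described above can be sketched as composing one client's weights from three parts: masked global base parameters, its own sparse task-specific parameters, and an attention-weighted sum of other clients' task-specific parameters. This exact form is our assumption based on the abstract's description; training, the sparsity penalty, and how the attention weights are learned are not shown, and all names are hypothetical.

```python
import numpy as np

def compose_client_weights(base, mask, local_adaptive, foreign_adaptive, attention):
    """Compose one client's weights for its current task as
    base * mask + local_adaptive + sum_j attention[j] * foreign_adaptive[j]."""
    # Selective transfer: weighted combination of other clients' task-specific parameters.
    transfer = sum((a * A for a, A in zip(attention, foreign_adaptive)),
                   np.zeros_like(base))
    # Masked global (federated) base parameters + own sparse task-specific parameters.
    return base * mask + local_adaptive + transfer

base = np.array([[0.5, -1.0], [2.0, 0.0]])      # shared global parameters
mask = np.array([[1.0, 0.0], [1.0, 1.0]])       # task-specific mask on the base
local = np.array([[0.0, 0.3], [0.0, 0.0]])      # own sparse task-specific parameters
foreign = [np.array([[0.1, 0.0], [0.0, 0.0]]),  # other clients' task-specific parameters
           np.array([[0.0, 0.0], [0.0, 0.2]])]
theta = compose_client_weights(base, mask, local, foreign, attention=[0.5, 1.0])
```

Because only the sparse task-specific pieces need to be exchanged between clients, this kind of decomposition is consistent with the communication savings the abstract reports.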

Submitted 14 June, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: ICML 2021

arXiv:2002.06561 [pdf, other]

doi 10.1007/s11633-022-1412-6

Generalized Embedding Machines for Recommender Systems

Authors: Enneng Yang, Xin Xin, Li Shen, Guibing Guo

Abstract: The factorization machine (FM) is an effective model for feature-based recommendation that uses the inner product to capture second-order feature interactions. However, one of the major drawbacks of FM is that it cannot capture complex high-order interaction signals. A common solution is to change the interaction function, for example by stacking deep neural networks on top of FM. In this work, we propose an alternative approach that models high-order interaction signals at the embedding level, namely the Generalized Embedding Machine (GEM). The embedding used in GEM encodes not only information from the feature itself but also information from other correlated features, so the embedding becomes high-order. We can then combine GEM with FM and even its advanced variants to perform feature interactions. More specifically, in this paper we use graph convolutional networks (GCN) to generate high-order embeddings. We integrate GEM with several FM-based models and conduct extensive experiments on two real-world datasets. The results demonstrate significant improvements of GEM over the corresponding baselines.
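The second-order FM interaction the abstract refers to can be sketched with the well-known identity that turns the pairwise sum into a per-dimension square, computable in O(nk). This sketch covers plain FM only; GEM would replace each embedding row with a GCN-produced high-order embedding, which is not shown. Function names are illustrative.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order factorization machine score.
    x: (n,) features, w0: bias, w: (n,) linear weights, V: (n, k) embeddings.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
         = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]."""
    linear = w0 + w @ x
    s = x @ V                 # (k,): sum_i V[i, f] * x_i
    s2 = (x ** 2) @ (V ** 2)  # (k,): sum_i V[i, f]^2 * x_i^2
    return float(linear + 0.5 * np.sum(s ** 2 - s2))

def fm_score_naive(x, w0, w, V):
    """Same score computed with the explicit O(n^2 k) loop over feature pairs."""
    n = len(x)
    pair = sum(V[i] @ V[j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return float(w0 + w @ x + pair)

rng = np.random.default_rng(0)
n, k = 6, 3
x, w, V = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal((n, k))
```

Since every interaction goes through the inner product of two embeddings, making the embeddings themselves high-order (as GEM does) adds expressiveness without changing this scoring function.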

Submitted 16 February, 2020; originally announced February 2020.

Comments: 8 pages

Journal ref: Machine Intelligence Research (2024): 1-14

arXiv:2002.05254 [pdf, other]

doi 10.1007/s10909-019-02299-z

Optical Design and Characterization of 40-GHz Detector and Module for the BICEP Array

Authors: A. Soliman, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, M. Dierickx, L. Duband, S. Fatigoni, J. P. Filippini, G. Hall, M. Halpern, S. Harrison, S. Henderson, S. R. Hildebrandt , et al. (44 additional authors not shown)

Abstract: Families of cosmic inflation models predict a primordial gravitational-wave background that imprints a B-mode polarization pattern in the Cosmic Microwave Background (CMB). High-sensitivity instruments with wide frequency coverage and well-controlled systematic errors are needed to constrain the faint B-mode amplitude. We have developed antenna-coupled Transition Edge Sensor (TES) arrays for high-sensitivity polarized CMB observations over a wide range of millimeter-wave bands. BICEP Array, the latest phase of the BICEP/Keck experiment series, is a multi-receiver experiment designed to search for inflationary B-mode polarization to a precision $σ(r)$ between 0.002 and 0.004 after 3 full years of observations, depending on foreground complexity and the degree of lensing removal. We describe the electromagnetic design and measured performance of the BICEP Array low-frequency 40-GHz detectors, their packaging in focal plane modules, and their optical characterization, including efficiency and beam matching between polarization pairs. We summarize the design and simulated optical performance, including an approach to improve the optical efficiency by reducing mismatch losses. We report the measured beam maps for a new broadband corrugation design that minimizes beam differential ellipticity between polarization pairs caused by interactions with the module housing frame; this helps minimize the polarized beam mismatch that converts CMB temperature to polarization ($T \rightarrow P$) anisotropy in CMB maps.

Submitted 12 February, 2020; originally announced February 2020.

Comments: 8 pages, 7 figures, Accepted by the Journal of Low Temperature Physics (Proceedings of the 18th International Workshop on Low Temperature Detectors)

arXiv:2002.05228 [pdf, other]

doi 10.1007/s10909-020-02394-6

Design and performance of the first BICEP Array receiver

Authors: A. Schillaci, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, M. Dierickx, L. Duband, S. Fatigoni, J. P. Filippini, G. Hall, M. Halpern, S. Harrison, S. Henderson, S. R. Hildebrandt , et al. (44 additional authors not shown)

Abstract: Branches of cosmic inflationary models, such as slow-roll inflation, predict a background of primordial gravitational waves that imprints a unique odd-parity B-mode pattern in the Cosmic Microwave Background (CMB) at amplitudes that are within experimental reach. The BICEP/Keck (BK) experiment targets this primordial signature, the amplitude of which is parameterized by the tensor-to-scalar ratio r, by observing the polarized microwave sky through the exceptionally clean and stable atmosphere at the South Pole. B-mode measurements require an instrument with exquisite sensitivity, tight control of systematics, and wide frequency coverage to disentangle the primordial signal from the Galactic foregrounds. BICEP Array represents the most recent stage of the BK program, and comprises four BICEP3-class receivers observing at 30/40, 95, 150 and 220/270 GHz. The 30/40 GHz receiver will be deployed at the South Pole during the 2019/2020 austral summer. After 3 full years of observations with 30,000+ detectors, BICEP Array will measure primordial gravitational waves to a precision $σ(r)$ between 0.002 and 0.004, depending on foreground complexity and the degree of lensing removal. In this paper we give an overview of the instrument, highlighting the design features in terms of cryogenics, magnetic shielding, detectors and readout architecture as well as reporting on the integration and tests that are ongoing with the first receiver at 30/40 GHz.

Submitted 12 February, 2020; originally announced February 2020.

Comments: 9 pages, 5 figures, presented at LTD18 in Milan (July 2019), accepted by JLTP (February 2020)

arXiv:2002.05219 [pdf, other]

doi 10.1007/s10909-020-02411-8

Characterizing the Sensitivity of 40 GHz TES Bolometers for BICEP Array

Authors: C. Zhang, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, M. Dierickx, L. Duband, S. Fatigoni, J. P. Filippini, G. Hall, M. Halpern, S. Harrison, S. Henderson, S. R. Hildebrandt , et al. (44 additional authors not shown)

Abstract: The BICEP/Keck (BK) experiment aims to detect the imprint of primordial gravitational waves in the Cosmic Microwave Background polarization, which would be direct evidence for the theory of inflation. While the tensor-to-scalar ratio has been constrained to $r_{0.05} < 0.06$ at 95% c.l., further improvements on this upper limit are hindered by polarized Galactic foreground emission and by the removal of gravitational lensing polarization. The 30/40 GHz receiver of BICEP Array (BA) will be deployed at the end of 2019 and will constrain the synchrotron foreground with unprecedented accuracy within the BK sky patch. We will show the design of the 30/40 GHz detectors and test results summarizing their performance. The low optical and atmospheric loading at these frequencies requires our TES detectors to have low saturation power in order to be photon-noise dominated. To realize the low thermal conductivity required from a 250 mK base temperature, we developed new bolometer leg designs. We will present the relevant measured detector parameters ($G$, $T_c$, $R_n$, $P_{sat}$), spectral bands, and noise spectra. We achieved a per-bolometer NEP, including all noise components, of $2.07\times10^{-17}\,\mathrm{W}/\sqrt{\mathrm{Hz}}$, which includes an anticipated photon noise level of $1.54\times10^{-17}\,\mathrm{W}/\sqrt{\mathrm{Hz}}$.
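The two NEP figures above can be related with a short worked calculation, assuming the noise contributions are uncorrelated and therefore add in quadrature. The abstract does not break down the non-photon noise, so the remainder computed here is an inference under that assumption, not a reported value.

```python
import math

nep_total = 2.07e-17   # W/sqrt(Hz), all noise components (from the abstract)
nep_photon = 1.54e-17  # W/sqrt(Hz), anticipated photon noise (from the abstract)

# Uncorrelated noise sources add in quadrature: NEP_total^2 = NEP_photon^2 + NEP_other^2.
nep_other = math.sqrt(nep_total**2 - nep_photon**2)
photon_share = nep_photon**2 / nep_total**2  # photon fraction of the noise power

print(f"{nep_other:.3g}")     # 1.38e-17
print(f"{photon_share:.2f}")  # 0.55
```

Under this assumption, photon noise accounts for slightly more than half of the total noise power, with the remaining detector and readout contributions at roughly $1.38\times10^{-17}\,\mathrm{W}/\sqrt{\mathrm{Hz}}$.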

Submitted 12 February, 2020; originally announced February 2020.

Comments: Accepted for publication in Journal of Low Temperature Physics

arXiv:2002.05197

doi 10.1007/s10909-020-02392-8

Optical characterization of the Keck Array and BICEP3 CMB Polarimeters from 2016 to 2019

Authors: The BICEP/Keck Collaboration: T. St Germaine, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, J. J. Bock, H. Boenish, E. Bullock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, M. Dierickx, L. Duband, S. Fatigoni, J. P. Filippini, S. Fliescher, J. A. Grayson, G. Hall, et al. (50 additional authors not shown)

Abstract: The BICEP/Keck experiment (BK) is a series of small-aperture refracting telescopes observing degree-scale Cosmic Microwave Background (CMB) polarization from the South Pole in search of a primordial $B$-mode signature. This $B$-mode signal arises from primordial gravitational waves interacting with the CMB, and its amplitude is parametrized by the tensor-to-scalar ratio $r$. Since 2016, BICEP3 and the Keck Array have been observing with a total of 4800 antenna-coupled transition-edge sensor detectors in frequency bands at 95, 150, 220, and 270 GHz. Here we present the optical performance of these receivers from 2016 to 2019, including far-field beams measured in situ with an improved chopped thermal source and the instrument spectral response measured with a field-deployable Fourier Transform Spectrometer. Because BK is a pair-differencing experiment, an important systematic that must be controlled is the differential beam response between co-located, orthogonally polarized detectors. We generate per-detector far-field beam maps and the corresponding differential beam mismatch, which is used to estimate the temperature-to-polarization leakage in our CMB maps and to give feedback on detector and optics fabrication. The differential beam parameters presented here were estimated using improved low-level beam-map analysis techniques, including efficient removal of non-Gaussian noise and improved spatial masking. These techniques help minimize systematic uncertainty in the beam analysis, with the goal of keeping the bias on $r$ induced by temperature-to-polarization leakage subdominant to the statistical uncertainty. This is essential as we progress to higher detector counts in the next generation of CMB experiments.
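The leakage mechanism behind pair differencing can be illustrated with a toy model. This is a hypothetical sketch, not the collaboration's beam-map analysis: it uses one-dimensional pencil beams and models only a differential pointing offset `delta` between the two detectors of a pair, ignoring other mismatch modes (for example differential beamwidth or ellipticity). Half the difference of the two detectors recovers $Q$ when the beams are matched. With a pointing offset, a temperature gradient leaks into the difference as $(\delta/2)\,dT/dx$, exactly so for the linear gradient used here. The function and sky names are illustrative only.

```python
def pair_half_difference(T, Q, x, delta):
    """Toy half-difference signal for one orthogonally polarized detector pair.

    HYPOTHETICAL 1-D model with pencil beams: detector A points at
    x + delta/2 and measures T + Q; detector B points at x - delta/2 and
    measures T - Q. delta is the differential pointing between the beams.
    """
    a = T(x + delta / 2) + Q(x + delta / 2)
    b = T(x - delta / 2) - Q(x - delta / 2)
    return (a - b) / 2


# Hypothetical sky: a linear temperature gradient and no polarization at all,
# so any nonzero output is pure temperature-to-polarization leakage.
def temperature(x):
    return 100.0 * x


def no_polarization(x):
    return 0.0


for delta in (0.0, 0.01):
    leak = pair_half_difference(temperature, no_polarization, x=0.3, delta=delta)
    print(f"differential pointing {delta}: spurious polarization {leak:.3f}")
```

For matched pointing the spurious signal is zero; for the offset pair it is 0.5, half of the 0.01 offset times the gradient of 100. This is the kind of leakage that a measured differential beam mismatch is used to estimate.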

Submitted 12 February, 2020; originally announced February 2020.

Comments: 8 pages, 3 figures. Accepted by the Journal of Low Temperature Physics (Proceedings of the 18th International Workshop on Low Temperature Detectors)
