doi 10.1109/CRV60082.2023.00026

Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data

Authors: Andrej Janda, Brandon Wagstaff, Edwin G. Ng, Jonathan Kelly

Abstract: Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is particularly important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point clouds exclusively. While useful, this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene and can be applied to cases where localization information is unavailable. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

Submitted 4 September, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: In Proceedings of the Conference on Robots and Vision (CRV'23), Montreal, Canada, Jun. 6-8, 2023. arXiv admin note: substantial text overlap with arXiv:2211.11801

arXiv:2211.11801 [pdf, other]

Self-Supervised Pre-training of 3D Point Cloud Networks with Image Data

Authors: Andrej Janda, Brandon Wagstaff, Edwin G. Ng, Jonathan Kelly

Abstract: Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets that are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.

Submitted 16 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: In Proceedings of the Conference on Robot Learning (CoRL'22) Workshop on Pre-Training Robot Learning, Auckland, New Zealand, December 15, 2022

arXiv:2110.08614 [pdf, other]

Deep Learning and Spectral Embedding for Graph Partitioning

Authors: Alice Gatti, Zhixiong Hu, Tess Smidt, Esmond G. Ng, Pieter Ghysels

Abstract: We present a graph bisection and partitioning algorithm based on graph neural networks. For each node in the graph, the network outputs probabilities for each of the partitions. The graph neural network consists of two modules: an embedding phase and a partitioning phase. The embedding phase is trained first by minimizing a loss function inspired by spectral graph theory. The partitioning module is trained through a loss function that corresponds to the expected value of the normalized cut. Both parts of the neural network rely on SAGE convolutional layers and graph coarsening using heavy edge matching. The multilevel structure of the neural network is inspired by the multigrid algorithm. Our approach generalizes very well to bigger graphs and has partition quality comparable to METIS, Scotch and spectral partitioning, with shorter runtime compared to METIS and spectral partitioning.

Submitted 8 December, 2021; v1 submitted 16 October, 2021; originally announced October 2021.

arXiv:2104.03546 [pdf, other]

Graph Partitioning and Sparse Matrix Ordering using Reinforcement Learning and Graph Neural Networks

Authors: Alice Gatti, Zhixiong Hu, Tess Smidt, Esmond G. Ng, Pieter Ghysels

Abstract: We present a novel method for graph partitioning, based on reinforcement learning and graph convolutional neural networks. Our approach is to recursively partition coarser representations of a given graph. The neural network is implemented using SAGE graph convolution layers, and trained using an advantage actor critic (A2C) agent. We present two variants, one for finding an edge separator that minimizes the normalized cut or quotient cut, and one that finds a small vertex separator. The vertex separators are then used to construct a nested dissection ordering to permute a sparse matrix so that its triangular factorization will incur less fill-in. The partitioning quality is compared with partitions obtained using METIS and SCOTCH, and the nested dissection ordering is evaluated in the sparse solver SuperLU. Our results show that the proposed method achieves similar partitioning quality as METIS and SCOTCH. Furthermore, the method generalizes across different classes of graphs, and works well on a variety of graphs from the SuiteSparse sparse matrix collection.

Submitted 28 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

arXiv:2104.03416 [pdf, ps, other]

doi 10.21437/Interspeech.2021-337

Pushing the Limits of Non-Autoregressive Speech Recognition

Authors: Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan

Abstract: We combine recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model.

Submitted 11 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: Proceedings of INTERSPEECH

arXiv:2012.02339 [pdf, other]

Understanding Guided Image Captioning Performance across Domains

Authors: Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Abstract: Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models generally lack the ability to provide long descriptive answers, while expecting the textual question to be quite precise. We present a method to control the concepts that an image caption should focus on, using an additional input called the guiding text that refers to either groundable or ungroundable concepts in the image. Our model consists of a Transformer-based multimodal encoder that uses the guiding text together with global and object-level image features to derive early-fusion representations used to generate the guided caption. While models trained on Visual Genome data have an in-domain advantage of fitting well when guided with automatic object labels, we find that guided captioning models trained on Conceptual Captions generalize better on out-of-domain images and guiding texts. Our human-evaluation results indicate that attempting in-the-wild guided image captioning requires access to large, unrestricted-domain training datasets, and that increased style diversity (even without increasing the number of unique tokens) is a key factor for improved performance.

Submitted 10 November, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Proceedings of CoNLL 2021

arXiv:1810.04009 [pdf, ps, other]

doi 10.1103/PhysRevC.99.054308

Deep learning: Extrapolation tool for ab initio nuclear theory

Authors: Gianina Alina Negoita, James P. Vary, Glenn R. Luecke, Pieter Maris, Andrey M. Shirokov, Ik Jae Shin, Youngman Kim, Esmond G. Ng, Chao Yang, Matthew Lockner, Gurpur M. Prabhu

Abstract: Ab initio approaches in nuclear theory, such as the no-core shell model (NCSM), have been developed for approximately solving finite nuclei with realistic strong interactions. The NCSM and other approaches require an extrapolation of the results obtained in a finite basis space to the infinite basis space limit and assessment of the uncertainty of those extrapolations. Each observable requires a separate extrapolation and most observables have no proven extrapolation method. We propose a feed-forward artificial neural network (ANN) method as an extrapolation tool to obtain the ground state energy and the ground state point-proton root-mean-square (rms) radius along with their extrapolation uncertainties. The designed ANNs are sufficient to produce results for these two very different observables in $^6$Li from the ab initio NCSM results in small basis spaces that satisfy the following theoretical physics condition: independence of basis space parameters in the limit of extremely large matrices. Comparisons of the ANN results with other extrapolation methods are also provided.

Submitted 6 June, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

Comments: 13 pages, 6 figures. Some typos were fixed, e.g., replaced MSE units for the observables with observables' square units. arXiv admin note: text overlap with arXiv:1803.03215

Journal ref: Phys. Rev. C 99, 054308 (2019)

arXiv:1704.05923 [pdf, other]

doi 10.1021/acs.jctc.7b00402

A Model Order Reduction Algorithm for Estimating the Absorption Spectrum

Authors: Roel Van Beeumen, David B. Williams-Young, Joseph M. Kasper, Chao Yang, Esmond G. Ng, Xiaosong Li

Abstract: The ab initio description of the spectral interior of the absorption spectrum poses both a theoretical and computational challenge for modern electronic structure theory. Due to the often spectrally dense character of this domain in the quantum propagator's eigenspectrum for medium-to-large sized systems, traditional approaches based on the partial diagonalization of the propagator often encounter oscillatory and stagnating convergence. Electronic structure methods which solve the molecular response problem through the solution of spectrally shifted linear systems, such as the complex polarization propagator, offer an alternative approach which is agnostic to the underlying spectral density or domain location. This generality comes at a seemingly high computational cost associated with solving a large linear system for each spectral shift in some discretization of the spectral domain of interest. We present a novel, adaptive solution based on model order reduction techniques via interpolation. Model order reduction reduces the computational complexity of mathematical models and is ubiquitous in the simulation of dynamical systems. The efficiency and effectiveness of the proposed algorithm in the ab initio prediction of X-Ray absorption spectra is demonstrated using a test set of challenging water clusters which are spectrally dense in the neighborhood of the oxygen K-edge. Based on a single, user defined tolerance we automatically determine the order of the reduced models and approximate the absorption spectrum up to the given tolerance. We also illustrate that the automatically determined model order increases logarithmically with the problem dimension, compared to a linear increase of the number of eigenvalues within the energy window. Furthermore, we observed that the computational cost of the proposed algorithm only scales quadratically with respect to the problem dimension.

Submitted 30 August, 2017; v1 submitted 19 April, 2017; originally announced April 2017.

arXiv:1610.08128 [pdf, other]

The Reverse Cuthill-McKee Algorithm in Distributed-Memory

Authors: Ariful Azad, Mathias Jacquelin, Aydin Buluc, Esmond G. Ng

Abstract: Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices.

Submitted 25 October, 2016; originally announced October 2016.

arXiv:1609.01689 [pdf, other]

doi 10.1016/j.cpc.2017.09.004

Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative Eigensolver

Authors: Meiyue Shao, Hasan Metin Aktulga, Chao Yang, Esmond G. Ng, Pieter Maris, James P. Vary

Abstract: We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.

Submitted 8 September, 2017; v1 submitted 6 September, 2016; originally announced September 2016.

Comments: Accepted for publication in Computer Physics Communications

Journal ref: Computer Physics Communications, 222:1--13, 2018

Showing 1–10 of 10 results for author: Ng, E G