The Star Geometry of Critic-Based Regularizer Learning

Authors: Oscar Leong, Eliza O'Reilly, Yong Sheng Soh

Abstract: Variational regularization is a classical technique to solve statistical inference tasks and inverse problems, with modern data-driven approaches parameterizing regularizers via deep neural networks showcasing impressive empirical performance. Recent works along these lines learn task-dependent regularizers. This is done by integrating information about the measurements and ground-truth data in an unsupervised, critic-based loss function, where the regularizer attributes low values to likely data and high values to unlikely data. However, there is little theory about the structure of regularizers learned via this process and how it relates to the two data distributions. To make progress on this challenge, we initiate a study of optimizing critic-based loss functions to learn regularizers over a particular family of regularizers: gauges (or Minkowski functionals) of star-shaped bodies. This family contains regularizers that are commonly employed in practice and shares properties with regularizers parameterized by deep neural networks. We specifically investigate critic-based losses derived from variational representations of statistical distances between probability measures. By leveraging tools from star geometry and dual Brunn-Minkowski theory, we illustrate how these losses can be interpreted as dual mixed volumes that depend on the data distribution. This allows us to derive exact expressions for the optimal regularizer in certain cases.
Finally, we identify which neural network architectures give rise to such star body gauges and when such regularizers have favorable properties for optimization. More broadly, this work highlights how the tools of star geometry can aid in understanding the geometry of unsupervised regularizer learning.

Submitted 29 August, 2024; originally announced August 2024.
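
The gauge (Minkowski functional) at the center of this abstract has a simple closed form when a star body $K$ is described by its radial function $\rho_K(u) = \sup\{t \ge 0 : tu \in K\}$: for $x \neq 0$, $\|x\|_K = \inf\{t > 0 : x \in tK\} = \|x\|_2 / \rho_K(x/\|x\|_2)$. The minimal Python sketch below evaluates this formula in the plane. The function names and the three-lobed radial function are hypothetical illustrations; this is not the paper's learned regularizer or its critic-based training.

```python
import math
from typing import Callable, Tuple

Vector = Tuple[float, float]
Radial = Callable[[float], float]  # radial function, parameterized by the polar angle


def gauge_from_radial(x: Vector, radial: Radial) -> float:
    """Gauge ||x||_K = |x| / rho_K(x / |x|) of a planar star body K."""
    r = math.hypot(x[0], x[1])
    if r == 0.0:
        return 0.0  # the gauge of the origin is zero
    theta = math.atan2(x[1], x[0])
    return r / radial(theta)


def trefoil_radial(theta: float) -> float:
    """Hypothetical non-convex star body whose radius oscillates between 0.5 and 1.5."""
    return 1.0 + 0.5 * math.cos(3.0 * theta)


# Usage: the unit disc (radial function identically 1) recovers the Euclidean norm.
disc_value = gauge_from_radial((3.0, 4.0), lambda theta: 1.0)  # 5.0
```

Because the gauge is positively homogeneous, scaling $x$ by 2 doubles its value, and points on the boundary of $K$ have gauge exactly 1.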

arXiv:2403.06348 [pdf, other]

Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation

Authors: Jan Laukemann, Ahmed E. Helal, S. Isaac Geronimo Anderson, Fabio Checconi, Yongseok Soh, Jesmin Jahan Tithi, Teresa Ranadive, Brian J Gravelle, Fabrizio Petrini, Jee Choi

Abstract: High-dimensional sparse data emerge in many critical application domains such as cybersecurity, healthcare, anomaly detection, and trend analysis. To quickly extract meaningful insights from massive volumes of these multi-dimensional data, scientists employ unsupervised analysis tools based on tensor decomposition (TD) methods. However, real-world sparse tensors exhibit highly irregular shapes, data distributions, and sparsity, which pose significant challenges for making efficient use of modern parallel architectures. This study breaks the prevailing assumption that compressing sparse tensors into coarse-grained structures (i.e., tensor slices or blocks) or along a particular dimension/mode (i.e., mode-specific) is more efficient than keeping them in a fine-grained, mode-agnostic form. Our novel sparse tensor representation, Adaptive Linearized Tensor Order (ALTO), encodes tensors in a compact format that can be easily streamed from memory and is amenable to both caching and parallel execution. To demonstrate the efficacy of ALTO, we accelerate popular TD methods that compute the Canonical Polyadic Decomposition (CPD) model across a range of real-world sparse tensors. Additionally, we characterize the major execution bottlenecks of TD methods on multiple generations of the latest Intel Xeon Scalable processors, including Sapphire Rapids CPUs, and introduce dynamic adaptation heuristics to automatically select the best algorithm based on the sparse tensor characteristics.
Across a diverse set of real-world data sets, ALTO outperforms the state-of-the-art approaches, achieving more than an order-of-magnitude speedup over the best mode-agnostic formats. Compared to the best mode-specific formats, which require multiple tensor copies, ALTO achieves more than 5.1x geometric mean speedup at a fraction (25%) of their storage.

Submitted 10 March, 2024; originally announced March 2024.

Comments: We extend the results of our previous ICS paper to significantly improve the parallel performance of the Canonical Polyadic Alternating Least Squares (CP-ALS) algorithm for normally distributed data and the Canonical Polyadic Alternating Poisson Regression (CP-APR) algorithm for non-negative count data
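
The idea of a linearized, mode-agnostic index can be illustrated with Morton-style bit interleaving: each nonzero's coordinates are packed into one integer, so the nonzeros can be stored and streamed as a single sorted array. The sketch below is a simplified assumption of how such an encoding might work (each mode gets just enough bits for its length, and bits are interleaved round-robin). It is not the exact ALTO format or its adaptive heuristics, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mode_bits(dims: Sequence[int]) -> List[int]:
    """Bits needed to index each mode: ceil(log2(dim)), with a minimum of 1."""
    return [max(1, (d - 1).bit_length()) for d in dims]


def linearize(coord: Sequence[int], bits: Sequence[int]) -> int:
    """Pack a coordinate into one integer by interleaving its bits round-robin.

    Bit `level` of every mode is emitted before bit `level + 1` of any mode;
    a mode stops contributing once its own bit budget is used up.
    """
    index, out_pos = 0, 0
    for level in range(max(bits)):
        for mode, nbits in enumerate(bits):
            if level < nbits:
                index |= ((coord[mode] >> level) & 1) << out_pos
                out_pos += 1
    return index


def delinearize(index: int, bits: Sequence[int]) -> Tuple[int, ...]:
    """Invert `linearize` by reading the bits back in the same order."""
    coord = [0] * len(bits)
    in_pos = 0
    for level in range(max(bits)):
        for mode, nbits in enumerate(bits):
            if level < nbits:
                coord[mode] |= ((index >> in_pos) & 1) << level
                in_pos += 1
    return tuple(coord)


# Usage: store the nonzeros of a 4 x 8 x 3 tensor as (linear_index, value)
# pairs sorted by linear index, i.e. one stream shared by all modes.
dims = (4, 8, 3)
bits = mode_bits(dims)
nonzeros = {(3, 7, 2): 1.5, (0, 1, 0): -2.0, (2, 4, 1): 0.25}
encoded = sorted((linearize(c, bits), v) for c, v in nonzeros.items())
```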

arXiv:2311.04061 [pdf, other]

Neural Appearance Model for Cloth Rendering

Authors: Guan Yu Soh, Zahra Montazeri

Abstract: The realistic rendering of woven and knitted fabrics has posed significant challenges for many years. Previously, fiber-based micro-appearance models have achieved considerable success in attaining high levels of realism. However, rendering such models remains complex due to the intricate internal scattering among hundreds of fibers within a yarn, requiring vast amounts of memory and time to render. In this paper, we introduce a new framework to capture aggregated appearance by tracing many light paths through the underlying fiber geometry. We then employ lightweight neural networks to accurately model the aggregated BSDF, which allows for the precise modeling of a diverse array of materials while offering substantial improvements in speed and reductions in memory. Furthermore, we introduce a novel importance sampling scheme to further speed up the rate of convergence. We validate the efficacy and versatility of our framework through comparisons with preceding fiber-based shading models as well as the most recent yarn-based model.

Submitted 18 August, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 12 pages, 10 figures, 3 tables

arXiv:2310.00257 [pdf, other]

The Lovász Theta Function for Recovering Planted Clique Covers and Graph Colorings

Authors: Jiaxin Hou, Yong Sheng Soh, Antonios Varvitsiotis

Abstract: The problems of computing graph colorings and clique covers are central challenges in combinatorial optimization. Both of these are known to be NP-hard, and thus computationally intractable in the worst case. A prominent approach for computing approximate solutions to these problems is the celebrated Lovász theta function $\vartheta(G)$, which is specified as the solution of a semidefinite program (SDP), and hence tractable to compute. In this work, we move beyond the worst-case analysis and set out to understand whether the Lovász theta function recovers clique covers for random instances that have a latent clique cover structure, possibly obscured by noise. We answer this question in the affirmative and show that for graphs generated from the planted clique model we introduce in this work, the SDP formulation of $\vartheta(G)$ has a unique solution that reveals the underlying clique-cover structure with high probability. The main technical step is an intermediate result where we prove a deterministic condition for recovery based on an appropriate notion of sparsity.

Submitted 30 September, 2023; originally announced October 2023.

Comments: 24 pages, 4 figures

arXiv:2305.19557 [pdf, other]

Dictionary Learning under Symmetries via Group Representations

Authors: Subhroshekhar Ghosh, Aaron Y. R. Low, Yong Sheng Soh, Zhuohang Feng, Brendan K. Y. Tan

Abstract: The dictionary learning problem can be viewed as a data-driven process to learn, directly from example data, a suitable transformation under which data is sparsely represented. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We specifically study this problem through the lens of mathematical representation theory. Leveraging the power of non-abelian Fourier analysis for functions over compact groups, we prescribe an algorithmic recipe for learning dictionaries that obey such invariances. We relate the dictionary learning problem in the physical domain, which is naturally modelled as being infinite dimensional, with the associated computational problem, which is necessarily finite dimensional. We establish that the dictionary learning problem can be effectively understood as an optimization instance over certain matrix orbitopes having a particular block-diagonal structure governed by the irreducible representations of the group of symmetries. This perspective enables us to introduce a band-limiting procedure which achieves dimensionality reduction in applications. We provide guarantees for our computational ansatz to provide a desirable dictionary learning outcome. We apply our paradigm to investigate the dictionary learning problem for the groups SO(2) and SO(3). While the SO(2)-orbitope admits an exact spectrahedral description, substantially less is understood about the SO(3)-orbitope.
We describe a tractable spectrahedral outer approximation of the SO(3)-orbitope, and contribute an alternating minimization paradigm to perform optimization in this setting. We provide numerical experiments to highlight the efficacy of our approach in learning SO(3)-invariant dictionaries, both on synthetic and on real world data.

Submitted 25 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 29 pages, 2 figures

arXiv:2208.10893 [pdf]

doi 10.1088/2632-2153/aced7d

Transfer Learning Application of Self-supervised Learning in ARPES

Authors: Sandy Adhitia Ekahana, Genta Indra Winata, Y. Soh, Gabriel Aeppli, Radovic Milan, Ming Shi

Abstract: Recent developments in the angle-resolved photoemission spectroscopy (ARPES) technique involve spatially resolving samples while maintaining high resolution in momentum space. This readily increases the data size and the complexity of data analysis, one task of which is to label similar dispersion cuts and map them spatially. In this work, we demonstrate that recent representation learning (self-supervised learning) models combined with k-means clustering can help automate that part of data analysis and save precious time, albeit with low performance. Finally, we introduce few-shot learning (k-nearest neighbours, or kNN) in representation space, where we selectively choose one (k=1) reference image for each known label and subsequently label the rest of the data with respect to the nearest reference image. This last approach demonstrates the strength of self-supervised learning for automating image analysis in ARPES in particular, and can be generalized to any scientific data analysis that heavily involves image data.

Submitted 23 August, 2022; originally announced August 2022.
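
The final labeling step (one reference image per known label, then each remaining image receives the label of its nearest reference in representation space) amounts to a 1-nearest-neighbour rule. A minimal sketch follows, assuming the embeddings have already been produced by a self-supervised encoder (not shown) and that Euclidean distance is used; the function name, labels, and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def nearest_reference_labels(embeddings: Sequence[Sequence[float]],
                             references: Dict[str, Sequence[float]]) -> List[str]:
    """Label each embedding with the name of its closest reference (1-NN)."""
    labels = []
    for z in embeddings:
        # Euclidean distance in representation space (an assumption; cosine is also common).
        best = min(references, key=lambda name: math.dist(z, references[name]))
        labels.append(best)
    return labels


# Usage with hypothetical 2-D embeddings and one reference per label.
refs = {"label_a": (0.0, 0.0), "label_b": (10.0, 10.0)}
predicted = nearest_reference_labels([(1.0, 1.0), (9.0, 8.0)], refs)
```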

arXiv:2205.00833 [pdf]

Predicting and Optimizing for Energy Efficient ACMV Systems: Computational Intelligence Approaches

Authors: Deqing Zhai, Yeng Chai Soh

Abstract: In this study, a novel application of neural networks that predicts the thermal comfort states of occupants with accuracy over 95% is proposed, and two optimization algorithms are proposed and evaluated in two real cases (general offices, and lecture theatres/conference rooms) in Singapore. The two optimization algorithms are Bayesian Gaussian process optimization (BGPO) and the augmented firefly algorithm (AFA). Based on our earlier studies, the energy consumption models were developed and well trained using neural networks. This study focuses on using novel active approaches to evaluate the thermal comfort of occupants so as to solve a multi-objective problem that balances the energy efficiency of centralized air-conditioning systems against the thermal comfort of occupants. The results show that both BGPO and AFA can effectively solve this optimization problem, which requires no prior knowledge. However, the optimal solutions of AFA are more consistent than those of BGPO at given sample sizes. The best energy saving rates (ESR) of BGPO and AFA are around -21% and -10% respectively at the energy-efficient user preference for both Case 1 and Case 2. As a result, a potential benefit of S$1219.1 can be achieved annually for this experimental laboratory level in Singapore.

Submitted 19 April, 2022; originally announced May 2022.

arXiv:2201.12523 [pdf, other]

doi 10.1145/3524059.3532363

Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures

Authors: Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa Ranadive, Fabrizio Petrini, Jee W. Choi

Abstract: Tensor decomposition (TD) is an important method for extracting latent information from high-dimensional (multi-modal) sparse data. This study presents a novel framework for accelerating fundamental TD operations on massively parallel GPU architectures. In contrast to prior work, the proposed Blocked Linearized Coordinate (BLCO) format enables efficient out-of-memory computation of tensor algorithms using a unified implementation that works on a single tensor copy. Our adaptive blocking and linearization strategies not only meet the resource constraints of GPU devices, but also accelerate data indexing, eliminate control-flow and memory-access irregularities, and reduce kernel launching overhead. To address the substantial synchronization cost on GPUs, we introduce an opportunistic conflict resolution algorithm, in which threads collaborate instead of contending on memory access to discover and resolve their conflicting updates on-the-fly, without keeping any auxiliary information or storing non-zero elements in specific mode orientations. As a result, our framework delivers superior in-memory performance compared to prior state-of-the-art, and is the only framework capable of processing out-of-memory tensors. On the latest Intel and NVIDIA GPUs, BLCO achieves 2.12-2.6X geometric-mean speedup (with up to 33.35X speedup) over the state-of-the-art mixed-mode compressed sparse fiber (MM-CSF) on a range of real-world sparse tensors.

Submitted 27 June, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

Comments: Accepted to ICS 2022
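
MTTKRP (matricized tensor times Khatri-Rao product) is the kernel being accelerated here. For a third-order sparse tensor stored in coordinate (COO) form, the mode-0 MTTKRP accumulates $M_{i,:} \mathrel{+}= x_{ijk}\,(B_{j,:} \ast C_{k,:})$ over the nonzeros, where $\ast$ is the elementwise product. The sequential Python sketch below shows only this reference computation, not the BLCO format or its GPU conflict resolution; the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Nonzero = Tuple[int, int, int, float]  # (i, j, k, value) in COO form


def mttkrp_mode0(nonzeros: Sequence[Nonzero], num_rows: int,
                 B: Matrix, C: Matrix) -> Matrix:
    """Mode-0 MTTKRP: M[i][r] = sum over nonzeros (i, j, k) of x_ijk * B[j][r] * C[k][r]."""
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, val in nonzeros:
        row_b, row_c, out = B[j], C[k], M[i]
        for r in range(rank):
            out[r] += val * row_b[r] * row_c[r]
    return M
```

Different nonzeros that share the same index $i$ update the same output row; in a parallel implementation these are exactly the conflicting updates that a conflict-resolution scheme such as the one described in the abstract has to handle.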

arXiv:2108.00740 [pdf, ps, other]

Multiplicative updates for symmetric-cone factorizations

Authors: Yong Sheng Soh, Antonios Varvitsiotis

Abstract: Given a matrix $X\in \mathbb{R}^{m\times n}_+$ with non-negative entries, the cone factorization problem over a cone $\mathcal{K}\subseteq \mathbb{R}^k$ concerns computing $\{ a_1,\ldots, a_{m} \} \subseteq \mathcal{K}$ and $\{ b_1,\ldots, b_{n} \} \subseteq \mathcal{K}^*$ belonging to its dual so that $X_{ij} = \langle a_i, b_j \rangle$ for all $i\in [m], j\in [n]$. Cone factorizations are fundamental to mathematical optimization as they allow us to express convex bodies as feasible regions of linear conic programs. In this paper, we introduce and analyze the symmetric-cone multiplicative update (SCMU) algorithm for computing cone factorizations when $\mathcal{K}$ is symmetric; i.e., it is self-dual and homogeneous. Symmetric cones are of central interest in mathematical optimization as they provide a common language for studying linear optimization over the nonnegative orthant (linear programs), over the second-order cone (second order cone programs), and over the cone of positive semidefinite matrices (semidefinite programs). The SCMU algorithm is multiplicative in the sense that the iterates are updated by applying a meticulously chosen automorphism of the cone computed using a generalization of the geometric mean to symmetric cones. Using an extension of Lieb's concavity theorem and von Neumann's trace inequality to symmetric cones, we show that the squared loss objective is non-increasing along the trajectories of the SCMU algorithm.
Specialized to the nonnegative orthant, the SCMU algorithm corresponds to the seminal algorithm by Lee and Seung for computing Nonnegative Matrix Factorizations.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 17 pages
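
The abstract states that on the nonnegative orthant SCMU reduces to Lee and Seung's algorithm for NMF. As a point of reference, the sketch below implements that classical special case, $H \leftarrow H \odot (W^\top X) \oslash (W^\top W H)$ followed by $W \leftarrow W \odot (X H^\top) \oslash (W H H^\top)$, in plain Python. It does not implement the general symmetric-cone update or its geometric-mean automorphism, and the small `eps` guard against division by zero is an added assumption.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def sq_loss(X: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius error ||X - WH||_F^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def _scale(M: Matrix, num: Matrix, den: Matrix, eps: float) -> Matrix:
    # Elementwise M * num / (den + eps); nonnegative inputs give nonnegative outputs.
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def lee_seung_step(X: Matrix, W: Matrix, H: Matrix,
                   eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """One round of Lee-Seung multiplicative updates for min ||X - WH||_F^2."""
    Wt = transpose(W)
    H = _scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H), eps)
    Ht = transpose(H)
    W = _scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H
```

Each update multiplies the current factor by a nonnegative ratio, so nonnegativity is preserved without any projection step, and in exact arithmetic the squared loss does not increase from one step to the next.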

arXiv:2106.00293 [pdf, other]

A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations

Authors: Yong Sheng Soh, Antonios Varvitsiotis

Abstract: Given a matrix $X\in \mathbb{R}_+^{m\times n}$ with nonnegative entries, a Positive Semidefinite (PSD) factorization of $X$ is a collection of $r \times r$-dimensional PSD matrices $\{A_i\}$ and $\{B_j\}$ satisfying $X_{ij}= \mathrm{tr}(A_i B_j)$ for all $i\in [m],\ j\in [n]$. PSD factorizations are fundamentally linked to understanding the expressiveness of semidefinite programs as well as the power and limitations of quantum resources in information theory. The PSD factorization task generalizes the Non-negative Matrix Factorization (NMF) problem where we seek a collection of $r$-dimensional nonnegative vectors $\{a_i\}$ and $\{b_j\}$ satisfying $X_{ij}= a_i^\top b_j$, for all $i\in [m],\ j\in [n]$; one can recover the latter problem by choosing matrices in the PSD factorization to be diagonal. The most widely used algorithm for computing NMFs of a matrix is the Multiplicative Update algorithm developed by Lee and Seung, in which nonnegativity of the updates is preserved by scaling with positive diagonal matrices. In this paper, we describe a non-commutative extension of Lee-Seung's algorithm, which we call the Matrix Multiplicative Update (MMU) algorithm, for computing PSD factorizations. The MMU algorithm ensures that updates remain PSD by congruence scaling with the matrix geometric mean of appropriate PSD matrices, and it retains the simplicity of implementation that Lee-Seung's algorithm enjoys.
Building on the Majorization-Minimization framework, we show that under our update scheme the squared loss objective is non-increasing and fixed points correspond to critical points. The analysis relies on Lieb's Concavity Theorem. Beyond PSD factorizations, we use the MMU algorithm as a primitive to calculate block-diagonal PSD factorizations and tensor PSD factorizations. We demonstrate the utility of our method with experiments on real and synthetic data.

Submitted 1 June, 2021; originally announced June 2021.

Comments: Comments welcome

arXiv:2007.07550 [pdf, other]

doi 10.1109/TSP.2021.3087900

Group Invariant Dictionary Learning

Authors: Yong Sheng Soh

Abstract: The dictionary learning problem concerns the task of representing data as sparse linear sums drawn from a smaller collection of basic building blocks. In application domains where such techniques are deployed, we frequently encounter datasets where some form of symmetry or invariance is present. Motivated by this observation, we develop a framework for learning dictionaries for data under the constraint that the collection of basic building blocks remains invariant under such symmetries. Our procedure for learning such dictionaries relies on representing the symmetry as the action of a matrix group acting on the data, and subsequently introducing a convex penalty function so as to induce sparsity with respect to the collection of matrix group elements. Our framework specializes to the convolutional dictionary learning problem when we consider integer shifts. Using properties of positive semidefinite Hermitian Toeplitz matrices, we develop an extension that learns dictionaries that are invariant under continuous shifts. Our numerical experiments on synthetic data and ECG data show that the incorporation of such symmetries as priors is most valuable when the dataset has few data-points, or when the full range of symmetries is inadequately expressed in the dataset.

Submitted 5 June, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

Comments: 30 pages, 23 figures
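
For integer shifts, the abstract notes that the framework specializes to convolutional dictionary learning: the dictionary is the orbit of a few atoms under the shift group. A minimal sketch of that orbit construction for cyclic shifts follows. It only builds the shift-invariant dictionary and does not implement the convex penalty or the learning procedure; the function names and atoms are hypothetical.

```python
from typing import List, Sequence


def cyclic_shift(v: Sequence[float], s: int) -> List[float]:
    """Shift v to the right by s positions with wrap-around (the action of Z_n)."""
    n = len(v)
    return [v[(i - s) % n] for i in range(n)]


def shift_invariant_dictionary(atoms: Sequence[Sequence[float]]) -> List[List[float]]:
    """All cyclic shifts of every atom: the orbit of the atoms under the shift group."""
    n = len(atoms[0])
    return [cyclic_shift(a, s) for a in atoms for s in range(n)]


# Usage: two length-4 atoms generate an 8-element dictionary that is closed under shifts.
D = shift_invariant_dictionary([[1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]])
```

Because the set of columns is closed under shifting, a shifted signal can be represented by correspondingly shifted sparse coefficients.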

arXiv:2004.00243 [pdf, other]

Efficient Implementation of Multi-Channel Convolution in Monolithic 3D ReRAM Crossbar

Authors: Sho Ko, Yun Joon Soh, Jishen Zhao

Abstract: Convolutional neural networks (CNNs) demonstrate promising accuracy in a wide range of applications. Among all layers in CNNs, convolution layers are the most computation-intensive and consume the most energy. With the maturing of device and fabrication technology, 3D resistive random access memory (ReRAM) has received substantial attention for accelerating large vector-matrix multiplication and convolution due to its high parallelism and energy efficiency. However, implementing multi-channel convolution naively in 3D ReRAM will either produce incorrect results or exploit only part of the parallelism of 3D ReRAM. In this paper, we propose a 3D ReRAM-based convolution accelerator architecture, which efficiently maps multi-channel convolution to monolithic 3D ReRAM. Our design has two key principles. First, we exploit the intertwined structure of 3D ReRAM to implement multi-channel convolution by using a state-of-the-art convolution algorithm. Second, we propose a new approach to efficiently implement negative weights by separating them from non-negative weights using configurable interconnects. Our evaluation demonstrates that our mapping scheme in 16-layer 3D ReRAM achieves a speedup of 5.79X, 927.81X, and 36.8X compared with a custom 2D ReRAM baseline and state-of-the-art CPU and GPU, respectively. Our design also reduces energy consumption by 2.12X, 1802.64X, and 114.1X compared with the same baselines.

Submitted 1 April, 2020; originally announced April 2020.
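
Crossbar cells store conductances, which are nonnegative, so signed weights need an extra step. One common way to see the arithmetic is to split $W = W^+ - W^-$ with $W^+ = \max(W, 0)$ and $W^- = \max(-W, 0)$, compute both products, and subtract. The sketch below shows only this identity in plain Python; how the paper realizes the separation with configurable interconnects is not modeled, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def split_signed(W: Matrix) -> Tuple[Matrix, Matrix]:
    """Split W into nonnegative parts with W = pos - neg elementwise."""
    pos = [[max(w, 0.0) for w in row] for row in W]
    neg = [[max(-w, 0.0) for w in row] for row in W]
    return pos, neg


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def crossbar_matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Compute W x using only nonnegative 'conductance' matrices, then subtract."""
    pos, neg = split_signed(W)
    return [p - n for p, n in zip(matvec(pos, x), matvec(neg, x))]
```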

arXiv:2002.04759 [pdf, other]

Collaborative Inference for Efficient Remote Monitoring

Authors: Chi Zhang, Yong Sheng Soh, Ling Feng, Tianyi Zhou, Qianxiao Li

Abstract: While current machine learning models have impressive performance over a wide range of applications, their large size and complexity render them unsuitable for tasks such as remote monitoring on edge devices with limited storage and computational power. A naive approach to resolve this on the model level is to use simpler architectures, but this sacrifices prediction accuracy and is unsuitable for monitoring applications requiring accurate detection of the onset of adverse events. In this paper, we propose an alternative solution to this problem by decomposing the predictive model as the sum of a simple function which serves as a local monitoring tool, and a complex correction term to be evaluated on the server. A sign requirement is imposed on the latter to ensure that the local monitoring function is safe, in the sense that it can effectively serve as an early warning system. Our analysis quantifies the trade-offs between model complexity and performance, and serves as guidance for architecture design. We validate our proposed framework on a series of monitoring experiments, where we succeed at learning monitoring models with significantly reduced complexity that minimally violate the safety requirement. More broadly, our framework is useful for learning classifiers in applications where false negatives are significantly more costly than false positives.

Submitted 11 February, 2020; originally announced February 2020.
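
The decomposition can be sketched as $f(x) = g(x) + h(x)$ with a cheap local score $g$ and a server-side correction $h$. Assuming, as a convention for this illustration, that a positive score means an alarm, requiring $h(x) \le 0$ gives $g(x) \ge f(x)$, so the local monitor raises an alarm whenever the full model would. The Python below is a hypothetical sketch of that logic; negating a square to enforce the sign and the toy scores are assumptions, not the paper's construction.

```python
import math
from typing import Callable, Tuple

Score = Callable[[float], float]


def compose(g: Score, raw_correction: Score) -> Tuple[Score, Score]:
    """Build the full model f = g + h with a correction h constrained to h(x) <= 0."""
    def h(x: float) -> float:
        # Hypothetical way to enforce the sign requirement: negate a square.
        return -(raw_correction(x) ** 2)

    def f(x: float) -> float:
        return g(x) + h(x)

    return f, h


def monitor(x: float, g: Score, f: Score) -> bool:
    """Return True if an adverse event is flagged (positive score = alarm)."""
    if g(x) <= 0.0:
        # f(x) <= g(x) <= 0, so the full model would not alarm either.
        return False
    # Only now is the expensive correction evaluated (on the server, in the paper's setting).
    return f(x) > 0.0


def local_score(x: float) -> float:
    """Hypothetical cheap on-device score."""
    return x - 1.0


# Toy usage with a hypothetical correction term.
f_full, h_corr = compose(local_score, lambda x: 0.5 * math.sin(x))
alarms = [monitor(k / 10.0, local_score, f_full) for k in range(-30, 31)]
```

The server is consulted only when $g$ alarms, which is where the savings come from; if $g(x) \le 0$ then $f(x) \le 0$ as well, so skipping the server cannot miss an event that the full model would have flagged.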

arXiv:1903.05714 [pdf, other]

Basic Performance Measurements of the Intel Optane DC Persistent Memory Module

Authors: Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, Jishen Zhao, Steven Swanson

Abstract: Scalable nonvolatile memory DIMMs will finally be commercially available with the release of the Intel Optane DC Persistent Memory Module (or just "Optane DC PMM"). This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. This work comprises the first in-depth scholarly performance review of Intel's Optane DC PMM, exploring its capabilities as a main memory device, and as persistent, byte-addressable memory exposed to user-space applications. This report details the technology's performance under a number of modes and scenarios, and across a wide variety of macro-scale benchmarks. Optane DC PMMs can be used as large memory devices with a DRAM cache to hide their lower bandwidth and higher latency. When used in this Memory (or cached) mode, Optane DC memory has little impact on applications with small memory footprints. Applications with larger memory footprints may experience some slow-down relative to DRAM, but are now able to keep much more data in memory. When used under a file system, Optane DC PMMs can result in significant performance gains, especially when the file system is optimized to use the load/store interface of the Optane DC PMM and the application uses many small, persistent writes. For instance, using the NOVA-relaxed NVMM file system, we can improve the performance of Kyoto Cabinet by almost 2x.
Optane DC PMMs can also enable user-space persistence where the application explicitly controls its writes into persistent Optane DC media. In our experiments, modified applications that used user-space Optane DC persistence generally outperformed their file system counterparts. For instance, the persistent version of RocksDB performed almost 2x faster than the equivalent program utilizing an NVMM-aware file system.

Submitted 9 August, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

arXiv:1903.04194 [pdf, other]

doi 10.1007/s00454-020-00258-0

Fitting Tractable Convex Sets to Support Function Evaluations

Authors: Yong Sheng Soh, Venkat Chandrasekaran

Abstract: The geometric problem of estimating an unknown compact convex set from evaluations of its support function arises in a range of scientific and engineering applications. Traditional approaches typically rely on estimators that minimize the error over all possible compact convex sets; in particular, these methods do not allow for the incorporation of prior structural information about the underlying set, and the resulting estimates become increasingly complicated to describe as the number of available measurements grows. We address both of these shortcomings by describing a framework for estimating tractably specified convex sets from support function evaluations. Building on the literature in convex optimization, our approach is based on estimators that minimize the error over structured families of convex sets that are specified as linear images of concisely described sets (such as the simplex or the spectraplex) in a higher-dimensional space that is not much larger than the ambient space. Convex sets parametrized in this manner are significant from a computational perspective, as one can optimize linear functionals over such sets efficiently; they serve a different purpose in the inferential context of the present paper, namely, that of incorporating regularization in the reconstruction while still offering considerable expressive power.
We provide a geometric characterization of the asymptotic behavior of our estimators, and our analysis relies on the property that certain sets which admit semialgebraic descriptions are Vapnik-Chervonenkis (VC) classes. Our numerical experiments highlight the utility of our framework over previous approaches in settings in which the available measurements are noisy or few in number, as well as those in which the underlying set to be reconstructed is non-polyhedral.
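For the simplest case named in the abstract, a linear image of the simplex, the set is the convex hull of the columns of a matrix A, so its support function reduces to a maximum over those columns. The sketch below evaluates that support function and a least-squares error against measured support values; it assumes a squared-error loss and does not show how A is fitted, so it should be read as an illustration of the parameterization rather than the paper's estimator. The names `support_function_simplex_image` and `squared_error` are hypothetical.

```python
import numpy as np

def support_function_simplex_image(A, U):
    """Support function of C = A(simplex) = conv(columns of A).

    A: (d, q) matrix whose columns are the images of the simplex vertices.
    U: (n, d) array whose rows are query directions u.
    Returns h_C(u) = max_{x in C} <x, u> for each row u.
    """
    # A linear functional over a simplex is maximized at a vertex,
    # so h_C(u) = max_i <A e_i, u> = max_i (u^T A)_i.
    return (U @ A).max(axis=1)

def squared_error(A, U, y):
    """Mean squared error between measured support values y and those of C (assumed loss)."""
    return float(np.mean((y - support_function_simplex_image(A, U)) ** 2))

# Usage: the square [-1, 1]^2 as the image of the 4-simplex.
A = np.array([[1.0, 1.0, -1.0, -1.0],
              [1.0, -1.0, 1.0, -1.0]])
U = np.array([[1.0, 0.0], [1 / np.sqrt(2), 1 / np.sqrt(2)]])
h = support_function_simplex_image(A, U)
```

Choosing a larger number of columns q enlarges the family of polytopes that can be represented, which is the expressiveness/regularization trade-off the abstract describes.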

Submitted 25 February, 2021; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: 35 pages, 80 figures

arXiv:1805.07029

Scene Understanding Networks for Autonomous Driving based on Around View Monitoring System

Authors: JeongYeol Baek, Ioana Veronica Chelu, Livia Iordache, Vlad Paunescu, HyunJoo Ryu, Alexandru Ghiuta, Andrei Petreanu, YunSung Soh, Andrei Leica, ByeongMoon Jeon

Abstract: Modern driver assistance systems rely on a wide range of sensors (RADAR, LIDAR, ultrasound and cameras) for scene understanding and prediction. These sensors are typically used for detecting traffic participants and scene elements required for navigation. In this paper we argue that relying on camera-based systems, specifically the Around View Monitoring (AVM) system, has great potential to achieve these goals in both parking and driving modes at reduced cost. The contributions of this paper are as follows: we present a new end-to-end solution for delimiting the safe drivable area in each frame by identifying the closest obstacle in each direction from the driving vehicle; we use this approach to calculate the distance to the nearest obstacles; and we incorporate it into a unified end-to-end architecture capable of joint object detection, curb detection and safe drivable area detection. Furthermore, we describe a family of networks covering both a high-accuracy solution and a low-complexity solution. We also introduce a further augmentation of the base architecture with 3D object detection.

Submitted 17 May, 2018; originally announced May 2018.

Comments: Accepted by CVPR 2018 Workshop on Autonomous Driving

arXiv:1701.01207

Learning Semidefinite Regularizers

Authors: Yong Sheng Soh, Venkat Chandrasekaran

Abstract: Regularization techniques are widely employed in optimization-based approaches for solving ill-posed inverse problems in data analysis and scientific computing. These methods are based on augmenting the objective with a penalty function, which is specified based on prior domain-specific expertise to induce a desired structure in the solution. We consider the problem of learning suitable regularization functions from data in settings in which precise domain knowledge is not directly available. Previous work under the title of "dictionary learning" or "sparse coding" may be viewed as learning a regularization function that can be computed via linear programming. We describe generalizations of these methods to learn regularizers that can be computed and optimized via semidefinite programming. Our framework for learning such semidefinite regularizers is based on obtaining structured factorizations of data matrices, and our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with an operator analog of Sinkhorn scaling. Under suitable conditions on the input data, our algorithm provides a locally linearly convergent method for identifying the correct regularizer that promotes the type of structure contained in the data. Our analysis is based on the stability properties of Operator Sinkhorn scaling and their relation to geometric aspects of determinantal varieties (in particular tangent spaces with respect to these varieties).
The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems.
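The abstract refers to an operator analog of Sinkhorn scaling. Operator Sinkhorn scaling generalizes the classical matrix version, which alternately normalizes the rows and columns of a positive matrix until it is approximately doubly stochastic. The sketch below shows only that classical matrix iteration, as background for the operator version; it is not the paper's algorithm, and the function name and the `n_iter`/`tol` parameters are hypothetical.

```python
import numpy as np

def sinkhorn_scale(M, n_iter=500, tol=1e-10):
    """Classical Sinkhorn scaling of a positive square matrix.

    Alternately rescales rows and columns so that both sum to 1,
    returning an (approximately) doubly stochastic matrix.
    """
    S = np.array(M, dtype=float)
    for _ in range(n_iter):
        S /= S.sum(axis=1, keepdims=True)  # make each row sum to 1
        S /= S.sum(axis=0, keepdims=True)  # make each column sum to 1
        # Columns are now exact; stop once rows are within tolerance too.
        if np.allclose(S.sum(axis=1), 1.0, rtol=0.0, atol=tol):
            break
    return S

S = sinkhorn_scale([[1.0, 2.0], [3.0, 4.0]])
```

In the operator setting, row and column sums are replaced by the two trace-preserving conditions on a completely positive map, but the alternating-normalization structure is the same.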

Submitted 5 June, 2021; v1 submitted 4 January, 2017; originally announced January 2017.

Comments: 51 pages, 9 figures

arXiv:1612.04227

CFD results calibration from sparse sensor observations with a case study for indoor thermal map

Authors: Chaoyang Jiang, Yeng Chai Soh, Hua Li, Mustafa K. Masood, Zhe Wei, Xiaoli Zhou, Deqing Zhai

Abstract: Current CFD calibration work has mainly focused on calibrating the CFD model itself. However, no known work has considered the calibration of the CFD results. In this paper, we take inspiration from the image editing problem to develop a methodology to calibrate CFD simulation results based on sparse sensor observations. We formulate the calibration of CFD results as an optimization problem whose cost function consists of two terms. One term guarantees a good local adjustment of the simulation results based on the sparse sensor observations; the other propagates the adjustment from the local regions around the sensing locations to the global domain. The proposed method can enhance the CFD simulation results while preserving the overall original profile. An experiment in an air-conditioned room was carried out to verify the effectiveness of the proposed method. In the experiment, four sensor observations were used to calibrate a simulated thermal map with 167 × 365 data points. The experimental results show that the proposed method is effective and practical.

Submitted 13 December, 2016; originally announced December 2016.

Comments: 17 pages

arXiv:1607.05962

Indoor occupancy estimation from carbon dioxide concentration

Authors: Chaoyang Jiang, Mustafa K. Masood, Yeng Chai Soh, Hua Li

Abstract: This paper presents an indoor occupancy estimator that estimates the real-time number of indoor occupants from carbon dioxide (CO2) measurements. The estimator is a dynamic model of the occupancy level. To identify the dynamic model, we propose the Feature Scaled Extreme Learning Machine (FS-ELM) algorithm, a variation of the standard Extreme Learning Machine (ELM) that is shown to perform better on the occupancy estimation problem. The measured CO2 concentration suffers from severe spikes. We find that pre-smoothing the CO2 data can greatly improve the estimation accuracy. In real applications, however, globally smoothed CO2 data cannot be obtained in real time, so we provide a way to use locally smoothed CO2 data instead, which are available in real time. We introduce a new criterion, $x$-tolerance accuracy, to assess the occupancy estimator. The proposed occupancy estimator was tested in an office room with 24 cubicles and 11 open seats. The accuracy is up to 94 percent with a tolerance of four occupants.
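Two ingredients of this abstract are easy to make concrete: a smoother that only looks at current and past samples (so it is available in real time), and an accuracy criterion that counts an estimate as correct when it falls within x occupants of the truth. The sketch below assumes a causal moving average as the local smoother and defines x-tolerance accuracy as the fraction of time steps with absolute error at most x; both are assumptions about the paper's exact definitions, and the function names are hypothetical.

```python
import numpy as np

def x_tolerance_accuracy(estimates, truth, x):
    """Fraction of time steps where |estimate - truth| <= x (assumed definition)."""
    est = np.asarray(estimates, dtype=float)
    tru = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(est - tru) <= x))

def causal_moving_average(signal, window):
    """Average of the current and up to (window - 1) past samples.

    Uses no future samples, so it can be computed in real time
    (an assumed stand-in for the paper's local smoothing).
    """
    s = np.asarray(signal, dtype=float)
    c = np.cumsum(np.insert(s, 0, 0.0))  # c[k] = sum of the first k samples
    out = np.empty_like(s)
    for t in range(len(s)):
        lo = max(0, t - window + 1)
        out[t] = (c[t + 1] - c[lo]) / (t + 1 - lo)
    return out

# Usage: two of three estimates are within 4 occupants of the truth.
acc = x_tolerance_accuracy([10, 12, 20], [11, 12, 14], 4)
smoothed = causal_moving_average([1, 2, 3, 4], 2)
```

A global smoother (for example, a centered window) would need future CO2 samples, which is why the abstract distinguishes it from the locally smoothed, real-time variant.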

Submitted 20 July, 2016; originally announced July 2016.

Comments: 11 pages, 7 figures

arXiv:1509.09282

doi 10.1109/TSP.2015.2489606

Distributed Inference for Relay-Assisted Sensor Networks With Intermittent Measurements Over Fading Channels

Authors: Shanying Zhu, Yeng Chai Soh, Lihua Xie

Abstract: In this paper, we consider a general distributed estimation problem in relay-assisted sensor networks, taking into account time-varying asymmetric communications, fading channels and intermittent measurements. Motivated by centralized filtering algorithms, we propose a distributed innovation-based estimation algorithm that combines the measurement innovation (assimilation of new measurements) and the local data innovation (incorporation of neighboring data). Our algorithm is fully distributed and does not need a fusion center. We establish theoretical results regarding the asymptotic unbiasedness and consistency of the proposed algorithm. Specifically, to cope with time-varying asymmetric communications, we utilize an ordering technique and the generalized Perron complement to carry out the first- and second-moment analyses in a tractable framework. Furthermore, based on the theoretical results, we present a performance-oriented design of the proposed algorithm for energy-constrained networks. Simulation results corroborate the theoretical findings, demonstrating the effectiveness of the proposed algorithm.

Submitted 30 September, 2015; originally announced September 2015.

Comments: 32 pages, 14 figures

arXiv:1506.00747

doi 10.1109/TSP.2016.2573767

Sensor placement by maximal projection on minimum eigenspace for linear inverse problems

Authors: Chaoyang Jiang, Yeng Chai Soh, Hua Li

Abstract: This paper presents two new greedy sensor placement algorithms for linear inverse problems, named minimum nonzero eigenvalue pursuit (MNEP) and maximal projection on minimum eigenspace (MPME), with greater emphasis on MPME for performance comparison with existing approaches. We select the sensing locations one by one. In this way, the minimum number of required sensors can be determined by checking whether the estimation accuracy is satisfied after each sensing location is chosen. The minimum eigenspace is defined as the eigenspace associated with the minimum eigenvalue of the dual observation matrix. For each sensing location, the projection of its observation vector onto the minimum eigenspace is shown to be monotonically decreasing w.r.t. the worst-case error variance (WCEV) of the estimated parameters. We select the sensing location whose observation vector has the maximum projection onto the minimum eigenspace of the current dual observation matrix. The proposed MPME is shown to be one of the most computationally efficient algorithms. Our Monte Carlo simulations show that MPME outperforms the convex relaxation method [1], the SparSenSe method [2], and the FrameSense method [3] in terms of WCEV and the mean square error (MSE) of the estimated parameters, especially when the number of available sensor nodes is very limited.
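The MPME selection rule described above can be sketched as a greedy loop: form the dual observation matrix of the sensors chosen so far, take a basis of its minimum eigenspace, and add the candidate whose observation vector has the largest projection onto that eigenspace. The sketch assumes the dual observation matrix is the sum of outer products h_i h_i^T of the selected observation vectors and runs for a fixed number k of sensors instead of the paper's accuracy-based stopping check; `mpme_select` and `tol` are hypothetical names.

```python
import numpy as np

def mpme_select(H, k, tol=1e-9):
    """Greedy sensor selection in the spirit of MPME (a sketch).

    H: (m, n) array; row i is the observation vector of candidate location i.
    k: number of sensors to select.
    Returns the list of selected row indices, in selection order.
    """
    m, n = H.shape
    selected = []
    G = np.zeros((n, n))  # assumed dual observation matrix: sum of h_i h_i^T
    for _ in range(k):
        w, V = np.linalg.eigh(G)          # eigenvalues in ascending order
        U = V[:, w <= w[0] + tol]         # orthonormal basis of the minimum eigenspace
        scores = np.linalg.norm(H @ U, axis=1)  # projection length of each h_i
        scores[np.array(selected, dtype=int)] = -np.inf  # never reuse a location
        i = int(np.argmax(scores))
        selected.append(i)
        G += np.outer(H[i], H[i])
    return selected

H = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
chosen = mpme_select(H, 2)
```

After the first pick, the minimum eigenspace is the direction not yet observed, so the rule favors the candidate that best covers it rather than one nearly parallel to an existing sensor.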

Submitted 5 November, 2016; v1 submitted 2 June, 2015; originally announced June 2015.

Comments: 15 pages, 7 figures, and 1 table. Accepted by IEEE Transactions on Signal Processing

arXiv:1412.7281

doi 10.1109/TSP.2015.2441034

Distributed Parameter Estimation with Quantized Communication via Running Average

Authors: Shanying Zhu, Yeng Chai Soh, Lihua Xie

Abstract: In this paper, we consider the parameter estimation problem over sensor networks in the presence of quantized data and directed communication links. We propose a two-stage algorithm that aims to achieve the centralized sample mean estimate in a distributed manner. Unlike existing algorithms, the proposed algorithm uses a running average technique to smear out the randomness caused by the probabilistic quantization scheme. With the running average technique, it is shown that the centralized sample mean estimate can be achieved in both the mean square and almost sure senses, which is not observed in conventional consensus algorithms. In addition, convergence rates are given to quantify the mean square and almost sure performance. Finally, simulation results are presented to illustrate the effectiveness of the proposed algorithm and highlight the improvement gained by using the running average technique.
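The intuition behind the running average can be shown on a single value: probabilistic quantization rounds up or down at random so that the quantized value is unbiased, and averaging many such quantized values cancels the rounding noise. The sketch below illustrates only that mechanism; it is not the paper's two-stage distributed algorithm, and the quantizer step `delta`, the seed, and the function names are hypothetical.

```python
import math
import numpy as np

def probabilistic_quantize(x, delta, rng):
    """Randomized rounding of a scalar to the grid delta * Z.

    Rounds up with probability (x - lower) / delta, so E[q(x)] = x.
    """
    lower = math.floor(x / delta) * delta
    p_up = (x - lower) / delta
    return lower + delta if rng.random() < p_up else lower

def running_average(samples):
    """Running average a_t = a_{t-1} + (s_t - a_{t-1}) / t for t = 1, 2, ..."""
    avg = 0.0
    out = []
    for t, s in enumerate(samples, start=1):
        avg += (s - avg) / t
        out.append(avg)
    return np.array(out)

rng = np.random.default_rng(0)
q = [probabilistic_quantize(0.37, 0.25, rng) for _ in range(20000)]
avg = running_average(q)  # avg[-1] is close to 0.37 although each q is 0.25 or 0.5
```

Each individual quantized value is off by up to one grid step, but because the rounding is unbiased, the running average converges to the unquantized value, which is the effect the abstract attributes to the running average technique.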

Submitted 24 July, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

Comments: 13 pages, 6 figures; IEEE Transactions on Signal Processing, 2015

arXiv:1412.3731

High-Dimensional Change-Point Estimation: Combining Filtering with Convex Optimization

Authors: Yong Sheng Soh, Venkat Chandrasekaran

Abstract: We consider change-point estimation in a sequence of high-dimensional signals given noisy observations. Classical approaches to this problem such as the filtered derivative method are useful for sequences of scalar-valued signals, but they have undesirable scaling behavior in the high-dimensional setting. However, many high-dimensional signals encountered in practice possess latent low-dimensional structure. Motivated by this observation, we propose a technique for high-dimensional change-point estimation that combines the filtered derivative approach from previous work with convex optimization methods based on atomic norm regularization, which are useful for exploiting structure in high-dimensional data. Our algorithm is applicable in online settings as it operates on small portions of the sequence of observations at a time, and it is well-suited to the high-dimensional setting in terms of both computational scalability and statistical efficiency. The main result of this paper shows that our method performs change-point estimation reliably as long as the product of the smallest-sized change (the squared Euclidean norm of the difference between signals at a change-point) and the smallest distance between change-points (number of time instances) is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence.
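The filtered derivative idea compares an average over a window just before each time with an average over a window just after it, and flags times where the two differ strongly. The sketch below denoises each window average with soft-thresholding, the proximal operator of the l1 norm (the atomic norm for sparse vectors), as one concrete instance of atomic-norm regularization; the choice of l1, the squared-difference statistic, the local-maximum rule, and the parameters `window`, `lam`, and `threshold` are assumptions, not the paper's prescribed procedure.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrinks each entry toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def filtered_derivative_changepoints(Y, window, lam, threshold):
    """Y: (T, d) noisy observations. Returns candidate change-point times."""
    T = Y.shape[0]
    stats = np.zeros(T)
    for t in range(window, T - window + 1):
        # Denoised averages of the windows just before and just after time t.
        before = soft_threshold(Y[t - window:t].mean(axis=0), lam)
        after = soft_threshold(Y[t:t + window].mean(axis=0), lam)
        stats[t] = np.sum((after - before) ** 2)  # squared Euclidean change
    # Report times where the statistic exceeds the threshold and peaks locally.
    return [t for t in range(1, T - 1)
            if stats[t] > threshold
            and stats[t] > stats[t - 1] and stats[t] >= stats[t + 1]]

# Usage: a sparse 5-dimensional signal whose first coordinate jumps at t = 50.
Y = np.zeros((100, 5))
Y[50:, 0] = 5.0
changes = filtered_derivative_changepoints(Y, window=10, lam=0.1, threshold=1.0)
```

Because each window only touches a short stretch of the sequence, the statistic can be updated as new observations arrive, which matches the online character described in the abstract.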

Submitted 6 January, 2015; v1 submitted 11 December, 2014; originally announced December 2014.

Comments: 27 pages, 4 figures, minor typo in Theorem 3.1 corrected

arXiv:0805.0871

Large Area Roller Embossing of Multilayered Ceramic Green Composites

Authors: X. Shan, Y. C. Soh, C. W. P. Shi, C. K. Tay, C. W. Lu

Abstract: In this paper, we report our achievements in developing large-area patterning of multilayered ceramic green composites using roller embossing. The aim of our research is to pattern large-area ceramic green composites using a modified roller laminating apparatus, which is compatible with screen printing machines, for the integration of embossing and screen printing. Our roller embossing apparatus, as shown in Figure 1, consists of roller 1 and roller 2. Roller 1 is heated to the desired embossing temperature, whereas roller 2 is kept at room temperature. The mould is a nickel template manufactured by plating nickel-based micro patterns (height: 50 μm) on a nickel film (thickness: 70 μm); the substrate for roller embossing is a multilayered Heraeus Heralock HL 2000 ceramic green composite. Compared with conventional simultaneous embossing, the advantages of roller embossing include (1) low embossing force, (2) ease of demoulding, and (3) a localized area in contact with the heater. We have demonstrated the capability of large-area roller embossing with a panel size of 150 mm × 150 mm on the mentioned substrate. We have explored and confirmed the impact of the process parameters (feed speed, roller temperature and applied pressure) on the pattern quality of roller embossing. Furthermore, under the optimized process parameters, we characterized the variations of pattern dimensions over the panel area and calculated a scaling factor in order to make the panel compatible with other processes.
Figure 2 shows the embossed patterns on a 150 mm × 150 mm green ceramic panel.

Submitted 7 May, 2008; originally announced May 2008.

Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/handle/2042/16838)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2008, Nice, France (2008)

Showing 1–24 of 24 results for author: Soh, Y