-
Training of Physical Neural Networks
Authors:
Ali Momeni,
Babak Rahmani,
Benjamin Scellier,
Logan G. Wright,
Peter L. McMahon,
Clara C. Wanjura,
Yuhang Li,
Anas Skalli,
Natalia G. Berloff,
Tatsuhiro Onodera,
Ilker Oguz,
Francesco Morichetti,
Philipp del Hougne,
Manuel Le Gallo,
Abu Sebastian,
Azalia Mirhoseini,
Cheng Zhang,
Danijela Marković,
Daniel Brunner,
Christophe Moser,
Sylvain Gigan,
Florian Marquardt,
Aydogan Ozcan,
Julie Grollier,
Andrea J. Liu
, et al. (3 additional authors not shown)
Abstract:
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also…
▽ More
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Scaling on-chip photonic neural processors using arbitrarily programmable wave propagation
Authors:
Tatsuhiro Onodera,
Martin M. Stein,
Benjamin A. Ash,
Mandar M. Sohoni,
Melissa Bosch,
Ryotatsu Yanagimoto,
Marc Jankowski,
Timothy P. McKenna,
Tianyu Wang,
Gennady Shvets,
Maxim R. Shcherbakov,
Logan G. Wright,
Peter L. McMahon
Abstract:
On-chip photonic processors for neural networks have potential benefits in both speed and energy efficiency but have not yet reached the scale at which they can outperform electronic processors. The dominant paradigm for designing on-chip photonics is to make networks of relatively bulky discrete components connected by one-dimensional waveguides. A far more compact alternative is to avoid explici…
▽ More
On-chip photonic processors for neural networks have potential benefits in both speed and energy efficiency but have not yet reached the scale at which they can outperform electronic processors. The dominant paradigm for designing on-chip photonics is to make networks of relatively bulky discrete components connected by one-dimensional waveguides. A far more compact alternative is to avoid explicitly defining any components and instead sculpt the continuous substrate of the photonic processor to directly perform the computation using waves freely propagating in two dimensions. We propose and demonstrate a device whose refractive index as a function of space, $n(x,z)$, can be rapidly reprogrammed, allowing arbitrary control over the wave propagation in the device. Our device, a 2D-programmable waveguide, combines photoconductive gain with the electro-optic effect to achieve massively parallel modulation of the refractive index of a slab waveguide, with an index modulation depth of $10^{-3}$ and approximately $10^4$ programmable degrees of freedom. We used a prototype device with a functional area of $12\,\text{mm}^2$ to perform neural-network inference with up to 49-dimensional input vectors in a single pass, achieving 96% accuracy on vowel classification and 86% accuracy on $7 \times 7$-pixel MNIST handwritten-digit classification. This is a scale beyond that of previous photonic chips relying on discrete components, illustrating the benefit of the continuous-waves paradigm. In principle, with large enough chip area, the reprogrammability of the device's refractive index distribution enables the reconfigurable realization of any passive, linear photonic circuit or device. This promises the development of more compact and versatile photonic systems for a wide range of applications, including optical processing, smart sensing, spectroscopy, and optical communications.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
A statistical significance testing approach for measuring term burstiness with applications to domain-specific terminology extraction
Authors:
Samuel Sarria Hurtado,
Todd Mullen,
Taku Onodera,
Paul Sheridan
Abstract:
A term in a corpus is said to be ``bursty'' (or overdispersed) when its occurrences are concentrated in few out of many documents. In this paper, we propose Residual Inverse Collection Frequency (RICF), a statistical significance test inspired heuristic for quantifying term burstiness. The chi-squared test is, to our knowledge, the sole test of statistical significance among existing term burstine…
▽ More
A term in a corpus is said to be ``bursty'' (or overdispersed) when its occurrences are concentrated in few out of many documents. In this paper, we propose Residual Inverse Collection Frequency (RICF), a statistical significance test inspired heuristic for quantifying term burstiness. The chi-squared test is, to our knowledge, the sole test of statistical significance among existing term burstiness measures. Chi-squared test term burstiness scores are computed from the collection frequency statistic (i.e., the proportion that a specified term constitutes in relation to all terms within a corpus). However, the document frequency of a term (i.e., the proportion of documents within a corpus in which a specific term occurs) is exploited by certain other widely used term burstiness measures. RICF addresses this shortcoming of the chi-squared test by virtue of its term burstiness scores systematically incorporating both the collection frequency and document frequency statistics. We evaluate the RICF measure on a domain-specific technical terminology extraction task using the GENIA Term corpus benchmark, which comprises 2,000 annotated biomedical article abstracts. RICF generally outperformed the chi-squared test in terms of precision at k score with percent improvements of 0.00% (P@10), 6.38% (P@50), 6.38% (P@100), 2.27% (P@500), 2.61% (P@1000), and 1.90% (P@5000). Furthermore, RICF performance was competitive with the performances of other well-established measures of term burstiness. Based on these findings, we consider our contributions in this paper as a promising starting point for future exploration in leveraging statistical significance testing in text analysis.
△ Less
Submitted 1 January, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Image sensing with multilayer, nonlinear optical neural networks
Authors:
Tianyu Wang,
Mandar M. Sohoni,
Logan G. Wright,
Martin M. Stein,
Shi-Yuan Ma,
Tatsuhiro Onodera,
Maxwell G. Anderson,
Peter L. McMahon
Abstract:
Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but enc…
▽ More
Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but encoding. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have however been limited to linear processing, but nonlinearity is a prerequisite for depth, and multilayer NNs significantly outperform shallow NNs on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy across several representative computer-vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in real scenes. In all cases we find that the ONN's nonlinearity and depth allowed it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for incoherent-light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all natively in the optical domain.
△ Less
Submitted 27 July, 2022;
originally announced July 2022.
-
An optical neural network using less than 1 photon per multiplication
Authors:
Tianyu Wang,
Shi-Yuan Ma,
Logan G. Wright,
Tatsuhiro Onodera,
Brian Richard,
Peter L. McMahon
Abstract:
Deep learning has rapidly become a widespread tool in both scientific and commercial endeavors. Milestones of deep learning exceeding human performance have been achieved for a growing number of tasks over the past several years, across areas as diverse as game-playing, natural-language translation, and medical-image analysis. However, continued progress is increasingly hampered by the high energy…
▽ More
Deep learning has rapidly become a widespread tool in both scientific and commercial endeavors. Milestones of deep learning exceeding human performance have been achieved for a growing number of tasks over the past several years, across areas as diverse as game-playing, natural-language translation, and medical-image analysis. However, continued progress is increasingly hampered by the high energy costs associated with training and running deep neural networks on electronic processors. Optical neural networks have attracted attention as an alternative physical platform for deep learning, as it has been theoretically predicted that they can fundamentally achieve higher energy efficiency than neural networks deployed on conventional digital computers. Here, we experimentally demonstrate an optical neural network achieving 99% accuracy on handwritten-digit classification using ~3.2 detected photons per weight multiplication and ~90% accuracy using ~0.64 photons (~$2.4 \times 10^{-19}$ J of optical energy) per weight multiplication. This performance was achieved using a custom free-space optical processor that executes matrix-vector multiplications in a massively parallel fashion, with up to ~0.5 million scalar (weight) multiplications performed at the same time. Using commercially available optical components and standard neural-network training methods, we demonstrated that optical neural networks can operate near the standard quantum limit with extremely low optical powers and still achieve high accuracy. Our results provide a proof-of-principle for low-optical-power operation, and with careful system design including the surrounding electronics used for data storage and control, open up a path to realizing optical processors that require only $10^{-16}$ J total energy per scalar multiplication -- which is orders of magnitude more efficient than current digital processors.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
Deep physical neural networks enabled by a backpropagation algorithm for arbitrary physical systems
Authors:
Logan G. Wright,
Tatsuhiro Onodera,
Martin M. Stein,
Tianyu Wang,
Darren T. Schachter,
Zoey Hu,
Peter L. McMahon
Abstract:
Deep neural networks have become a pervasive tool in science and engineering. However, modern deep neural networks' growing energy requirements now increasingly limit their scaling and broader use. We propose a radical alternative for implementing deep neural network models: Physical Neural Networks. We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently trai…
▽ More
Deep neural networks have become a pervasive tool in science and engineering. However, modern deep neural networks' growing energy requirements now increasingly limit their scaling and broader use. We propose a radical alternative for implementing deep neural network models: Physical Neural Networks. We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently train sequences of controllable physical systems to act as deep neural networks. This method automatically trains the functionality of any sequence of real physical systems, directly, using backpropagation, the same technique used for modern deep neural networks. To illustrate their generality, we demonstrate physical neural networks with three diverse physical systems-optical, mechanical, and electrical. Physical neural networks may facilitate unconventional machine learning hardware that is orders of magnitude faster and more energy efficient than conventional electronic processors.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
Experimental investigation of performance differences between Coherent Ising Machines and a quantum annealer
Authors:
Ryan Hamerly,
Takahiro Inagaki,
Peter L. McMahon,
Davide Venturelli,
Alireza Marandi,
Tatsuhiro Onodera,
Edwin Ng,
Carsten Langrock,
Kensuke Inaba,
Toshimori Honjo,
Koji Enbutsu,
Takeshi Umeki,
Ryoichi Kasahara,
Shoko Utsunomiya,
Satoshi Kako,
Ken-ichi Kawarabayashi,
Robert L. Byer,
Martin M. Fejer,
Hideo Mabuchi,
Dirk Englund,
Eleanor Rieffel,
Hiroki Takesue,
Yoshihisa Yamamoto
Abstract:
Physical annealing systems provide heuristic approaches to solving NP-hard Ising optimization problems. Here, we study the performance of two types of annealing machines--a commercially available quantum annealer built by D-Wave Systems, and measurement-feedback coherent Ising machines (CIMs) based on optical parametric oscillator networks--on two classes of problems, the Sherrington-Kirkpatrick (…
▽ More
Physical annealing systems provide heuristic approaches to solving NP-hard Ising optimization problems. Here, we study the performance of two types of annealing machines--a commercially available quantum annealer built by D-Wave Systems, and measurement-feedback coherent Ising machines (CIMs) based on optical parametric oscillator networks--on two classes of problems, the Sherrington-Kirkpatrick (SK) model and MAX-CUT. The D-Wave quantum annealer outperforms the CIMs on MAX-CUT on regular graphs of degree 3. On denser problems, however, we observe an exponential penalty for the quantum annealer ($\exp(-α_\textrm{DW} N^2)$) relative to CIMs ($\exp(-α_\textrm{CIM} N)$) for fixed anneal times, on both the SK model and on 50%-edge-density MAX-CUT, where the coefficients $α_\textrm{CIM}$ and $α_\textrm{DW}$ are problem-class-dependent. On instances with over $50$ vertices, a several-orders-of-magnitude time-to-solution difference exists between CIMs and the D-Wave annealer. An optimal-annealing-time analysis is also consistent with a significant projected performance difference. The difference in performance between the sparsely connected D-Wave machine and the measurement-feedback facilitated all-to-all connectivity of the CIMs provides strong experimental support for efforts to increase the connectivity of quantum annealers.
△ Less
Submitted 24 May, 2019; v1 submitted 14 May, 2018;
originally announced May 2018.
-
Succinct Oblivious RAM
Authors:
Taku Onodera,
Tetsuo Shibuya
Abstract:
Reducing the database space overhead is critical in big-data processing. In this paper, we revisit oblivious RAM (ORAM) using big-data standard for the database space overhead.
ORAM is a cryptographic primitive that enables users to perform arbitrary database accesses without revealing the access pattern to the server. It is particularly important today since cloud services become increasingly c…
▽ More
Reducing the database space overhead is critical in big-data processing. In this paper, we revisit oblivious RAM (ORAM) using big-data standard for the database space overhead.
ORAM is a cryptographic primitive that enables users to perform arbitrary database accesses without revealing the access pattern to the server. It is particularly important today since cloud services become increasingly common making it necessary to protect users' private information from database access pattern analyses. Previous ORAM studies focused mostly on reducing the access overhead. Consequently, the access overhead of the state-of-the-art ORAM constructions is almost at practical levels in certain application scenarios such as secure processors. On the other hand, most existing ORAM constructions require $(1+Θ(1))n$ (say, $10n$) bits of server space where $n$ is the database size. Though such space complexity is often considered to be "optimal", overhead such as $10 \times$ is prohibitive for big-data applications in practice.
We propose ORAM constructions that take only $(1+o(1))n$ bits of server space while maintaining state-of-the-art performance in terms of the access overhead and the user space. We also give non-asymptotic analyses and simulation results which indicate that the proposed ORAM constructions are practically effective.
△ Less
Submitted 23 April, 2018;
originally announced April 2018.
-
A Preferential Attachment Paradox: How Preferential Attachment Combines with Growth to Produce Networks with Log-normal In-degree Distributions
Authors:
Paul Sheridan,
Taku Onodera
Abstract:
Every network scientist knows that preferential attachment combines with growth to produce networks with power-law in-degree distributions. How, then, is it possible for the network of American Physical Society journal collection citations to enjoy a log-normal citation distribution when it was found to have grown in accordance with preferential attachment? This anomalous result, which we exalt as…
▽ More
Every network scientist knows that preferential attachment combines with growth to produce networks with power-law in-degree distributions. How, then, is it possible for the network of American Physical Society journal collection citations to enjoy a log-normal citation distribution when it was found to have grown in accordance with preferential attachment? This anomalous result, which we exalt as the preferential attachment paradox, has remained unexplained since the physicist Sidney Redner first made light of it over a decade ago. Here we propose a resolution. The chief source of the mischief, we contend, lies in Redner having relied on a measurement procedure bereft of the accuracy required to distinguish preferential attachment from another form of attachment that is consistent with a log-normal in-degree distribution. There was a high-accuracy measurement procedure in use at the time, but it would have have been difficult to use it to shed light on the paradox, due to the presence of a systematic error inducing design flaw. In recent years the design flaw had been recognised and corrected. We show that the bringing of the newly corrected measurement procedure to bear on the data leads to a resolution of the paradox.
△ Less
Submitted 21 January, 2018; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Detecting Superbubbles in Assembly Graphs
Authors:
Taku Onodera,
Kunihiko Sadakane,
Tetsuo Shibuya
Abstract:
We introduce a new concept of a subgraph class called a superbubble for analyzing assembly graphs, and propose an efficient algorithm for detecting it. Most assembly algorithms utilize assembly graphs like the de Bruijn graph or the overlap graph constructed from reads. From these graphs, many assembly algorithms first detect simple local graph structures (motifs), such as tips and bubbles, mainly…
▽ More
We introduce a new concept of a subgraph class called a superbubble for analyzing assembly graphs, and propose an efficient algorithm for detecting it. Most assembly algorithms utilize assembly graphs like the de Bruijn graph or the overlap graph constructed from reads. From these graphs, many assembly algorithms first detect simple local graph structures (motifs), such as tips and bubbles, mainly to find sequencing errors. These motifs are easy to detect, but they are sometimes too simple to deal with more complex errors. The superbubble is an extension of the bubble, which is also important for analyzing assembly graphs. Though superbubbles are much more complex than ordinary bubbles, we show that they can be efficiently enumerated. We propose an average-case linear time algorithm (i.e., O(n+m) for a graph with n vertices and m edges) for graphs with a reasonable model, though the worst-case time complexity of our algorithm is quadratic (i.e., O(n(n+m))). Moreover, the algorithm is practically very fast: Our experiments show that our algorithm runs in reasonable time with a single CPU core even against a very large graph of a whole human genome.
△ Less
Submitted 30 July, 2013;
originally announced July 2013.