-
Training of Physical Neural Networks
Authors:
Ali Momeni,
Babak Rahmani,
Benjamin Scellier,
Logan G. Wright,
Peter L. McMahon,
Clara C. Wanjura,
Yuhang Li,
Anas Skalli,
Natalia G. Berloff,
Tatsuhiro Onodera,
Ilker Oguz,
Francesco Morichetti,
Philipp del Hougne,
Manuel Le Gallo,
Abu Sebastian,
Azalia Mirhoseini,
Cheng Zhang,
Danijela Marković,
Daniel Brunner,
Christophe Moser,
Sylvain Gigan,
Florian Marquardt,
Aydogan Ozcan,
Julie Grollier,
Andrea J. Liu, et al. (3 additional authors not shown)
Abstract:
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. Doing so will, however, require rethinking both how AI models work and how they are trained, primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods, including backpropagation-based and backpropagation-free approaches, are now being explored. These methods have various trade-offs, and so far no method has been shown to match the scale and performance of the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized both to create more efficient realizations of current-scale AI models and to enable unprecedented-scale models.
Submitted 5 June, 2024;
originally announced June 2024.
-
Finding $d$-Cuts in Graphs of Bounded Diameter, Graphs of Bounded Radius and $H$-Free Graphs
Authors:
Felicia Lucke,
Ali Momeni,
Daniël Paulusma,
Siani Smith
Abstract:
The $d$-Cut problem is to decide if a graph has an edge cut such that each vertex has at most $d$ neighbours at the opposite side of the cut. If $d=1$, we obtain the intensively studied Matching Cut problem. The $d$-Cut problem has been studied as well, but a systematic study for special graph classes was lacking. We initiate such a study and consider classes of bounded diameter, bounded radius and $H$-free graphs. We prove that for all $d\geq 2$, $d$-Cut is polynomial-time solvable for graphs of diameter $2$, $(P_3+P_4)$-free graphs and $P_5$-free graphs. These results extend known results for $d=1$. However, we also prove several NP-hardness results for $d$-Cut that contrast known polynomial-time results for $d=1$. Our results lead to full dichotomies for bounded diameter and bounded radius and to almost-complete dichotomies for $H$-free graphs.
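The defining condition of a $d$-cut stated in this abstract can be checked directly for a given partition. The following is a minimal sketch, not the authors' algorithm: the function name `is_d_cut`, the adjacency-dictionary representation, and the requirement that both sides be non-empty are assumptions made for illustration. It only verifies a candidate partition; deciding whether any such partition exists is the actual $d$-Cut problem.

```python
from typing import Dict, Hashable, Iterable, Set


def is_d_cut(adj: Dict[Hashable, Set[Hashable]],
             side_a: Iterable[Hashable],
             d: int) -> bool:
    """Check whether the partition (side_a, remaining vertices) is a d-cut.

    adj maps every vertex to the set of its neighbours (undirected graph).
    Assumption: both sides of the partition must be non-empty.
    """
    a = set(side_a)
    b = set(adj) - a
    if not a or not b:
        return False
    for vertex, neighbours in adj.items():
        # Count the neighbours of this vertex on the opposite side of the cut.
        opposite = b if vertex in a else a
        if len(neighbours & opposite) > d:
            return False
    return True


# Complete graph on four vertices: every balanced split puts two
# neighbours of each vertex on the opposite side.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(is_d_cut(k4, {0, 1}, 1))  # matching cut (d = 1) fails here
print(is_d_cut(k4, {0, 1}, 2))  # the same split is a 2-cut
```

With $d=1$ the check reduces to the Matching Cut condition mentioned in the abstract.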
Submitted 17 April, 2024;
originally announced April 2024.
-
Backpropagation-free Training of Deep Physical Neural Networks
Authors:
Ali Momeni,
Babak Rahmani,
Matthieu Mallejac,
Philipp Del Hougne,
Romain Fleury
Abstract:
Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely indebted to the massive size of deep learning models, which is expected to keep increasing. This growth of deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of works based on unconventional physical systems have been proposed that address the issue of energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models mainly relies on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.
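For orientation, the layer-local objective of the generic forward-forward algorithm, from which the method takes its name, can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the abstract does not state the exact loss, and the names `goodness`, `layer_loss`, and the threshold `theta` are hypothetical. The point it shows is that the objective is evaluated from a layer's measured outputs alone, so no model of how the layer computed them is needed to evaluate it.

```python
import math
from typing import Sequence


def goodness(activations: Sequence[float]) -> float:
    """Sum of squared activations of one layer (forward-forward 'goodness')."""
    return sum(a * a for a in activations)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def layer_loss(pos_activations: Sequence[float],
               neg_activations: Sequence[float],
               theta: float) -> float:
    """Layer-local loss: push goodness above theta for positive samples
    and below theta for negative samples.

    The inputs are the layer's measured outputs; in a physical layer they
    would come from a sensor readout (assumption for this sketch).
    """
    return (softplus(theta - goodness(pos_activations))
            + softplus(goodness(neg_activations) - theta))


# A layer that responds strongly to positive data and weakly to negative
# data incurs a lower loss than one that does the opposite.
print(layer_loss([3.0], [0.1], theta=2.0) < layer_loss([0.1], [3.0], theta=2.0))
```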
Submitted 12 June, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Physics-inspired Neuroacoustic Computing Based on Tunable Nonlinear Multiple-scattering
Authors:
Ali Momeni,
Xinxin Guo,
Herve Lissek,
Romain Fleury
Abstract:
Waves, such as light and sound, inherently bounce and mix due to multiple scattering induced by the complex material objects that surround us. This scattering process severely scrambles the information carried by waves, challenging conventional communication systems, sensing paradigms, and wave-based computing schemes. Here, we show that instead of being a hindrance, multiple scattering can be beneficial to enable and enhance analog nonlinear information mapping, allowing for the direct physical implementation of computational paradigms such as reservoir computing and extreme learning machines. We propose a physics-inspired version of such computational architectures for speech and vowel recognition that operates directly in the native domain of the input signal, namely on real sounds, without any digital pre-processing, encoding conversion, or backpropagation training computation. We first implement it in a proof-of-concept prototype, a nonlinear chaotic acoustic cavity containing multiple tunable and power-efficient nonlinear meta-scatterers. We prove the efficiency of the acoustic-based computing system for vowel recognition tasks with high testing classification accuracy (91.4%). Finally, we demonstrate the high performance of vowel recognition in the natural environment of a reverberation room. Our results open the way for efficient acoustic learning machines that operate directly on the input sound, and leverage physics to enable Natural Language Processing (NLP).
Submitted 17 April, 2023;
originally announced April 2023.
-
The Game of Cops and Robber on (Claw, Even-hole)-free Graphs
Authors:
Ramin Javadi,
Ali Momeni
Abstract:
In this paper, we study the game of cops and robber on the class of graphs with no even hole (induced cycle of even length) and claw (a star with three leaves). The cop number of a graph $G$ is defined as the minimum number of cops needed to capture the robber. Here, we prove that the cop number of all claw-free even-hole-free graphs is at most two and, in addition, the capture time is at most $2n$ rounds, where $n$ is the number of vertices of the graph. Moreover, our results can be viewed as a first step towards studying the structure of claw-free even-hole-free graphs.
Submitted 11 January, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Dealing with Distribution Mismatch in Semi-supervised Deep Learning for Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature Densities
Authors:
Saul Calderon-Ramirez,
Shengxiang Yang,
David Elizondo,
Armaghan Moemeni
Abstract:
In the context of the global coronavirus pandemic, different deep learning solutions for infected subject detection using chest X-ray images have been proposed. However, deep learning models usually need large labelled datasets to be effective. Semi-supervised deep learning is an attractive alternative, where unlabelled data is leveraged to improve the overall model's accuracy. However, in real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset (i.e. the labelled dataset was sampled from a target clinic and the unlabelled dataset from a source clinic). This results in a distribution mismatch between the unlabelled and labelled datasets. In this work, we assess the impact of the distribution mismatch between the labelled and the unlabelled datasets, for a semi-supervised model trained with chest X-ray images, for COVID-19 detection. Under strong distribution mismatch conditions, we found an accuracy hit of almost 30%, suggesting that the unlabelled dataset distribution has a strong influence on the behaviour of the model. Therefore, we propose a straightforward approach to diminish the impact of such distribution mismatch. Our proposed method uses a density approximation of the feature space. It is built upon the target dataset to filter out the observations in the source unlabelled dataset that might harm the accuracy of the semi-supervised model. It assumes that a small labelled source dataset is available together with a larger source unlabelled dataset. Our proposed method does not require any model training; it is simple and computationally cheap. We compare our proposed method against two popular state-of-the-art out-of-distribution data detectors, which are also cheap and simple to implement. In our tests, our method yielded accuracy gains of up to 32% when compared to the previous state-of-the-art methods.
Submitted 16 August, 2021;
originally announced September 2021.
-
Wave-based extreme deep learning based on non-linear time-Floquet entanglement
Authors:
Ali Momeni,
Romain Fleury
Abstract:
Wave-based analog signal processing holds the promise of extremely fast, on-the-fly, power-efficient data processing, occurring as a wave propagates through an artificially engineered medium. Yet, due to the fundamentally weak non-linearities of traditional wave materials, such analog processors have been so far largely confined to simple linear projections such as image edge detection or matrix multiplications. Complex neuromorphic computing tasks, which inherently require strong non-linearities, have so far remained out-of-reach of wave-based solutions, with a few attempts that implemented non-linearities on the digital front, or used weak and inflexible non-linear sensors, restraining the learning performance. Here, we tackle this issue by demonstrating the relevance of Time-Floquet physics to induce a strong non-linear entanglement between signal inputs at different frequencies, enabling a power-efficient and versatile wave platform for analog extreme deep learning involving a single, uniformly modulated dielectric layer and a scattering medium. We prove the efficiency of the method for extreme learning machines and reservoir computing to solve a range of challenging learning tasks, from forecasting chaotic time series to the simultaneous classification of distinct datasets. Our results open the way for wave-based machine learning with high energy efficiency, speed, and scalability.
Submitted 18 July, 2021;
originally announced July 2021.
-
Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images
Authors:
Saul Calderon-Ramirez,
Shengxiang Yang,
Armaghan Moemeni,
David Elizondo,
Simon Colreavy-Donnelly,
Luis Fernando Chavarria-Estrada,
Miguel A. Molina-Cabello
Abstract:
The Corona Virus (COVID-19) is an international pandemic that has quickly propagated throughout the world. The application of deep learning for image classification of chest X-ray images of Covid-19 patients could become a novel pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in the context of a new highly infectious disease, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch using a very limited number of labelled observations and a highly imbalanced labelled dataset. We propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we propose the usage of the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The MixMatch method combined with the proposed pseudo-label based balance correction improved classification accuracy by up to 10%, with respect to the non-balanced MixMatch algorithm, with statistical significance. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
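The re-weighting idea described in this abstract can be illustrated with a short sketch. The abstract does not give the weighting formula, so the inverse-frequency weights below are an assumption, as are the function names `inverse_frequency_weights` and `weighted_cross_entropy`. For unlabelled observations the abstract proposes choosing the weight from the MixMatch pseudo-label; in this sketch that would amount to passing pseudo-labels in place of `labels`.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Per-class weights proportional to 1 / class frequency.

    Assumed scheme: weight_c = n / (k * count_c), so a perfectly
    balanced dataset gives every class a weight of 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Mean cross-entropy with each observation scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Three negatives and one positive: the positive class gets the larger weight.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
print(weights[1] > weights[0])
```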
Submitted 20 August, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures
Authors:
Saul Calderon-Ramirez,
Luis Oala,
Jordina Torrents-Barrena,
Shengxiang Yang,
Armaghan Moemeni,
Wojciech Samek,
Miguel A. Molina-Cabello
Abstract:
In this work, we propose MixMOOD - a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out-of-distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model agnostic. They use the feature space of a generic Wide-ResNet and can be applied prior to learning. Our test results reveal that supposed semantic similarity between labelled and unlabelled data is not a good heuristic for unlabelled data selection. In contrast, strong correlation between MixMatch accuracy and the proposed DeDiMs allows us to quantitatively rank different unlabelled datasets ante hoc according to expected MixMatch accuracy. This is what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can help standardize the evaluation of different semi-supervised learning techniques under real-world scenarios involving out-of-distribution data.
Submitted 13 June, 2020;
originally announced June 2020.
-
Smoothness-Adaptive Contextual Bandits
Authors:
Yonatan Gur,
Ahmadreza Momeni,
Stefan Wager
Abstract:
We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing algorithms that adapt to unknown smoothness of payoff functions is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), we establish that adapting to unknown smoothness is possible, and further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of off-the-shelf non-adaptive policies. We establish that for problem settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic scale) the regret rate that is achievable when the smoothness of payoffs is known a priori.
Submitted 15 October, 2021; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Adaptive Sequential Experiments with Unknown Information Arrival Processes
Authors:
Yonatan Gur,
Ahmadreza Momeni
Abstract:
Sequential experiments are often characterized by an exploration-exploitation tradeoff that is captured by the multi-armed bandit (MAB) framework. This framework has been studied and applied, typically when at each time period feedback is received only on the action that was selected at that period. However, in many practical settings additional data may become available between decision epochs. We introduce a generalized MAB formulation, which considers a broad class of distributions that are informative about mean rewards, and allows observations from these distributions to arrive according to an arbitrary and a priori unknown arrival process. When it is known how to map auxiliary data to reward estimates, by obtaining matching lower and upper bounds we characterize a spectrum of minimax complexities for this class of problems as a function of the information arrival process, which captures how salient characteristics of this process impact achievable performance. In terms of achieving optimal performance, we establish that upper confidence bound and posterior sampling policies possess natural robustness with respect to the information arrival process without any adjustments, which uncovers a novel property of these popular policies and further lends credence to their appeal. When the mappings connecting auxiliary data and rewards are a priori unknown, we characterize necessary and sufficient conditions under which auxiliary information allows performance improvement. We devise a new policy that is based on two different upper confidence bounds (one that accounts for auxiliary observation and one that does not) and establish the near-optimality of this policy. We use data from a large media site to analyze the value that may be captured in practice by leveraging auxiliary data for designing content recommendations.
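As background for the upper confidence bound policies mentioned in this abstract, the classical UCB1 selection rule can be sketched as below. This is the textbook rule, not the two-bound policy the authors devise, and the names `ucb1_select`, `counts`, and `sums` are hypothetical. Folding auxiliary observations into the same per-arm counts and sums is one possible reading of "without any adjustments" and is an assumption of this sketch.

```python
import math
from typing import List


def ucb1_select(counts: List[int], sums: List[float], t: int) -> int:
    """Pick the arm with the largest UCB1 index at round t (t >= 1).

    counts[a] is the number of observations of arm a and sums[a] is the
    total reward observed for it.
    """
    # Any arm without observations is tried first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    indices = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=indices.__getitem__)


# A rarely observed arm can be selected over a well-observed arm with a
# higher empirical mean, because its confidence bonus is larger.
print(ucb1_select(counts=[1, 100], sums=[0.4, 50.0], t=101))
```

The exploration bonus shrinks as an arm accumulates observations, which is why extra observations arriving between decisions reduce the need to explore that arm.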
Submitted 18 December, 2020; v1 submitted 28 June, 2019;
originally announced July 2019.
-
End-to-End Performance Optimization in Hybrid Molecular and Electromagnetic Communications
Authors:
Ali Momeni,
Hamid Khoshfekr Rudsari,
Mohammad Reza Javan,
Nader Mokari,
Eduard A. Jorswieck,
Mahdi Orooji
Abstract:
Telemedicine refers to the use of information and communication technology to assist with medical information and services. In health care applications, highly reliable communication links between the health care provider and the desired destination in the human body play a central role in designing an end-to-end (E2E) telemedicine system. In advanced health care applications, e.g., drug delivery, molecular communication becomes a major building block in bio-nano-medical applications. In this paper, an E2E communication link consisting of the electromagnetic and the molecular link is investigated. This paradigm is crucial when the body is a part of the communication system. Based on the quality of service (QoS) metrics, we present a closed-form expression for the E2E BER of the combination of molecular and wireless electromagnetic communications. Next, we formulate an optimization problem that minimizes the E2E BER of the system to obtain the optimal symbol durations for EC and DMC under the delivery time imposed by telemedicine services. The proposed problem is solved by an iterative algorithm based on the bisection method. Also, we study the impact of the system parameters, including drift velocity and detection threshold at the receiver in molecular communication, on the performance of the system. Numerical results show that the proposed method obtains the minimum E2E bit error probability by selecting an appropriate symbol duration of electromagnetic and molecular communications.
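The bisection method named in this abstract is sketched below in its generic root-finding form. The abstract does not describe how the authors apply it to the symbol-duration problem, so this is only the standard technique: `bisect_root` and its parameters are hypothetical names, and the quadratic in the example is a stand-in function, not the paper's BER expression.

```python
from typing import Callable


def bisect_root(f: Callable[[float], float],
                lo: float,
                hi: float,
                tol: float = 1e-9,
                max_iter: int = 200) -> float:
    """Find a root of f in [lo, hi] by repeated interval halving.

    Requires f(lo) and f(hi) to have opposite signs (or one to be zero).
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop once the midpoint is an exact root or the bracket is small.
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid               # the sign change is in the left half
        else:
            lo, f_lo = mid, f_mid  # the sign change is in the right half
    return 0.5 * (lo + hi)


# Stand-in example: the positive root of x^2 - 2 on [0, 2].
root = bisect_root(lambda x: x * x - 2.0, 0.0, 2.0)
print(abs(root * root - 2.0) < 1e-6)
```

For a one-dimensional minimization, the same routine can be applied to the derivative of a smooth unimodal objective, since the minimizer is where the derivative changes sign.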
Submitted 17 June, 2019; v1 submitted 1 January, 2019;
originally announced January 2019.