Search | arXiv e-print repository

Spectral Self-supervised Feature Selection

Authors: Daniel Segal, Ofir Lindenbaum, Ariel Jaffe

Abstract: Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. Our metho… ▽ More Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. Our method's core involves computing robust pseudo-labels by applying simple processing steps to the graph Laplacian's eigenvectors. The subset of eigenvectors used for computing pseudo-labels is chosen based on a model stability criterion. We then measure the importance of each feature by training a surrogate model to predict the pseudo-labels from the observations. Our approach is shown to be robust to challenging scenarios, such as the presence of outliers and complex substructures. We demonstrate the effectiveness of our method through experiments on real-world datasets, showing its robustness across multiple domains, particularly its effectiveness on biological datasets. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 20 pages, 5 figures

arXiv:2406.11703 [pdf, other]

Multiple Descents in Unsupervised Learning: The Role of Noise, Domain Shift and Anomalies

Authors: Kobi Rahimi, Tom Tirer, Ofir Lindenbaum

Abstract: The phenomenon of double descent has recently gained attention in supervised learning. It challenges the conventional wisdom of the bias-variance trade-off by showcasing a surprising behavior. As the complexity of the model increases, the test error initially decreases until reaching a certain point where the model starts to overfit the train set, causing the test error to rise. However, deviating… ▽ More The phenomenon of double descent has recently gained attention in supervised learning. It challenges the conventional wisdom of the bias-variance trade-off by showcasing a surprising behavior. As the complexity of the model increases, the test error initially decreases until reaching a certain point where the model starts to overfit the train set, causing the test error to rise. However, deviating from classical theory, the error exhibits another decline when exceeding a certain degree of over-parameterization. We study the presence of double descent in unsupervised learning, an area that has received little attention and is not yet fully understood. We conduct extensive experiments using under-complete auto-encoders (AEs) for various applications, such as dealing with noisy data, domain shifts, and anomalies. We use synthetic and real data and identify model-wise, epoch-wise, and sample-wise double descent for all the aforementioned applications. Finally, we assessed the usability of the AEs for detecting anomalies and mitigating the domain shift between datasets. Our findings indicate that over-parameterized models can improve performance not only in terms of reconstruction, but also in enhancing capabilities for the downstream task. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09920 [pdf, other]

Knowledge Editing in Language Models via Adapted Direct Preference Optimization

Authors: Amit Rozner, Barak Battash, Lior Wolf, Ofir Lindenbaum

Abstract: Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a v… ▽ More Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a variation of the Direct Preference Optimization (DPO) that is more effective for knowledge modifications. Our method is based on an online approach that continually updates the knowledge stored in the model. We use the current knowledge as a negative sample and the new knowledge we want to introduce as a positive sample in a process called DPO. We also use teacher-forcing for negative sample generation and optimize using the positive sample, which helps maintain localized changes. We tested our KE method on various datasets and models, comparing it to several cutting-edge methods, with 100 and 500 sequential edits. Additionally, we conducted an ablation study comparing our method to the standard DPO approach. Our experimental results show that our modified DPO method allows for more refined KE, achieving similar or better performance compared to previous methods. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.06634 [pdf, other]

Sparse Binarization for Fast Keyword Spotting

Authors: Jonathan Svirsky, Uri Shaham, Ofir Lindenbaum

Abstract: With the increasing prevalence of voice-activated devices and applications, keyword spotting (KWS) models enable users to interact with technology hands-free, enhancing convenience and accessibility in various contexts. Deploying KWS models on edge devices, such as smartphones and embedded systems, offers significant benefits for real-time applications, privacy, and bandwidth efficiency. However,… ▽ More With the increasing prevalence of voice-activated devices and applications, keyword spotting (KWS) models enable users to interact with technology hands-free, enhancing convenience and accessibility in various contexts. Deploying KWS models on edge devices, such as smartphones and embedded systems, offers significant benefits for real-time applications, privacy, and bandwidth efficiency. However, these devices often possess limited computational power and memory. This necessitates optimizing neural network models for efficiency without significantly compromising their accuracy. To address these challenges, we propose a novel keyword-spotting model based on sparse input representation followed by a linear classifier. The model is four times faster than the previous state-of-the-art edge device-compatible model with better accuracy. We show that our method is also more robust in noisy environments while being fast. Our code is available at: https://github.com/jsvir/sparknet. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.12265 [pdf, other]

SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images

Authors: Shir Barzel, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Abstract: Modern cameras typically offer two types of image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly-processed non-linear image state, such as the sRGB state. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline and can be helpful for computer vision tasks, such as image deblurring, dehazing, and color recog… ▽ More Modern cameras typically offer two types of image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly-processed non-linear image state, such as the sRGB state. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline and can be helpful for computer vision tasks, such as image deblurring, dehazing, and color recognition tasks in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and achieving CIE-XYZ color images using conventional methods is not always possible. To tackle this issue, classical methodologies have been developed that focus on reversing the acquisition pipeline. More recently, supervised learning has been employed, using paired CIE-XYZ and sRGB representations of identical images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To overcome this limitation and mitigate the reliance on large amounts of paired data, self-supervised learning (SSL) can be utilized as a substitute for relying solely on paired data. This paper proposes a framework for using SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, WSCG 2024 conference

arXiv:2405.00791 [pdf, other]

Obtaining Favorable Layouts for Multiple Object Generation

Authors: Barak Battash, Amit Rozner, Lior Wolf, Ofir Lindenbaum

Abstract: Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve mult… ▽ More Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve multiple subjects. When presented with a prompt containing more than one subject, these models may omit some subjects or merge them together. To address this challenge, we propose a novel approach based on a guiding principle. We allow the diffusion model to initially propose a layout, and then we rearrange the layout grid. This is achieved by enforcing cross-attention maps (XAMs) to adhere to proposed masks and by migrating pixels from latent maps to new locations determined by us. We introduce new loss terms aimed at reducing XAM entropy for clearer spatial definition of subjects, reduce the overlap between XAMs, and ensure that XAMs align with their respective masks. We contrast our approach with several alternative methods and show that it more faithfully captures the desired concepts across a variety of text prompts. △ Less

Submitted 1 May, 2024; originally announced May 2024.

MSC Class: I.2; I.4

arXiv:2402.16383 [pdf, other]

Self Supervised Correlation-based Permutations for Multi-View Clustering

Authors: Ran Eisenberg, Jonathan Svirsky, Ofir Lindenbaum

Abstract: Fusing information from different modalities can enhance data analysis tasks, including clustering. However, existing multi-view clustering (MVC) solutions are limited to specific domains or rely on a suboptimal and computationally demanding two-stage procedure of representation and clustering. We propose an end-to-end deep learning-based MVC framework for general data (image, tabular, etc.). Our… ▽ More Fusing information from different modalities can enhance data analysis tasks, including clustering. However, existing multi-view clustering (MVC) solutions are limited to specific domains or rely on a suboptimal and computationally demanding two-stage procedure of representation and clustering. We propose an end-to-end deep learning-based MVC framework for general data (image, tabular, etc.). Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective. Concurrently, we learn cluster assignments by identifying consistent pseudo-labels across multiple views. We demonstrate the effectiveness of our model using ten MVC benchmark datasets. Theoretically, we show that our model approximates the supervised linear discrimination analysis (LDA) representation. Additionally, we provide an error bound induced by false-pseudo label annotations. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.14254 [pdf, other]

Contextual Feature Selection with Conditional Stochastic Gates

Authors: Ram Dyuthi Sristi, Ofir Lindenbaum, Shira Lifshitz, Maria Lavzin, Jackie Schiller, Gal Mishne, Hadas Benisty

Abstract: Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for… ▽ More Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection where the subset of selected features is conditioned on the value of context variables. Our new approach, Conditional Stochastic Gates (c-STG), models the importance of features using conditional Bernoulli variables whose parameters are predicted based on contextual variables. We introduce a hypernetwork that maps context variables to feature selection parameters to learn the context-dependent gates along with a prediction model. We further present a theoretical analysis of our model, indicating that it can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark using simulated and real-world datasets across multiple domains demonstrating that c-STG can lead to improved feature selection capabilities while enhancing prediction accuracy and interpretability. △ Less

Submitted 7 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to ICML 2024

arXiv:2312.13240 [pdf, other]

Efficient Verification-Based Face Identification

Authors: Amit Rozner, Barak Battash, Ofir Lindenbaum, Lior Wolf

Abstract: We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest neighbor search into a binary problem; each user has its own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate… ▽ More We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest neighbor search into a binary problem; each user has its own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate the model weights using a hypernetwork $h$. This leads to the generation of a compact personalized model for face identification that can be deployed on edge devices. Key to the method's success is a novel way of generating hard negatives and carefully scheduling the training objectives. Our model leads to a substantially small $f$ requiring only 23k parameters and 5M floating point operations (FLOPS). We use six face verification datasets to demonstrate that our method is on par or better than state-of-the-art models, with a significantly reduced number of parameters and computational burden. Furthermore, we perform an extensive ablation study to demonstrate the importance of each element in our method. △ Less

Submitted 25 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures

ACM Class: I.4

arXiv:2312.13155 [pdf, other]

Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity

Authors: Erez Peterfreund, Iryna Burak, Ofir Lindenbaum, Jim Gimlett, Felix Dietrich, Ronald R. Coifman, Ioannis G. Kevrekidis

Abstract: Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets… ▽ More Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets of partial observations, and the result is a globally consistent latent space, harmonizing (rigidifying, fusing) all measurements. The key enabler is the availability of multiple slightly perturbed measurements of each instance:, local measurement, "bursts", that allows us to estimate the local distortion induced by each instrument. We demonstrate the approach in a sequence of examples, starting with simple two-dimensional data sets and proceeding to a Wi-Fi localization problem and to the solution of a "dynamical puzzle" arising in spatio-temporal observations of the solutions of Partial Differential Equations. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2307.12336 [pdf, other]

TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models

Authors: Guy Zamberg, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Abstract: Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by uti… ▽ More Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2306.04785 [pdf, other]

Interpretable Deep Clustering for Tabular Data

Authors: Jonathan Svirsky, Ofir Lindenbaum

Abstract: Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically performed at the cluster level, practitioners seek reliable and interpretable clustering models. We propose a new deep-learning framework for general domain tabular dat… ▽ More Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically performed at the cluster level, practitioners seek reliable and interpretable clustering models. We propose a new deep-learning framework for general domain tabular data that predicts interpretable cluster assignments at the instance and cluster levels. First, we present a self-supervised procedure to identify the subset of the most informative features from each data point. Then, we design a model that predicts cluster assignments and a gate matrix that provides cluster-level feature selection. Overall, our model provides cluster assignments with an indication of the driving feature for each sample and each cluster. We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics tabular datasets. Furthermore, using previously proposed metrics, we verify that our model leads to interpretable results at a sample and cluster level. Our code is available at https://github.com/jsvir/idc. △ Less

Submitted 9 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00582 [pdf, other]

Anomaly Detection with Variance Stabilized Density Estimation

Authors: Amit Rozner, Barak Battash, Henry Li, Lior Wolf, Ofir Lindenbaum

Abstract: We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (with lower variance) around normal samples. We have verified this hypothesis empirically using a wide range of real-world data. Then, we present a variance-stabilized density estimation problem for maximizing the lik… ▽ More We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (with lower variance) around normal samples. We have verified this hypothesis empirically using a wide range of real-world data. Then, we present a variance-stabilized density estimation problem for maximizing the likelihood of the observed samples while minimizing the variance of the density around normal samples. To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution. We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results while alleviating the need for data-specific hyperparameter tuning. Finally, we have used an ablation study to demonstrate the importance of each of the proposed components, followed by a stability analysis evaluating the robustness of our model. △ Less

Submitted 8 May, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 9 pages, 7 figures

ACM Class: I.2

arXiv:2306.00528 [pdf, other]

Neuronal Cell Type Classification using Deep Learning

Authors: Ofek Ophir, Orit Shefi, Ofir Lindenbaum

Abstract: The brain is likely the most complex organ, given the variety of functions it controls, the number of cells it comprises, and their corresponding diversity. Studying and identifying neurons, the brain's primary building blocks, is a crucial milestone and essential for understanding brain function in health and disease. Recent developments in machine learning have provided advanced abilities for cl… ▽ More The brain is likely the most complex organ, given the variety of functions it controls, the number of cells it comprises, and their corresponding diversity. Studying and identifying neurons, the brain's primary building blocks, is a crucial milestone and essential for understanding brain function in health and disease. Recent developments in machine learning have provided advanced abilities for classifying neurons. However, these methods remain black boxes with no explainability and reasoning. This paper aims to provide a robust and explainable deep-learning framework to classify neurons based on their electrophysiological activity. Our analysis is performed on data provided by the Allen Cell Types database containing a survey of biological features derived from single-cell recordings of mice and humans. First, we classify neuronal cell types of mice data to identify excitatory and inhibitory neurons. Then, neurons are categorized to their broad types in humans using domain adaptation from mice data. Lastly, neurons are classified into sub-types based on transgenic mouse lines using deep neural networks in an explainable fashion. We show state-of-the-art results in a dendrite-type classification of excitatory vs. inhibitory neurons and transgenic mouse lines classification. The model is also inherently interpretable, revealing the correlations between neuronal types and their electrophysiological properties. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2303.09381 [pdf, other]

Multi-modal Differentiable Unsupervised Feature Selection

Authors: Junchen Yang, Ofir Lindenbaum, Yuval Kluger, Ariel Jaffe

Abstract: Multi-modal high throughput biological data presents a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many observed variables in both modalities are often nuisance and do not carry information about the phenomenon of interest. Here, we propose a multi-moda… ▽ More Multi-modal high throughput biological data presents a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many observed variables in both modalities are often nuisance and do not carry information about the phenomenon of interest. Here, we propose a multi-modal unsupervised feature selection framework: identifying informative variables based on coupled high-dimensional measurements. Our method is designed to identify features associated with two types of latent low-dimensional structures: (i) shared structures that govern the observations in both modalities and (ii) differential structures that appear in only one modality. To that end, we propose two Laplacian-based scoring operators. We incorporate the scores with differentiable gates that mask nuisance features and enhance the accuracy of the structure captured by the graph Laplacian. The performance of the new scheme is illustrated using synthetic and real datasets, including an extended biological application to single-cell multi-omics. △ Less

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.02749 [pdf, other]

Revisiting the Noise Model of Stochastic Gradient Descent

Authors: Barak Battash, Ofir Lindenbaum

Abstract: The stochastic gradient noise (SGN) is a significant factor in the success of stochastic gradient descent (SGD). Following the central limit theorem, SGN was initially modeled as Gaussian, and lately, it has been suggested that stochastic gradient noise is better characterized using $SαS$ Lévy distribution. This claim was allegedly refuted and rebounded to the previously suggested Gaussian noise m… ▽ More The stochastic gradient noise (SGN) is a significant factor in the success of stochastic gradient descent (SGD). Following the central limit theorem, SGN was initially modeled as Gaussian, and lately, it has been suggested that stochastic gradient noise is better characterized using $SαS$ Lévy distribution. This claim was allegedly refuted and rebounded to the previously suggested Gaussian noise model. This paper presents solid, detailed empirical evidence that SGN is heavy-tailed and better depicted by the $SαS$ distribution. Furthermore, we argue that different parameters in a deep neural network (DNN) hold distinct SGN characteristics throughout training. To more accurately approximate the dynamics of SGD near a local minimum, we construct a novel framework in $\mathbb{R}^N$, based on Lévy-driven stochastic differential equation (SDE), where one-dimensional Lévy processes model each parameter in the DNN. Next, we show that SGN jump intensity (frequency and amplitude) depends on the learning rate decay mechanism (LRdecay); furthermore, we demonstrate empirically that the LRdecay effect may stem from the reduction of the SGN and not the decrease in the step size. Based on our analysis, we examine the mean escape time, trapping probability, and more properties of DNNs near local minima. Finally, we prove that the training process will likely exit from the basin in the direction of parameters with heavier tail SGN. We will share our code for reproducibility. △ Less

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2301.13530 [pdf, other]

Domain-Generalizable Multiple-Domain Clustering

Authors: Amit Rozner, Barak Battash, Lior Wolf, Ofir Lindenbaum

Abstract: This work generalizes the problem of unsupervised domain generalization to the case in which no labeled samples are available (completely unsupervised). We are given unlabeled samples from multiple source domains, and we aim to learn a shared predictor that assigns examples to semantically related clusters. Evaluation is done by predicting cluster assignments in previously unseen domains. Towards… ▽ More This work generalizes the problem of unsupervised domain generalization to the case in which no labeled samples are available (completely unsupervised). We are given unlabeled samples from multiple source domains, and we aim to learn a shared predictor that assigns examples to semantically related clusters. Evaluation is done by predicting cluster assignments in previously unseen domains. Towards this goal, we propose a two-stage training framework: (1) self-supervised pre-training for extracting domain invariant semantic features. (2) multi-head cluster prediction with pseudo labels, which rely on both the feature space and cluster head prediction, further leveraging a novel prediction-based label smoothing scheme. We demonstrate empirically that our model is more accurate than baselines that require fine-tuning using samples from the target domain or some level of supervision. Our code is available at https://github.com/AmitRozner/domain-generalizable-multiple-domain-clustering. △ Less

Submitted 31 January, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: 13 pages, 3 figures

arXiv:2301.00448 [pdf, other]

Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction

Authors: Idan Cohen, Ofir Lindenbaum, Sharon Gannot

Abstract: Classical methods for acoustic scene mapping require the estimation of time difference of arrival (TDOA) between microphones. Unfortunately, TDOA estimation is very sensitive to reverberation and additive noise. We introduce an unsupervised data-driven approach that exploits the natural structure of the data. Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning sch… ▽ More Classical methods for acoustic scene mapping require the estimation of time difference of arrival (TDOA) between microphones. Unfortunately, TDOA estimation is very sensitive to reverberation and additive noise. We introduce an unsupervised data-driven approach that exploits the natural structure of the data. Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements. Our experimental setup includes a microphone array that measures the transmitted sound source at multiple locations across the acoustic enclosure. We demonstrate that LOCA learns a representation that is isometric to the spatial locations of the microphones. The performance of our method is evaluated using a series of realistic simulations and compared with other dimensionality-reduction schemes. We further assess the influence of reverberation on the results of LOCA and show that it demonstrates considerable robustness. △ Less

Submitted 12 March, 2024; v1 submitted 1 January, 2023; originally announced January 2023.

arXiv:2210.16022 [pdf, other]

SG-VAD: Stochastic Gates Based Speech Activity Detection

Authors: Jonathan Svirsky, Ofir Lindenbaum

Abstract: We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task, and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperfo… ▽ More We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task, and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperforms the previously proposed methods on the AVA-Speech evaluation set, and provides comparative results on the HAVIC dataset. We present its architecture, experimental results, and ablation study on the model's components. We publish the code and the models here https://www.github.com/jsvir/vad. △ Less

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2204.08683 [pdf, ps, other]

Imbalanced Classification via a Tabular Translation GAN

Authors: Jonathan Gradstein, Moshe Salhov, Yoav Tulpan, Ofir Lindenbaum, Amir Averbuch

Abstract: When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the syn… ▽ More When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to be close to the class boundary. Furthermore, we explore a selection criterion to retain the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of tabular class-imbalanced datasets show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques. △ Less

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2202.10550 [pdf, other]

Imbalanced Classification via Explicit Gradient Learning From Augmented Data

Authors: Bronislav Yasinnik, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Abstract: Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural networks performance is substantially impaired due to preference towards the majority class. Existing approaches attempt to eliminate the bias through data re-sampling or re-weighting the loss in the learning process. Still, these methods tend to overfit the minority sa… ▽ More Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural networks performance is substantially impaired due to preference towards the majority class. Existing approaches attempt to eliminate the bias through data re-sampling or re-weighting the loss in the learning process. Still, these methods tend to overfit the minority samples and perform poorly when the structure of the minority class is highly irregular. Here, we propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances. These additional data are incorporated in the classifier's deep-learning process, and their contributions are learned explicitly. The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios. △ Less

Submitted 28 October, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:1906.05591, arXiv:2202.09318

arXiv:2110.15960 [pdf, other]

Support Recovery with Stochastic Gates: Theory and Application for Linear Models

Authors: Soham Jana, Henry Li, Yutaro Yamada, Ofir Lindenbaum

Abstract: Consider the problem of simultaneous estimation and support recovery of the coefficient vector in a linear data model with additive Gaussian noise. We study the problem of estimating the model coefficients based on a recently proposed non-convex regularizer, namely the stochastic gates (STG) [Yamada et al. 2020]. We suggest a new projection-based algorithm for solving the STG regularized minimizat… ▽ More Consider the problem of simultaneous estimation and support recovery of the coefficient vector in a linear data model with additive Gaussian noise. We study the problem of estimating the model coefficients based on a recently proposed non-convex regularizer, namely the stochastic gates (STG) [Yamada et al. 2020]. We suggest a new projection-based algorithm for solving the STG regularized minimization problem, and prove convergence and support recovery guarantees of the STG-estimator for a range of random and non-random design matrix setups. Our new algorithm has been shown to outperform the existing STG algorithm and other classical estimators for support recovery in various real and synthetic data analyses. △ Less

Submitted 12 November, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 15 pages, 7 figures, extended the theoretical studies and numerical analyses to other probabilistic models (subgaussian, moment constraints etc.)

arXiv:2110.05306 [pdf, other]

Deep Unsupervised Feature Selection by Discarding Nuisance and Correlated Features

Authors: Uri Shaham, Ofir Lindenbaum, Jonathan Svirsky, Yuval Kluger

Abstract: Modern datasets often contain large subsets of correlated features and nuisance features, which are not or loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the Graph Laplacians' leading eigenvectors. We demonstrate that in the presence of… ▽ More Modern datasets often contain large subsets of correlated features and nuisance features, which are not or loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the Graph Laplacians' leading eigenvectors. We demonstrate that in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To do this, we propose a fully differentiable approach for unsupervised feature selection, utilizing the Laplacian score criterion to avoid the selection of nuisance features. We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features. Building on the recently proposed concrete layer that allows controlling for the number of selected features via architectural design, simplifying the optimization process. Experimenting on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated or nuisance features, but not both. Several state-of-the-art clustering results are reported. △ Less

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2110.00494 [pdf, other]

Probabilistic Robust Autoencoders for Outlier Detection

Authors: Ofir Lindenbaum, Yariv Aizenbud, Yuval Kluger

Abstract: Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional… ▽ More Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem for splitting the data into inliers and outliers. Our objective is designed to exclude outliers while including a subset of samples (inliers) that can be effectively reconstructed using an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This could be formulated via regularization by subtracting an $\ell_0$ norm counting the number of selected samples from the reconstruction term. Unfortunately, this leads to an intractable combinatorial problem. Therefore, we propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers in a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets. △ Less

Submitted 24 August, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

arXiv:2106.06468 [pdf, other]

Locally Sparse Neural Networks for Tabular Biomedical Data

Authors: Junchen Yang, Ofir Lindenbaum, Yuval Kluger

Abstract: Tabular datasets with low-sample-size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when applied to tabular datasets. To address these neural networks' shortcomings, we propose an intrinsically interpretable network for heterogeneous biomedical data. W… ▽ More Tabular datasets with low-sample-size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when applied to tabular datasets. To address these neural networks' shortcomings, we propose an intrinsically interpretable network for heterogeneous biomedical data. We design a locally sparse neural network where the local sparsity is learned to identify the subset of most relevant features for each sample. This sample-specific sparsity is predicted via a \textit{gating} network, which is trained in tandem with the \textit{prediction} network. By forcing the model to select a subset of the most informative features for each sample, we reduce model overfitting in low-sample-size data and obtain an interpretable model. We demonstrate that our method outperforms state-of-the-art models when applied to synthetic or real-world biomedical datasets using extensive experiments. Furthermore, the proposed framework dramatically outperforms existing schemes when evaluating its interpretability capabilities. Finally, we demonstrate the applicability of our model to two important biomedical tasks: survival analysis and marker gene identification. △ Less

Submitted 7 February, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

arXiv:2010.05620 [pdf, other]

$\ell_0$-based Sparse Canonical Correlation Analysis

Authors: Ofir Lindenbaum, Moshe Salhov, Amir Averbuch, Yuval Kluger

Abstract: Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables. The canonically correlated representations, termed \textit{canonical variates} are widely used in unsupervised learning to analyze unlabeled multi-modal registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalit… ▽ More Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables. The canonically correlated representations, termed \textit{canonical variates} are widely used in unsupervised learning to analyze unlabeled multi-modal registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalities exceeds the number of samples. Moreover, often a significant fraction of the variables measures modality-specific information, and thus removing them is beneficial for identifying the \textit{canonically correlated variates}. Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an $\ell_0$-regularized correlation loss. We further propose $\ell_0$-Deep CCA for solving the problem of non-linear sparse CCA by modeling the correlated representations using deep nets. We demonstrate the efficacy of the method using several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared to other linear, non-linear and sparse CCA-based models. △ Less

Submitted 8 June, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

arXiv:2009.04142 [pdf, other]

doi 10.1063/5.0044529

Kernel-based parameter estimation of dynamical systems with unknown observation functions

Authors: Ofir Lindenbaum, Amir Sagiv, Gal Mishne, Ronen Talmon

Abstract: A low-dimensional dynamical system is observed in an experiment as a high-dimensional signal; for example, a video of a chaotic pendulums system. Assuming that we know the dynamical model up to some unknown parameters, can we estimate the underlying system's parameters by measuring its time-evolution only once? The key information for performing this estimation lies in the temporal inter-dependenc… ▽ More A low-dimensional dynamical system is observed in an experiment as a high-dimensional signal; for example, a video of a chaotic pendulums system. Assuming that we know the dynamical model up to some unknown parameters, can we estimate the underlying system's parameters by measuring its time-evolution only once? The key information for performing this estimation lies in the temporal inter-dependencies between the signal and the model. We propose a kernel-based score to compare these dependencies. Our score generalizes a maximum likelihood estimator for a linear model to a general nonlinear setting in an unknown feature space. We estimate the system's underlying parameters by maximizing the proposed score. We demonstrate the accuracy and efficiency of the method using two chaotic dynamical systems - the double pendulum and the Lorenz '63 model. △ Less

Submitted 4 April, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2007.04728 [pdf, other]

Differentiable Unsupervised Feature Selection based on a Gated Laplacian

Authors: Ofir Lindenbaum, Uri Shaham, Jonathan Svirsky, Erez Peterfreund, Yuval Kluger

Abstract: Scientific observations may consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, and we demonstrate its use for the task of clustering. We propose a differentiable… ▽ More Scientific observations may consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, and we demonstrate its use for the task of clustering. We propose a differentiable loss function that combines the Laplacian score, which favors low-frequency features, with a gating mechanism for feature selection. We improve the Laplacian score, by replacing it with a gated variant computed on a subset of features. This subset is obtained using a continuous approximation of Bernoulli variables whose parameters are trained to gate the full feature space. We mathematically motivate the proposed approach and demonstrate that in the high noise regime, it is crucial to compute the Laplacian on the gated inputs, rather than on the full feature set. Experimental demonstration of the efficacy of the proposed approach and its advantage over current baselines is provided using several real-world examples. △ Less

Submitted 9 November, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

arXiv:2004.07234 [pdf, other]

doi 10.1073/pnas.2014627117

LOCA: LOcal Conformal Autoencoder for standardized data coordinates

Authors: Erez Peterfreund, Ofir Lindenbaum, Felix Dietrich, Tom Bertalan, Matan Gavish, Ioannis G. Kevrekidis, Ronald R. Coifman

Abstract: We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in… ▽ More We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in $\mathbb{R}^d$ that is isometric to the latent variables of the manifold. These data coordinates, being invariant under smooth changes of variables, enable matching between different instrumental observations of the same phenomenon. Our embedding is obtained using a LOcal Conformal Autoencoder (LOCA), an algorithm that constructs an embedding to rectify deformations by using a local z-scoring procedure while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA on various model settings and observe that it exhibits promising interpolation and extrapolation capabilities. Finally, we apply LOCA to single-site Wi-Fi localization data, and to $3$-dimensional curved surface estimation based on a $2$-dimensional projection. △ Less

Submitted 14 January, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.07331 [pdf, other]

Randomly Aggregated Least Squares for Support Recovery

Authors: Ofir Lindenbaum, Stefan Steinerberger

Abstract: We study the problem of exact support recovery: given an (unknown) vector $θ\in \left\{-1,0,1\right\}^D$, we are given access to the noisy measurement $$ y = Xθ+ ω,$$ where $X \in \mathbb{R}^{N \times D}$ is a (known) Gaussian matrix and the noise $ω\in \mathbb{R}^N$ is an (unknown) Gaussian vector. How small we can choose $N$ and still reliably recover the support of $θ$? We present RAWLS (Random… ▽ More We study the problem of exact support recovery: given an (unknown) vector $θ\in \left\{-1,0,1\right\}^D$, we are given access to the noisy measurement $$ y = Xθ+ ω,$$ where $X \in \mathbb{R}^{N \times D}$ is a (known) Gaussian matrix and the noise $ω\in \mathbb{R}^N$ is an (unknown) Gaussian vector. How small we can choose $N$ and still reliably recover the support of $θ$? We present RAWLS (Randomly Aggregated UnWeighted Least Squares Support Recovery): the main idea is to take random subsets of the $N$ equations, perform a least squares recovery over this reduced bit of information and then average over many random subsets. We show that the proposed procedure can provably recover an approximation of $θ$ and demonstrate its use in support recovery through numerical examples. △ Less

Submitted 9 November, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

arXiv:2002.12317 [pdf, other]

The Spectral Underpinning of word2vec

Authors: Ariel Jaffe, Yuval Kluger, Ofir Lindenbaum, Jonathan Patsenker, Erez Peterfreund, Stefan Steinerberger

Abstract: word2vec due to Mikolov \textit{et al.} (2013) is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an under… ▽ More word2vec due to Mikolov \textit{et al.} (2013) is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism. △ Less

Submitted 9 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

arXiv:1905.12724 [pdf, other]

Variational Diffusion Autoencoders with Random Walk Sampling

Authors: Henry Li, Ofir Lindenbaum, Xiuyuan Cheng, Alexander Cloninger

Abstract: Variational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the sampling space. This is a construction that strives to define the data manifold. A major obstacle to VAEs and GANs, however, is choosing a suitable prior tha… ▽ More Variational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the sampling space. This is a construction that strives to define the data manifold. A major obstacle to VAEs and GANs, however, is choosing a suitable prior that matches the data topology. Well-known consequences of poorly picked priors are posterior and mode collapse. To our knowledge, no existing method sidesteps this user choice. Conversely, $\textit{diffusion maps}$ automatically infer the data topology and enjoy a rigorous connection to manifold learning, but do not scale easily or provide the inverse homeomorphism (i.e. decoder/generator). We propose a method that combines these approaches into a generative model that inherits the asymptotic guarantees of $\textit{diffusion maps}$ while preserving the scalability of deep models. We prove approximation theoretic results for the dimension dependence of our proposed method. Finally, we demonstrate the effectiveness of our method with various real and synthetic datasets. △ Less

Submitted 27 August, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: 24 pages, 9 figures, 1 table; accepted to ECCV 2020

arXiv:1810.04247 [pdf, other]

Feature Selection using Stochastic Gates

Authors: Yutaro Yamada, Ofir Lindenbaum, Sahand Negahban, Yuval Kluger

Abstract: Feature selection problems have been extensively studied for linear estimation, for instance, Lasso, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that… ▽ More Feature selection problems have been extensively studied for linear estimation, for instance, Lasso, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that represent if a feature is selected or not. Our approach relies on the continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification of incorporating Bernoulli distribution into our approach and demonstrate the potential of the approach on synthetic and real-life applications. △ Less

Submitted 26 July, 2020; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: Published in ICML 2020

Journal ref: Proceedings of Machine Learning and Systems 2020, pages 8952--8963

arXiv:1802.04927 [pdf, other]

Geometry-Based Data Generation

Authors: Ofir Lindenbaum, Jay S. Stanley III, Guy Wolf, Smita Krishnaswamy

Abstract: Many generative models attempt to replicate the density of their input data. However, this approach is often undesirable, since data density is highly affected by sampling biases, noise, and artifacts. We propose a method called SUGAR (Synthesis Using Geometrically Aligned Random-walks) that uses a diffusion process to learn a manifold geometry from the data. Then, it generates new points evenly a… ▽ More Many generative models attempt to replicate the density of their input data. However, this approach is often undesirable, since data density is highly affected by sampling biases, noise, and artifacts. We propose a method called SUGAR (Synthesis Using Geometrically Aligned Random-walks) that uses a diffusion process to learn a manifold geometry from the data. Then, it generates new points evenly along the manifold by pulling randomly generated points into its intrinsic structure using a diffusion kernel. SUGAR equalizes the density along the manifold by selectively generating points in sparse areas of the manifold. We demonstrate how the approach corrects sampling biases and artifacts, while also revealing intrinsic patterns (e.g. progression) and relations in the data. The method is applicable for correcting missing data, finding hypothetical data points, and learning relationships between data features. △ Less

Submitted 6 September, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

arXiv:1707.01093 [pdf, other]

Kernel Scaling for Manifold Learning and Classification

Authors: Ofir Lindenbaum, Moshe Salhov, Arie Yeredor, Amir Averbuch

Abstract: Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering and other data analysis tasks. Setting the kernel's scale parameter, also referred to as the kernel's bandwidth, highly affects the performance of the task in hand. We propose to set a scale parameter that is tailored to one of two types of tasks: classification… ▽ More Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering and other data analysis tasks. Setting the kernel's scale parameter, also referred to as the kernel's bandwidth, highly affects the performance of the task in hand. We propose to set a scale parameter that is tailored to one of two types of tasks: classification and manifold learning. For manifold learning, we seek a scale which is best at capturing the manifold's intrinsic dimension. For classification, we propose three methods for estimating the scale, which optimize the classification results in different senses. The proposed frameworks are simulated on artificial and on real datasets. The results show a high correlation between optimal classification rates and the estimated scales. Finally, we demonstrate the approach on a seismic event classification task. △ Less

Submitted 4 June, 2019; v1 submitted 4 July, 2017; originally announced July 2017.

arXiv:1706.01750 [pdf, other]

doi 10.1109/TGRS.2018.2797537

Multi-View Kernels for Low-Dimensional Modeling of Seismic Events

Authors: Ofir Lindenbaum, Yuri Bregman, Neta Rabin, Amir Averbuch

Abstract: The problem of learning from seismic recordings has been studied for years. There is a growing interest in developing automatic mechanisms for identifying the properties of a seismic event. One main motivation is the ability have a reliable identification of man-made explosions. The availability of multiple high-dimensional observations has increased the use of machine learning techniques in a var… ▽ More The problem of learning from seismic recordings has been studied for years. There is a growing interest in developing automatic mechanisms for identifying the properties of a seismic event. One main motivation is the ability have a reliable identification of man-made explosions. The availability of multiple high-dimensional observations has increased the use of machine learning techniques in a variety of fields. In this work, we propose to use a kernel-fusion based dimensionality reduction framework for generating meaningful seismic representations from raw data. The proposed method is tested on 2023 events that were recorded in Israel and in Jordan. The method achieves promising results in classification of event type as well as in estimating the location of the event. The proposed fusion and dimensionality reduction tools may be applied to other types of geophysical data. △ Less

Submitted 6 June, 2017; originally announced June 2017.

arXiv:1606.08819 [pdf, other]

Multi-View Kernel Consensus For Data Analysis

Authors: Moshe Salhov, Ofir Lindenbaum, Yariv Aizenbud, Avi Silberschatz, Yoel Shkolnisky, Amir Averbuch

Abstract: The input data features set for many data driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phe… ▽ More The input data features set for many data driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phenomena into the measured high dimensional observations might distort the distance metric, This distortion can effect the desired estimated low dimensional geometric structure. In this paper, we suggest to utilize the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement also called consensus between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenations of views provides. △ Less

Submitted 29 January, 2019; v1 submitted 28 June, 2016; originally announced June 2016.

arXiv:1508.05550 [pdf, other]

MultiView Diffusion Maps

Authors: Ofir Lindenbaum, Arie Yeredor, Moshe Salhov, Amir Averbuch

Abstract: In this paper, we address the challenging task of achieving multi-view dimensionality reduction. The goal is to effectively use the availability of multiple views for extracting a coherent low-dimensional representation of the data. The proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. The multi-view dimensionality reduction is achieve… ▽ More In this paper, we address the challenging task of achieving multi-view dimensionality reduction. The goal is to effectively use the availability of multiple views for extracting a coherent low-dimensional representation of the data. The proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. The multi-view dimensionality reduction is achieved by defining a cross-view model in which an implied random walk process is restrained to hop between objects in the different views. The method is robust to scaling and insensitive to small structural changes in the data. We define new diffusion distances and analyze the spectra of the proposed kernel. We show that the proposed framework is useful for various machine learning applications such as clustering, classification, and manifold learning. Finally, by fusing multi-sensor seismic data we present a method for automatic identification of seismic events. △ Less

Submitted 4 June, 2019; v1 submitted 22 August, 2015; originally announced August 2015.

Showing 1–38 of 38 results for author: Lindenbaum, O