Search | arXiv e-print repository

Osmosis: RGBD Diffusion Prior for Underwater Image Restoration

Authors: Opher Bar Nathan, Deborah Levy, Tali Treibitz, Dan Rosenbaum

Abstract: Underwater image restoration is a challenging task because of water effects that increase dramatically with distance. This is worsened by lack of ground truth data of clean scenes without water. Diffusion priors have emerged as strong image restoration priors. However, they are often trained with a dataset of the desired restored output, which is not available in our case. We also observe that usi… ▽ More Underwater image restoration is a challenging task because of water effects that increase dramatically with distance. This is worsened by lack of ground truth data of clean scenes without water. Diffusion priors have emerged as strong image restoration priors. However, they are often trained with a dataset of the desired restored output, which is not available in our case. We also observe that using only color data is insufficient, and therefore augment the prior with a depth channel. We train an unconditional diffusion model prior on the joint space of color and depth, using standard RGBD datasets of natural outdoor scenes in air. Using this prior together with a novel guidance method based on the underwater image formation model, we generate posterior samples of clean images, removing the water effects. Even though our prior did not see any underwater images during training, our method outperforms state-of-the-art baselines for image restoration on very challenging scenes. Our code, models and data are available on the project website. △ Less

Submitted 1 August, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: ECCV 2024. Project page with results and code: https://osmosis-diffusion.github.io/

arXiv:2402.04821 [pdf, other]

E(3)-Equivariant Mesh Neural Networks

Authors: Thuan Trang, Nhat Khang Ngo, Daniel Levy, Thieu N. Vo, Siamak Ravanbakhsh, Truong Son Hy

Abstract: Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have address the need for geometric deep learning on 3D mesh. However, we observe that the complexities in many of these architectures does not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimall… ▽ More Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have address the need for geometric deep learning on 3D mesh. However, we observe that the complexities in many of these architectures does not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information, and further improve it to account for long-range interactions through hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive pre-processing. Our implementation is available at https://github.com/HySonLab/EquiMesh △ Less

Submitted 18 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2309.03139 [pdf, other]

Using Multiple Vector Channels Improves E(n)-Equivariant Graph Neural Networks

Authors: Daniel Levy, Sékou-Oumar Kaba, Carmelo Gonzales, Santiago Miret, Siamak Ravanbakhsh

Abstract: We present a natural extension to E(n)-equivariant graph neural networks that uses multiple equivariant vectors per node. We formulate the extension and show that it improves performance across different physical systems benchmark tasks, with minimal differences in runtime or number of parameters. The proposed multichannel EGNN outperforms the standard singlechannel EGNN on N-body charged particle… ▽ More We present a natural extension to E(n)-equivariant graph neural networks that uses multiple equivariant vectors per node. We formulate the extension and show that it improves performance across different physical systems benchmark tasks, with minimal differences in runtime or number of parameters. The proposed multichannel EGNN outperforms the standard singlechannel EGNN on N-body charged particle dynamics, molecular property predictions, and predicting the trajectories of solar system bodies. Given the additional benefits and minimal additional cost of multi-channel EGNN, we suggest that this extension may be of practical use to researchers working in machine learning for the physical sciences △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2304.07743 [pdf, other]

SeaThru-NeRF: Neural Radiance Fields in Scattering Media

Authors: Deborah Levy, Amit Peleg, Naama Pearl, Dan Rosenbaum, Derya Akkaynak, Simon Korman, Tali Treibitz

Abstract: Research on neural radiance fields (NeRFs) for novel view generation is exploding with new models and extensions. However, a question that remains unanswered is what happens in underwater or foggy scenes where the medium strongly influences the appearance of objects. Thus far, NeRF and its variants have ignored these cases. However, since the NeRF framework is based on volumetric rendering, it has… ▽ More Research on neural radiance fields (NeRFs) for novel view generation is exploding with new models and extensions. However, a question that remains unanswered is what happens in underwater or foggy scenes where the medium strongly influences the appearance of objects. Thus far, NeRF and its variants have ignored these cases. However, since the NeRF framework is based on volumetric rendering, it has inherent capability to account for the medium's effects, once modeled appropriately. We develop a new rendering model for NeRFs in scattering media, which is based on the SeaThru image formation model, and suggest a suitable architecture for learning both scene information and medium parameters. We demonstrate the strength of our method using simulated and real-world scenes, correctly rendering novel photorealistic views underwater. Even more excitingly, we can render clear views of these scenes, removing the medium between the camera and the scene and reconstructing the appearance and depth of far objects, which are severely occluded by the medium. Our code and unique datasets are available on the project's website. △ Less

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2212.02762 [pdf, other]

doi 10.1093/jamia/ocad081

Automated Identification of Eviction Status from Electronic Health Record Notes

Authors: Zonghai Yao, Jack Tsai, Weisong Liu, David A. Levy, Emily Druhl, Joel I Reisman, Hong Yu

Abstract: Objective: Evictions are important social and behavioral determinants of health. Evictions are associated with a cascade of negative events that can lead to unemployment, housing insecurity/homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes.… ▽ More Objective: Evictions are important social and behavioral determinants of health. Evictions are associated with a cascade of negative events that can lead to unemployment, housing insecurity/homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. Materials and Methods: We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that has shown to substantially outperform other state-of-the-art models such as fine-tuning pre-trained language models like BioBERT and BioClinicalBERT. Moreover, we designed a novel prompt to further improve the model performance by using the intrinsic connection between the two sub-tasks of eviction presence and period prediction. Finally, we used the Temperature Scaling-based Calibration on our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalance dataset. Results: KIRESH-Prompt substantially outperformed strong baseline models including fine-tuning the BioClinicalBERT model to achieve 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. Conclusion and Future Work: KIRESH-Prompt has substantially improved eviction status classification. We plan to deploy KIRESH-Prompt to the VHA EHRs as an eviction surveillance system to help address the US Veterans' housing insecurity. △ Less

Submitted 20 May, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: This article has been accepted for publication in Journal of the American Medical Informatics Association Published by Oxford University Press. https://doi.org/10.1093/jamia/ocad081

Journal ref: Journal of the American Medical Informatics Association, ocad081, 2023

arXiv:2210.05875 [pdf, other]

MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Authors: Sunjae Kwon, Zonghai Yao, Harmon S. Jordan, David A. Levy, Brian Corner, Hong Yu

Abstract: This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences ($MedJ$). Then, we introduce a novel medical jargon extraction (… ▽ More This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences ($MedJ$). Then, we introduce a novel medical jargon extraction ($MedJEx$) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available. △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP 22

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2109.04020 [pdf, other]

Distributionally Robust Multilingual Machine Translation

Authors: Chunting Zhou, Daniel Levy, Xian Li, Marjan Ghazvininejad, Graham Neubig

Abstract: Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributional… ▽ More Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs. We further show how to practically optimize this objective for large translation corpora using an iterated best response scheme, which is both effective and incurs negligible additional computational cost compared to standard empirical risk minimization. We perform extensive experiments on three sets of languages from two datasets and show that our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Comments: Long paper accepted by EMNLP2021 main conference

arXiv:2108.02391 [pdf, other]

Adapting to Function Difficulty and Growth Conditions in Private Optimization

Authors: Hilal Asi, Daniel Levy, John Duchi

Abstract: We develop algorithms for private stochastic convex optimization that adapt to the hardness of the specific function we wish to optimize. While previous work provide worst-case bounds for arbitrary convex functions, it is often the case that the function at hand belongs to a smaller class that enjoys faster rates. Concretely, we show that for functions exhibiting $κ$-growth around the optimum, i.e… ▽ More We develop algorithms for private stochastic convex optimization that adapt to the hardness of the specific function we wish to optimize. While previous work provide worst-case bounds for arbitrary convex functions, it is often the case that the function at hand belongs to a smaller class that enjoys faster rates. Concretely, we show that for functions exhibiting $κ$-growth around the optimum, i.e., $f(x) \ge f(x^*) + λκ^{-1} \|x-x^*\|_2^κ$ for $κ> 1$, our algorithms improve upon the standard ${\sqrt{d}}/{n\varepsilon}$ privacy rate to the faster $({\sqrt{d}}/{n\varepsilon})^{\tfracκ{κ- 1}}$. Crucially, they achieve these rates without knowledge of the growth constant $κ$ of the function. Our algorithms build upon the inverse sensitivity mechanism, which adapts to instance difficulty (Asi & Duchi, 2020), and recent localization techniques in private optimization (Feldman et al., 2020). We complement our algorithms with matching lower bounds for these function classes and demonstrate that our adaptive algorithm is \emph{simultaneously} (minimax) optimal over all $κ\ge 1+c$ whenever $c = Θ(1)$. △ Less

Submitted 5 August, 2021; originally announced August 2021.

Comments: 28 pages

arXiv:2102.11845 [pdf, other]

Learning with User-Level Privacy

Authors: Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

Abstract: We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical… ▽ More We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when increasing the number of users $n$, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius $τ$ of the distribution rather than the entire range. △ Less

Submitted 3 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: NeurIPS 2021. 43 pages, 0 figure

arXiv:2010.05893 [pdf, other]

Large-Scale Methods for Distributionally Robust Optimization

Authors: Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

Abstract: We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such… ▽ More We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $χ^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods. △ Less

Submitted 10 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 63 pages, NeurIPS 2020

arXiv:1909.10455 [pdf, other]

Necessary and Sufficient Geometries for Gradient Methods

Authors: Daniel Levy, John C. Duchi

Abstract: We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient metho… ▽ More We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex---for example, any $\ell_p$-ball for $p < 2$---the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror or stochastic gradient methods. △ Less

Submitted 28 October, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: 23 pages. To appear at NeurIPS 2019

arXiv:1811.09953 [pdf, other]

Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference

Authors: Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, Li Fei-Fei

Abstract: Homomorphic encryption enables arbitrary computation over data while it remains encrypted. This privacy-preserving feature is attractive for machine learning, but requires significant computational time due to the large overhead of the encryption scheme. We present Faster CryptoNets, a method for efficient encrypted inference using neural networks. We develop a pruning and quantization approach th… ▽ More Homomorphic encryption enables arbitrary computation over data while it remains encrypted. This privacy-preserving feature is attractive for machine learning, but requires significant computational time due to the large overhead of the encryption scheme. We present Faster CryptoNets, a method for efficient encrypted inference using neural networks. We develop a pruning and quantization approach that leverages sparse representations in the underlying cryptosystem to accelerate inference. We derive an optimal approximation for popular activation functions that achieves maximally-sparse encodings and minimizes approximation error. We also show how privacy-safe training techniques can be used to reduce the overhead of encrypted inference for real-world datasets by leveraging transfer learning and differential privacy. Our experiments show that our method maintains competitive accuracy and achieves a significant speedup over previous methods. This work increases the viability of deep learning systems that use homomorphic encryption to protect user privacy. △ Less

Submitted 25 November, 2018; originally announced November 2018.

arXiv:1811.01343 [pdf, other]

Underwater Single Image Color Restoration Using Haze-Lines and a New Quantitative Dataset

Authors: Dana Berman, Deborah Levy, Shai Avidan, Tali Treibitz

Abstract: Underwater images suffer from color distortion and low contrast, because light is attenuated while it propagates through water. Attenuation under water varies with wavelength, unlike terrestrial images where attenuation is assumed to be spectrally uniform. The attenuation depends both on the water body and the 3D structure of the scene, making color restoration difficult. Unlike existing single… ▽ More Underwater images suffer from color distortion and low contrast, because light is attenuated while it propagates through water. Attenuation under water varies with wavelength, unlike terrestrial images where attenuation is assumed to be spectrally uniform. The attenuation depends both on the water body and the 3D structure of the scene, making color restoration difficult. Unlike existing single underwater image enhancement techniques, our method takes into account multiple spectral profiles of different water types. By estimating just two additional global parameters: the attenuation ratios of the blue-red and blue-green color channels, the problem is reduced to single image dehazing, where all color channels have the same attenuation coefficients. Since the water type is unknown, we evaluate different parameters out of an existing library of water types. Each type leads to a different restored image and the best result is automatically chosen based on color distribution. We collected a dataset of images taken in different locations with varying water properties, showing color charts in the scenes. Moreover, to obtain ground truth, the 3D structure of the scene was calculated based on stereo imaging. This dataset enables a quantitative evaluation of restoration algorithms on natural images and shows the advantage of our method. △ Less

Submitted 24 March, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

arXiv:1711.09268 [pdf, other]

Generalizing Hamiltonian Monte Carlo with Neural Networks

Authors: Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

Abstract: We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distribut… ▽ More We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distributions, for instance achieving a 106x improvement in effective sample size in one case, and mixing when standard HMC makes no measurable progress in a second. Finally, we show quantitative and qualitative gains on a real-world task: latent-variable generative modeling. We release an open source TensorFlow implementation of the algorithm. △ Less

Submitted 2 March, 2018; v1 submitted 25 November, 2017; originally announced November 2017.

Comments: ICLR 2018

arXiv:1711.08068 [pdf, other]

Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces

Authors: Daniel Levy, Stefano Ermon

Abstract: Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample-efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. R… ▽ More Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample-efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. Recently, hybrid methods have been successful in trading off applicability for improved sample-complexity. However, these have been limited to continuous action spaces. In this work, we present a new hybrid method based on an approximation of the dynamics as an expectation over the next state under the current policy. This relaxation allows us to derive a novel hybrid policy gradient estimator, combining score function and pathwise derivative estimators, that is applicable to discrete action spaces. We show significant gains in sample complexity, ranging between $1.7$ and $25\times$, when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and Hand Mass. Our method is applicable to both discrete and continuous action spaces, when competing pathwise methods are limited to the latter. △ Less

Submitted 21 November, 2017; originally announced November 2017.

Comments: In AAAI 2018 proceedings

arXiv:1707.03372 [pdf, other]

Fast Amortized Inference and Learning in Log-linear Models with Randomly Perturbed Nearest Neighbor Search

Authors: Stephen Mussmann, Daniel Levy, Stefano Ermon

Abstract: Inference in log-linear models scales linearly in the size of output space in the worst-case. This is often a bottleneck in natural language processing and computer vision tasks when the output space is feasibly enumerable but very large. We propose a method to perform inference in log-linear models with sublinear amortized cost. Our idea hinges on using Gumbel random variable perturbations and a… ▽ More Inference in log-linear models scales linearly in the size of output space in the worst-case. This is often a bottleneck in natural language processing and computer vision tasks when the output space is feasibly enumerable but very large. We propose a method to perform inference in log-linear models with sublinear amortized cost. Our idea hinges on using Gumbel random variable perturbations and a pre-computed Maximum Inner Product Search data structure to access the most-likely elements in sublinear amortized time. Our method yields provable runtime and accuracy guarantees. Further, we present empirical experiments on ImageNet and Word Embeddings showing significant speedups for sampling, inference, and learning in log-linear models. △ Less

Submitted 11 July, 2017; originally announced July 2017.

Comments: In UAI proceedings

arXiv:1703.02573 [pdf, other]

Data Noising as Smoothing in Neural Network Language Models

Authors: Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng

Abstract: Data noising is an effective technique for regularizing neural network models. While noising is widely adopted in application domains such as vision and speech, commonly used noising primitives have not been developed for discrete sequence-level settings such as language modeling. In this paper, we derive a connection between input noising in neural network language models and smoothing in $n$-gra… ▽ More Data noising is an effective technique for regularizing neural network models. While noising is widely adopted in application domains such as vision and speech, commonly used noising primitives have not been developed for discrete sequence-level settings such as language modeling. In this paper, we derive a connection between input noising in neural network language models and smoothing in $n$-gram models. Using this connection, we draw upon ideas from smoothing to develop effective noising schemes. We demonstrate performance gains when applying the proposed schemes to language modeling and machine translation. Finally, we provide empirical analysis validating the relationship between noising and smoothing. △ Less

Submitted 7 March, 2017; originally announced March 2017.

Comments: ICLR 2017

arXiv:1612.00542 [pdf, other]

Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks

Authors: Daniel Lévy, Arzav Jain

Abstract: Mammography is the most widely used method to screen breast cancer. Because of its mostly manual nature, variability in mass appearance, and low signal-to-noise ratio, a significant number of breast masses are missed or misdiagnosed. In this work, we present how Convolutional Neural Networks can be used to directly classify pre-segmented breast masses in mammograms as benign or malignant, using a… ▽ More Mammography is the most widely used method to screen breast cancer. Because of its mostly manual nature, variability in mass appearance, and low signal-to-noise ratio, a significant number of breast masses are missed or misdiagnosed. In this work, we present how Convolutional Neural Networks can be used to directly classify pre-segmented breast masses in mammograms as benign or malignant, using a combination of transfer learning, careful pre-processing and data augmentation to overcome limited training data. We achieve state-of-the-art results on the DDSM dataset, surpassing human performance, and show interpretability of our model. △ Less

Submitted 1 December, 2016; originally announced December 2016.

Comments: NIPS 2016 ML4HC Workshop

arXiv:1611.03056 [pdf, other]

doi 10.1007/978-3-319-24858-5_8

Intrusion Detection System for Applications using Linux Containers

Authors: Amr S. Abed, Charles Clancy, David S. Levy

Abstract: Linux containers are gaining increasing traction in both individual and industrial use, and as these containers get integrated into mission-critical systems, real-time detection of malicious cyber attacks becomes a critical operational requirement. This paper introduces a real-time host-based intrusion detection system that can be used to passively detect malfeasance against applications within Li… ▽ More Linux containers are gaining increasing traction in both individual and industrial use, and as these containers get integrated into mission-critical systems, real-time detection of malicious cyber attacks becomes a critical operational requirement. This paper introduces a real-time host-based intrusion detection system that can be used to passively detect malfeasance against applications within Linux containers running in a standalone or in a cloud multi-tenancy environment. The demonstrated intrusion detection system uses bags of system calls monitored from the host kernel for learning the behavior of an application running within a Linux container and determining anomalous container behavior. Performance of the approach using a database application was measured and results are discussed. △ Less

Submitted 9 November, 2016; originally announced November 2016.

Comments: The final publication is available at http://link.springer.com/chapter/10.1007%2F978-3-319-24858-5_8. arXiv admin note: substantial text overlap with arXiv:1611.03053

Journal ref: STM 2015. LNCS, vol. 9331, pp. 123-135. Springer, Heidelberg (2015)

arXiv:1611.03053 [pdf, other]

doi 10.1109/GLOCOMW.2015.7414047

Applying Bag of System Calls for Anomalous Behavior Detection of Applications in Linux Containers

Authors: Amr S. Abed, T. Charles Clancy, David S. Levy

Abstract: In this paper, we present the results of using bags of system calls for learning the behavior of Linux containers for use in anomaly-detection based intrusion detection system. By using system calls of the containers monitored from the host kernel for anomaly detection, the system does not require any prior knowledge of the container nature, neither does it require altering the container or the ho… ▽ More In this paper, we present the results of using bags of system calls for learning the behavior of Linux containers for use in anomaly-detection based intrusion detection system. By using system calls of the containers monitored from the host kernel for anomaly detection, the system does not require any prior knowledge of the container nature, neither does it require altering the container or the host kernel. △ Less

Submitted 9 November, 2016; originally announced November 2016.

Comments: Published version available on IEEE Xplore (http://ieeexplore.ieee.org/document/7414047/) arXiv admin note: substantial text overlap with arXiv:1611.03056

Journal ref: 2015 IEEE Globecom Workshops (GC Wkshps), San Diego, CA, 2015, pp. 1-5

arXiv:1506.03072 [pdf, other]

Clustering by transitive propagation

Authors: Vijay Kumar, Dan Levy

Abstract: We present a global optimization algorithm for clustering data given the ratio of likelihoods that each pair of data points is in the same cluster or in different clusters. To define a clustering solution in terms of pairwise relationships, a necessary and sufficient condition is that belonging to the same cluster satisfies transitivity. We define a global objective function based on pairwise like… ▽ More We present a global optimization algorithm for clustering data given the ratio of likelihoods that each pair of data points is in the same cluster or in different clusters. To define a clustering solution in terms of pairwise relationships, a necessary and sufficient condition is that belonging to the same cluster satisfies transitivity. We define a global objective function based on pairwise likelihood ratios and a transitivity constraint over all triples, assigning an equal prior probability to all clustering solutions. We maximize the objective function by implementing max-sum message passing on the corresponding factor graph to arrive at an O(N^3) algorithm. Lastly, we demonstrate an application inspired by mutational sequencing for decoding random binary words transmitted through a noisy channel. △ Less

Submitted 9 June, 2015; originally announced June 2015.

Comments: 13 pages + 2 appendices, figures

arXiv:1307.2380 [pdf]

A Taxonomy of Performance Prediction Systems in the Parallel and Distributed Computing Grids

Authors: Sena Seneviratne, David C. Levy, Rajkumar Buyya

Abstract: As Grids are loosely-coupled congregations of geographically distributed heterogeneous resources, the efficient utilization of the resources requires the support of a sound Performance Prediction System (PPS). The performance prediction of grid resources is helpful for both Resource Management Systems and grid users to make optimized resource usage decisions. There have been many PPS projects that… ▽ More As Grids are loosely-coupled congregations of geographically distributed heterogeneous resources, the efficient utilization of the resources requires the support of a sound Performance Prediction System (PPS). The performance prediction of grid resources is helpful for both Resource Management Systems and grid users to make optimized resource usage decisions. There have been many PPS projects that span over several grid resources in several dimensions. In this paper the taxonomy for describing the PPS architecture is discussed. The taxonomy is used to categorize and identify approaches which are followed in the implementation of the existing PPSs for Grids. The taxonomy and the survey results are used to identify approaches and issues that have not been fully explored in research. △ Less

Submitted 12 July, 2013; v1 submitted 9 July, 2013; originally announced July 2013.

Comments: 35 pages,4 figures,2 tables

arXiv:cs/0607135 [pdf, ps, other]

A polynomial-time approximation algorithm for the number of k-matchings in bipartite graphs

Authors: Shmuel Friedland, Daniel Levy

Abstract: We show that the number of $k$-matching in a given undirected graph $G$ is equal to the number of perfect matching of the corresponding graph $G_k$ on an even number of vertices divided by a suitable factor. If $G$ is bipartite then one can construct a bipartite $G_k$. For bipartite graphs this result implies that the number of $k$-matching has a polynomial-time approximation algorithm. The… ▽ More We show that the number of $k$-matching in a given undirected graph $G$ is equal to the number of perfect matching of the corresponding graph $G_k$ on an even number of vertices divided by a suitable factor. If $G$ is bipartite then one can construct a bipartite $G_k$. For bipartite graphs this result implies that the number of $k$-matching has a polynomial-time approximation algorithm. The above results are extended to permanents and hafnians of corresponding matrices. △ Less

Submitted 28 July, 2006; originally announced July 2006.

Comments: 6 pages

arXiv:cs/0602041 [pdf, ps, other]

Why neighbor-joining works

Authors: Radu Mihaescu, Dan Levy, Lior Pachter

Abstract: We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson's optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson's criterion is not satisfied. We also provide a proof for Atteson's conjecture on the optimal edge radiu… ▽ More We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson's optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson's criterion is not satisfied. We also provide a proof for Atteson's conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining. △ Less

Submitted 17 June, 2007; v1 submitted 10 February, 2006; originally announced February 2006.

Comments: Revision 2

ACM Class: F.2.0

Showing 1–26 of 26 results for author: Levy, D