Search | arXiv e-print repository

Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning

Authors: Armand Nicolicioiu, Eugenia Iofinova, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir Shavit, Dan Alistarh

Abstract: The availability of powerful open-source large language models (LLMs) opens exciting use-cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key desiderata for such assistants are personalization-in the sense that the assistant should reflect the user's own style-and privacy-in the sense that users may prefer to always store their personal data locall… ▽ More The availability of powerful open-source large language models (LLMs) opens exciting use-cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key desiderata for such assistants are personalization-in the sense that the assistant should reflect the user's own style-and privacy-in the sense that users may prefer to always store their personal data locally, on their own computing device. We present a new design for such an automated assistant, for the specific use case of personal assistant for email generation, which we call Panza. Specifically, Panza can be both trained and inferenced locally on commodity hardware, and is personalized to the user's writing style. Panza's personalization features are based on a new technique called data playback, which allows us to fine-tune an LLM to better reflect a user's writing style using limited data. We show that, by combining efficient fine-tuning and inference methods, Panza can be executed entirely locally using limited resources-specifically, it can be executed within the same resources as a free Google Colab instance. Finally, our key methodological contribution is a careful study of evaluation metrics, and of how different choices of system components (e.g. the use of Retrieval-Augmented Generation or different fine-tuning approaches) impact the system's performance. △ Less

Submitted 24 June, 2024; originally announced July 2024.

Comments: Panza is available at https://github.com/IST-DASLab/PanzaMail

arXiv:2405.15756 [pdf, other]

Sparse Expansion and Neuronal Disentanglement

Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other on… ▽ More We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively $\textit{disentangling}$ the input-output relationship of every individual neuron across clusters of inputs. Specifically, sparse experts approximate the dense neuron output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each with a separate sparse dot product covering it. Interestingly, we show that the Wasserstein distance between a neuron's output distribution and a Gaussian distribution is an indicator of its entanglement level and contribution to the accuracy of the model. Every layer of an LLM has a fraction of highly entangled Wasserstein neurons, and model performance suffers more when these are sparsified as opposed to others. The code for Sparse Expansion is available at: https://github.com/Shavit-Lab/Sparse-Expansion . △ Less

Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures

arXiv:2312.08793 [pdf, other]

Forbidden Facts: An Investigation of Competing Objectives in Llama-2

Authors: Tony T. Wang, Miles Wang, Kaivalya Hariharan, Nir Shavit

Abstract: LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task. Specifically, we instruct Llama-2 to truthfully complete a factual recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama-2 into 1… ▽ More LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task. Specifically, we instruct Llama-2 to truthfully complete a factual recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama-2 into 1000+ components, and rank each one with respect to how useful it is for forbidding the correct answer. We find that in aggregate, around 35 components are enough to reliably implement the full suppression behavior. However, these components are fairly heterogeneous and many operate using faulty heuristics. We discover that one of these heuristics can be exploited via a manually designed adversarial attack which we call The California Attack. Our results highlight some roadblocks standing in the way of being able to successfully interpret advanced ML systems. Project website available at https://forbiddenfacts.github.io . △ Less

Submitted 31 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted to the ATTRIB and SoLaR workshops at NeurIPS 2023; (v3: clarified experimental details)

arXiv:2303.00882 [pdf, other]

X-Ray2EM: Uncertainty-Aware Cross-Modality Image Reconstruction from X-Ray to Electron Microscopy in Connectomics

Authors: Yicong Li, Yaron Meirovitch, Aaron T. Kuan, Jasper S. Phelps, Alexandra Pacureanu, Wei-Chung Allen Lee, Nir Shavit, Lu Mi

Abstract: Comprehensive, synapse-resolution imaging of the brain will be crucial for understanding neuronal computations and function. In connectomics, this has been the sole purview of volume electron microscopy (EM), which entails an excruciatingly difficult process because it requires cutting tissue into many thin, fragile slices that then need to be imaged, aligned, and reconstructed. Unlike EM, hard X-… ▽ More Comprehensive, synapse-resolution imaging of the brain will be crucial for understanding neuronal computations and function. In connectomics, this has been the sole purview of volume electron microscopy (EM), which entails an excruciatingly difficult process because it requires cutting tissue into many thin, fragile slices that then need to be imaged, aligned, and reconstructed. Unlike EM, hard X-ray imaging is compatible with thick tissues, eliminating the need for thin sectioning, and delivering fast acquisition, intrinsic alignment, and isotropic resolution. Unfortunately, current state-of-the-art X-ray microscopy provides much lower resolution, to the extent that segmenting membranes is very challenging. We propose an uncertainty-aware 3D reconstruction model that translates X-ray images to EM-like images with enhanced membrane segmentation quality, showing its potential for developing simpler, faster, and more accurate X-ray based connectomics pipelines. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted by ISBI 2023 conference. Supplementary material is available in this arXiv version

arXiv:2302.07348 [pdf, other]

Cliff-Learning

Authors: Tony T. Wang, Igor Zablotchi, Nir Shavit, Jonathan S. Rosenfeld

Abstract: We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster than power law rate (i.e. regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model clif… ▽ More We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster than power law rate (i.e. regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model cliff-learning and study toy models of the phenomenon. We observe that the degree of cliff-learning reflects the degree of compatibility between the priors of a learning algorithm and the task being learned. △ Less

Submitted 6 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 16 pages; v2 updates: improved layout, added acknowledgements

arXiv:2302.03819 [pdf, other]

The XPRESS Challenge: Xray Projectomic Reconstruction -- Extracting Segmentation with Skeletons

Authors: Tri Nguyen, Mukul Narwani, Mark Larson, Yicong Li, Shuhan Xie, Hanspeter Pfister, Donglai Wei, Nir Shavit, Lu Mi, Alexandra Pacureanu, Wei-Chung Lee, Aaron T. Kuan

Abstract: The wiring and connectivity of neurons form a structural basis for the function of the nervous system. Advances in volume electron microscopy (EM) and image segmentation have enabled mapping of circuit diagrams (connectomics) within local regions of the mouse brain. However, applying volume EM over the whole brain is not currently feasible due to technological challenges. As a result, comprehensiv… ▽ More The wiring and connectivity of neurons form a structural basis for the function of the nervous system. Advances in volume electron microscopy (EM) and image segmentation have enabled mapping of circuit diagrams (connectomics) within local regions of the mouse brain. However, applying volume EM over the whole brain is not currently feasible due to technological challenges. As a result, comprehensive maps of long-range connections between brain regions are lacking. Recently, we demonstrated that X-ray holographic nanotomography (XNH) can provide high-resolution images of brain tissue at a much larger scale than EM. In particular, XNH is wellsuited to resolve large, myelinated axon tracts (white matter) that make up the bulk of long-range connections (projections) and are critical for inter-region communication. Thus, XNH provides an imaging solution for brain-wide projectomics. However, because XNH data is typically collected at lower resolutions and larger fields-of-view than EM, accurate segmentation of XNH images remains an important challenge that we present here. In this task, we provide volumetric XNH images of cortical white matter axons from the mouse brain along with ground truth annotations for axon trajectories. Manual voxel-wise annotation of ground truth is a time-consuming bottleneck for training segmentation networks. On the other hand, skeleton-based ground truth is much faster to annotate, and sufficient to determine connectivity. Therefore, we encourage participants to develop methods to leverage skeleton-based training. To this end, we provide two types of ground-truth annotations: a small volume of voxel-wise annotations and a larger volume with skeleton-based annotations. Entries will be evaluated on how accurately the submitted segmentations agree with the ground-truth skeleton annotations. △ Less

Submitted 24 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 6 pages, 2 figures

arXiv:2110.06421 [pdf, other]

Revisiting Latent-Space Interpolation via a Quantitative Evaluation Framework

Authors: Lu Mi, Tianxing He, Core Francisco Park, Hao Wang, Yue Wang, Nir Shavit

Abstract: Latent-space interpolation is commonly used to demonstrate the generalization ability of deep latent variable models. Various algorithms have been proposed to calculate the best trajectory between two encodings in the latent space. In this work, we show how data labeled with semantically continuous attributes can be utilized to conduct a quantitative evaluation of latent-space interpolation algori… ▽ More Latent-space interpolation is commonly used to demonstrate the generalization ability of deep latent variable models. Various algorithms have been proposed to calculate the best trajectory between two encodings in the latent space. In this work, we show how data labeled with semantically continuous attributes can be utilized to conduct a quantitative evaluation of latent-space interpolation algorithms, for variational autoencoders. Our framework can be used to complement the standard qualitative comparison, and also enables evaluation for domains (such as graph) in which the visualization is difficult. Interestingly, our experiments reveal that the superiority of interpolation algorithms could be domain-dependent. While normalised interpolation works best for the image domain, spherical linear interpolation achieves the best performance in the graph domain. Next, we propose a simple-yet-effective method to restrict the latent space via a bottleneck structure in the encoder. We find that all interpolation algorithms evaluated in this work can benefit from this restriction. Finally, we conduct interpolation-aware training with the labeled attributes, and show that this explicit supervision can improve the interpolation performance. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Comments: 11 pages

arXiv:2106.14880 [pdf, other]

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

Authors: Lu Mi, Hang Zhao, Charlie Nash, Xiaohan Jin, Jiyang Gao, Chen Sun, Cordelia Schmid, Nir Shavit, Yuning Chai, Dragomir Anguelov

Abstract: High Definition (HD) maps are maps with precise definitions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an autonomous driving system, including motion forecasting and planning. However, there are only a small amount of real-world road topologies and geometries, which significantly limits our ability to test out the self-driving stack to gener… ▽ More High Definition (HD) maps are maps with precise definitions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an autonomous driving system, including motion forecasting and planning. However, there are only a small amount of real-world road topologies and geometries, which significantly limits our ability to test out the self-driving stack to generalize onto new unseen scenarios. To address this issue, we introduce a new challenging task to generate HD maps. In this work, we explore several autoregressive models using different data representations, including sequence, plain graph, and hierarchical graph. We propose HDMapGen, a hierarchical graph generation model capable of producing high-quality and diverse HD maps through a coarse-to-fine approach. Experiments on the Argoverse dataset and an in-house dataset show that HDMapGen significantly outperforms baseline methods. Additionally, we demonstrate that HDMapGen achieves high scalability and efficiency. △ Less

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2101.02746 [pdf, other]

doi 10.1007/978-3-030-59722-1_8

Learning Guided Electron Microscopy with Active Acquisition

Authors: Lu Mi, Hao Wang, Yaron Meirovitch, Richard Schalek, Srinivas C. Turaga, Jeff W. Lichtman, Aravinthan D. T. Samuel, Nir Shavit

Abstract: Single-beam scanning electron microscopes (SEM) are widely used to acquire massive data sets for biomedical study, material analysis, and fabrication inspection. Datasets are typically acquired with uniform acquisition: applying the electron beam with the same power and duration to all image pixels, even if there is great variety in the pixels' importance for eventual use. Many SEMs are now able t… ▽ More Single-beam scanning electron microscopes (SEM) are widely used to acquire massive data sets for biomedical study, material analysis, and fabrication inspection. Datasets are typically acquired with uniform acquisition: applying the electron beam with the same power and duration to all image pixels, even if there is great variety in the pixels' importance for eventual use. Many SEMs are now able to move the beam to any pixel in the field of view without delay, enabling them, in principle, to invest their time budget more effectively with non-uniform imaging. In this paper, we show how to use deep learning to accelerate and optimize single-beam SEM acquisition of images. Our algorithm rapidly collects an information-lossy image (e.g. low resolution) and then applies a novel learning method to identify a small subset of pixels to be collected at higher resolution based on a trade-off between the saliency and spatial diversity. We demonstrate the efficacy of this novel technique for active acquisition by speeding up the task of collecting connectomic datasets for neurobiology by up to an order of magnitude. △ Less

Submitted 7 January, 2021; originally announced January 2021.

Comments: MICCAI 2020

arXiv:2006.10621 [pdf, other]

On the Predictability of Pruning Across Scales

Authors: Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit

Abstract: We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable.… ▽ More We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning. △ Less

Submitted 3 July, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:1912.02165 [pdf, other]

L3 Fusion: Fast Transformed Convolutions on CPUs

Authors: Rati Gelashvili, Nir Shavit, Aleksandar Zlateski

Abstract: Fast convolutions via transforms, either Winograd or FFT, had emerged as a preferred way of performing the computation of convolutional layers, as it greatly reduces the number of required operations. Recent work shows that, for many layer structures, a well--designed implementation of fast convolutions can greatly utilize modern CPUs, significantly reducing the compute time. However, the generous… ▽ More Fast convolutions via transforms, either Winograd or FFT, had emerged as a preferred way of performing the computation of convolutional layers, as it greatly reduces the number of required operations. Recent work shows that, for many layer structures, a well--designed implementation of fast convolutions can greatly utilize modern CPUs, significantly reducing the compute time. However, the generous amount of shared L3 cache present on modern CPUs is often neglected, and the algorithms are optimized solely for the private L2 cache. In this paper we propose an efficient `L3 Fusion` algorithm that is specifically designed for CPUs with significant amount of shared L3 cache. Using the hierarchical roofline model, we show that in many cases, especially for layers with fewer channels, the `L3 fused` approach can greatly outperform standard 3 stage one provided by big vendors such as Intel. We validate our theoretical findings, by benchmarking our `L3 fused` implementation against publicly available state of the art. △ Less

Submitted 4 December, 2019; originally announced December 2019.

arXiv:1910.04858 [pdf, other]

Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate

Authors: Lu Mi, Hao Wang, Yonglong Tian, Hao He, Nir Shavit

Abstract: Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able… ▽ More Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during the inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods. △ Less

Submitted 10 January, 2022; v1 submitted 27 September, 2019; originally announced October 2019.

Comments: In proceedings of the 36th AAAI Conference on Artificial Intelligence

arXiv:1909.12673 [pdf, other]

A Constructive Prediction of the Generalization Error Across Scales

Authors: Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit

Abstract: The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of… ▽ More The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data. △ Less

Submitted 20 December, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: ICLR 2020

arXiv:1812.01157 [pdf, other]

Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics

Authors: Yaron Meirovitch, Lu Mi, Hayk Saribekyan, Alexander Matveev, David Rolnick, Nir Shavit

Abstract: Pixel-accurate tracking of objects is a key element in many computer vision applications, often solved by iterated individual object tracking or instance segmentation followed by object matching. Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. The key idea in cross-classification is to efficiently turn… ▽ More Pixel-accurate tracking of objects is a key element in many computer vision applications, often solved by iterated individual object tracking or instance segmentation followed by object matching. Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. The key idea in cross-classification is to efficiently turn a clustering problem into a classification problem by running a logarithmic number of independent classifications per image, letting the cross-labeling of these classifications uniquely classify each pixel to the object labels. We apply the 3C mechanism to achieve state-of-the-art accuracy in connectomics -- the nanoscale mapping of neural tissue from electron microscopy volumes. Our reconstruction system increases scalability by an order of magnitude over existing single-object tracking methods (such as flood-filling networks). This scalability is important for the deployment of connectomics pipelines, since currently the best performing techniques require computing infrastructures that are beyond the reach of most laboratories. Our algorithm may offer benefits in other domains that require pixel-accurate tracking of multiple objects, such as segmentation of videos and medical imagery. △ Less

Submitted 15 June, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

Comments: 11 figures

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8425-8435

arXiv:1705.10882 [pdf, other]

Morphological Error Detection in 3D Segmentations

Authors: David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, Nir Shavit

Abstract: Deep learning algorithms for connectomics rely upon localized classification, rather than overall morphology. This leads to a high incidence of erroneously merged objects. Humans, by contrast, can easily detect such errors by acquiring intuition for the correct morphology of objects. Biological neurons have complicated and variable shapes, which are challenging to learn, and merge errors take a mu… ▽ More Deep learning algorithms for connectomics rely upon localized classification, rather than overall morphology. This leads to a high incidence of erroneously merged objects. Humans, by contrast, can easily detect such errors by acquiring intuition for the correct morphology of objects. Biological neurons have complicated and variable shapes, which are challenging to learn, and merge errors take a multitude of different forms. We present an algorithm, MergeNet, that shows 3D ConvNets can, in fact, detect merge errors from high-level neuronal morphology. MergeNet follows unsupervised training and operates across datasets. We demonstrate the performance of MergeNet both on a variety of connectomics data and on a dataset created from merged MNIST images. △ Less

Submitted 30 May, 2017; originally announced May 2017.

Comments: 13 pages, 6 figures

arXiv:1705.10694 [pdf, other]

Deep Learning is Robust to Massive Label Noise

Authors: David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit

Abstract: Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to larger but noisy datasets that are more easily obtained. In this paper, we show that deep neural networks are capable of generalizing from training data for wh… ▽ More Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest to larger but noisy datasets that are more easily obtained. In this paper, we show that deep neural networks are capable of generalizing from training data for which true labels are massively outnumbered by incorrect labels. We demonstrate remarkably high test performance after training on corrupted data from MNIST, CIFAR, and ImageNet. For example, on MNIST we obtain test accuracy above 90 percent even after each clean training example has been diluted with 100 randomly-labeled examples. Such behavior holds across multiple patterns of label noise, even when erroneous labels are biased towards confusing classes. We show that training in this regime requires a significant but manageable increase in dataset size that is related to the factor by which correct labels have been diluted. Finally, we provide an analysis of our results that shows how increasing noise decreases the effective batch size. △ Less

Submitted 26 February, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

arXiv:1703.01467 [pdf, other]

Generative Compression

Authors: Shibani Santurkar, David Budden, Nir Shavit

Abstract: Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the compression of data using generative models, and suggest that it is a direction worth pursuing to produce more accurate and visually pleasing reconstructions at much d… ▽ More Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the compression of data using generative models, and suggest that it is a direction worth pursuing to produce more accurate and visually pleasing reconstructions at much deeper compression levels for both image and video data. We also demonstrate that generative compression is orders-of-magnitude more resilient to bit error rates (e.g. from noisy wireless channels) than traditional variable-length coding schemes. △ Less

Submitted 4 June, 2017; v1 submitted 4 March, 2017; originally announced March 2017.

arXiv:1702.07386 [pdf, other]

Toward Streaming Synapse Detection with Compositional ConvNets

Authors: Shibani Santurkar, David Budden, Alexander Matveev, Heather Berlin, Hayk Saribekyan, Yaron Meirovitch, Nir Shavit

Abstract: Connectomics is an emerging field in neuroscience that aims to reconstruct the 3-dimensional morphology of neurons from electron microscopy (EM) images. Recent studies have successfully demonstrated the use of convolutional neural networks (ConvNets) for segmenting cell membranes to individuate neurons. However, there has been comparatively little success in high-throughput identification of the i… ▽ More Connectomics is an emerging field in neuroscience that aims to reconstruct the 3-dimensional morphology of neurons from electron microscopy (EM) images. Recent studies have successfully demonstrated the use of convolutional neural networks (ConvNets) for segmenting cell membranes to individuate neurons. However, there has been comparatively little success in high-throughput identification of the intercellular synaptic connections required for deriving connectivity graphs. In this study, we take a compositional approach to segmenting synapses, modeling them explicitly as an intercellular cleft co-located with an asymmetric vesicle density along a cell membrane. Instead of requiring a deep network to learn all natural combinations of this compositionality, we train lighter networks to model the simpler marginal distributions of membranes, clefts and vesicles from just 100 electron microscopy samples. These feature maps are then combined with simple rules-based heuristics derived from prior biological knowledge. Our approach to synapse detection is both more accurate than previous state-of-the-art (7% higher recall and 5% higher F1-score) and yields a 20-fold speed-up compared to the previous fastest implementations. We demonstrate by reconstructing the first complete, directed connectome from the largest available anisotropic microscopy dataset (245 GB) of mouse somatosensory cortex (S1) in just 9.7 hours on a single shared-memory CPU system. We believe that this work marks an important step toward the goal of a microscope-pace streaming connectomics pipeline. △ Less

Submitted 23 February, 2017; originally announced February 2017.

Comments: 10 pages, 9 figures

arXiv:1612.02120 [pdf, other]

A Multi-Pass Approach to Large-Scale Connectomics

Authors: Yaron Meirovitch, Alexander Matveev, Hayk Saribekyan, David Budden, David Rolnick, Gergely Odor, Seymour Knowles-Barley, Thouis Raymond Jones, Hanspeter Pfister, Jeff William Lichtman, Nir Shavit

Abstract: The field of connectomics faces unprecedented "big data" challenges. To reconstruct neuronal connectivity, automated pixel-level segmentation is required for petabytes of streaming electron microscopy data. Existing algorithms provide relatively good accuracy but are unacceptably slow, and would require years to extract connectivity graphs from even a single cubic millimeter of neural tissue. Here… ▽ More The field of connectomics faces unprecedented "big data" challenges. To reconstruct neuronal connectivity, automated pixel-level segmentation is required for petabytes of streaming electron microscopy data. Existing algorithms provide relatively good accuracy but are unacceptably slow, and would require years to extract connectivity graphs from even a single cubic millimeter of neural tissue. Here we present a viable real-time solution, a multi-pass pipeline optimized for shared-memory multicore systems, capable of processing data at near the terabyte-per-hour pace of multi-beam electron microscopes. The pipeline makes an initial fast-pass over the data, and then makes a second slow-pass to iteratively correct errors in the output of the fast-pass. We demonstrate the accuracy of a sparse slow-pass reconstruction algorithm and suggest new methods for detecting morphological errors. Our fast-pass approach provided many algorithmic challenges, including the design and implementation of novel shallow convolutional neural nets and the parallelization of watershed and object-merging techniques. We use it to reconstruct, from image stack to skeletons, the full dataset of Kasthuri et al. (463 GB capturing 120,000 cubic microns) in a matter of hours on a single multicore machine rather than the weeks it has taken in the past on much larger distributed systems. △ Less

Submitted 7 December, 2016; originally announced December 2016.

Comments: 18 pages, 10 figures

arXiv:1611.06565 [pdf, other]

Deep Tensor Convolution on Multicores

Authors: David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit

Abstract: Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster W… ▽ More Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the $N$-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art. △ Less

Submitted 11 June, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

Comments: 11 pages, 4 figures, 1 supplementary doc

arXiv:1607.06139 [pdf, ps, other]

A Complexity-Based Hierarchy for Multiprocessor Synchronization

Authors: Faith Ellen, Rati Gelashvili, Nir Shavit, Leqi Zhu

Abstract: For many years, Herlihy's elegant computability based Consensus Hierarchy has been our best explanation of the relative power of various types of multiprocessor synchronization objects when used in deterministic algorithms. However, key to this hierarchy is treating synchronization instructions as distinct objects, an approach that is far from the real-world, where multiprocessor programs apply sy… ▽ More For many years, Herlihy's elegant computability based Consensus Hierarchy has been our best explanation of the relative power of various types of multiprocessor synchronization objects when used in deterministic algorithms. However, key to this hierarchy is treating synchronization instructions as distinct objects, an approach that is far from the real-world, where multiprocessor programs apply synchronization instructions to collections of arbitrary memory locations. We were surprised to realize that, when considering instructions applied to memory locations, the computability based hierarchy collapses. This leaves open the question of how to better capture the power of various synchronization instructions. In this paper, we provide an approach to answering this question. We present a hierarchy of synchronization instructions, classified by their space complexity in solving obstruction-free consensus. Our hierarchy provides a classification of combinations of known instructions that seems to fit with our intuition of how useful some are in practice, while questioning the effectiveness of others. We prove an essentially tight characterization of the power of buffered read and write instructions.Interestingly, we show a similar result for multi-location atomic assignments. △ Less

Submitted 3 May, 2018; v1 submitted 20 July, 2016; originally announced July 2016.

arXiv:1411.5383 [pdf, other]

doi 10.1073/pnas.1419100111

Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints

Authors: Zeyuan Allen-Zhu, Rati Gelashvili, Silvio Micali, Nir Shavit

Abstract: Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory or inhibitory implies that every so imple… ▽ More Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory or inhibitory implies that every so implementable JL matrix must be sign-consistent (i.e., all entries in a single column must be either all non-negative or all non-positive), and the fact that any given neuron connects to a relatively small subset of other neurons implies that the JL matrix had better be sparse. We construct sparse JL matrices that are sign-consistent, and prove that our construction is essentially optimal. Our work answers a mathematical question that was triggered by earlier work and is necessary to justify the existence of JL compression in the brain, and emphasizes that inhibition is crucial if neurons are to perform efficient, correlation-preserving compression. △ Less

Submitted 19 November, 2014; originally announced November 2014.

Comments: A shorter version of this paper has appeared in the Proceedings of the National Academy of Sciences

arXiv:1411.0168 [pdf, other]

On the Importance of Registers for Computability

Authors: Rati Gelashvili, Mohsen Ghaffari, Jerry Li, Nir Shavit

Abstract: All consensus hierarchies in the literature assume that we have, in addition to copies of a given object, an unbounded number of registers. But why do we really need these registers? This paper considers what would happen if one attempts to solve consensus using various objects but without any registers. We show that under a reasonable assumption, objects like queues and stacks cannot emulate th… ▽ More All consensus hierarchies in the literature assume that we have, in addition to copies of a given object, an unbounded number of registers. But why do we really need these registers? This paper considers what would happen if one attempts to solve consensus using various objects but without any registers. We show that under a reasonable assumption, objects like queues and stacks cannot emulate the missing registers. We also show that, perhaps surprisingly, initialization, shown to have no computational consequences when registers are readily available, is crucial in determining the synchronization power of objects when no registers are allowed. Finally, we show that without registers, the number of available objects affects the level of consensus that can be solved. Our work thus raises the question of whether consensus hierarchies which assume an unbounded number of registers truly capture synchronization power, and begins a line of research aimed at better understanding the interaction between read-write memory and the powerful synchronization operations available on modern architectures. △ Less

Submitted 1 November, 2014; originally announced November 2014.

Comments: 12 pages, 0 figures

arXiv:1405.5689 [pdf, other]

Inherent Limitations of Hybrid Transactional Memory

Authors: Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, Nir Shavit

Abstract: Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort, nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the fundamental limitations of building a HyTM with nontrivial concurrency between hardware and software transactions are still not well understood. In this paper, we propose a general… ▽ More Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort, nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the fundamental limitations of building a HyTM with nontrivial concurrency between hardware and software transactions are still not well understood. In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses, and allows us to formally quantify and analyze the amount of overhead (instrumentation) of a HyTM scheme. We prove the following: (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for weak progress guarantees, and (2) under reasonable assumptions, in any opaque progressive HyTM, a hardware transaction must incur instrumentation costs linear in the size of its data set. We further provide two upper bound implementations whose instrumentation costs are optimal with respect to their progress guarantees. In sum, this paper captures for the first time an inherent trade-off between the degree of concurrency a HyTM provides between hardware and software transactions, and the amount of instrumentation overhead the implementation must incur. △ Less

Submitted 17 February, 2015; v1 submitted 22 May, 2014; originally announced May 2014.

arXiv:1405.5461 [pdf, other]

The LevelArray: A Fast, Practical Long-Lived Renaming Algorithm

Authors: Dan Alistarh, Justin Kopinsky, Alexander Matveev, Nir Shavit

Abstract: The long-lived renaming problem appears in shared-memory systems where a set of threads need to register and deregister frequently from the computation, while concurrent operations scan the set of currently registered threads. Instances of this problem show up in concurrent implementations of transactional memory, flat combining, thread barriers, and memory reclamation schemes for lock-free data s… ▽ More The long-lived renaming problem appears in shared-memory systems where a set of threads need to register and deregister frequently from the computation, while concurrent operations scan the set of currently registered threads. Instances of this problem show up in concurrent implementations of transactional memory, flat combining, thread barriers, and memory reclamation schemes for lock-free data structures. In this paper, we analyze a randomized solution for long-lived renaming. The algorithmic technique we consider, called the LevelArray, has previously been used for hashing and one-shot (single-use) renaming. Our main contribu- tion is to prove that, in long-lived executions, where processes may register and deregister polynomially many times, the technique guarantees constant steps on average and O(log log n) steps with high probability for registering, unit cost for deregistering, and O(n) steps for collect queries, where n is an upper bound on the number of processes that may be active at any point in time. We also show that the algorithm has the surprising property that it is self-healing: under reasonable assumptions on the schedule, operations running while the data structure is in a degraded state implicitly help the data structure re-balance itself. This subtle mechanism obviates the need for expensive periodic rebuilding procedures. Our benchmarks validate this approach, showing that, for typical use parameters, the average number of steps a process takes to register is less than two and the worst-case number of steps is bounded by six, even in executions with billions of operations. We contrast this with other randomized implementations, whose worst-case behavior we show to be unreliable, and with deterministic implementations, whose cost is linear in n. △ Less

Submitted 21 May, 2014; originally announced May 2014.

Comments: ICDCS 2014

arXiv:1311.3200 [pdf, ps, other]

Are Lock-Free Concurrent Algorithms Practically Wait-Free?

Authors: Dan Alistarh, Keren Censor-Hillel, Nir Shavit

Abstract: Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obt… ▽ More Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most non-blocking commercial code is only lock-free. This paper suggests a simple solution to this problem. We show that, for a large class of lock- free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can keep on designing simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free progress. Our main contribution is a new way of analyzing a general class of lock-free algorithms under a stochastic scheduler. Our analysis relates the individual performance of processes with the global performance of the system using Markov chain lifting between a complex per-process chain and a simpler system progress chain. We show that lock-free algorithms are not only wait-free with probability 1, but that in fact a general subset of lock-free algorithms can be closely bounded in terms of the average number of steps required until an operation completes. To the best of our knowledge, this is the first attempt to analyze progress conditions, typically stated in relation to a worst case adversary, in a stochastic model capturing their expected asymptotic behavior. △ Less

Submitted 15 November, 2013; v1 submitted 13 November, 2013; originally announced November 2013.

Comments: 25 pages

Showing 1–26 of 26 results for author: Shavit, N