-
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation
Authors:
Ylaine Gerardin,
John Shamshoian,
Judy Shen,
Nhat Le,
Jamie Prezioso,
John Abel,
Isaac Finberg,
Daniel Borders,
Raymond Biju,
Michael Nercessian,
Vaed Prasad,
Joseph Lee,
Spencer Wyman,
Sid Gupta,
Abigail Emerson,
Bahar Rahsepar,
Darpan Sanghavi,
Ryan Leung,
Limin Yu,
Archit Khosla,
Amaro Taylor-Weiner
Abstract:
Nested pairwise frames is a method for relative benchmarking of cell or tissue digital pathology models against manual pathologist annotations on a set of sampled patches. At a high level, the method compares agreement between a candidate model and pathologist annotations with agreement among pathologists' annotations. This evaluation framework addresses fundamental issues of data size and annotator variability in using manual pathologist annotations as a source of ground truth for model validation. We implemented nested pairwise frames evaluation for tissue classification, cell classification, and cell count prediction tasks and show results for cell and tissue models deployed on an H&E-stained melanoma dataset.
Submitted 7 June, 2023;
originally announced June 2023.
-
A Novel Approach To User Agent String Parsing For Vulnerability Analysis Using Mutli-Headed Attention
Authors:
Dhruv Nandakumar,
Sathvik Murli,
Ankur Khosla,
Kevin Choi,
Abdul Rahman,
Drew Walsh,
Scott Riede,
Eric Dull,
Edward Bowen
Abstract:
The increasing reliance on the internet has led to the proliferation of a diverse set of web-browsers and operating systems (OSs) capable of browsing the web. User agent strings (UASs) are a component of web browsing that are transmitted with every Hypertext Transfer Protocol (HTTP) request. They contain information about the client device and software, which is used by web servers for various purposes such as content negotiation and security. However, given the proliferation of various browsers and devices, parsing UASs is a non-trivial task due to a lack of standardization of UAS formats. Current rules-based approaches are often brittle and can fail when encountering such non-standard formats. In this work, a novel methodology for parsing UASs using Multi-Headed Attention Based transformers is proposed. The proposed methodology exhibits strong performance in parsing a variety of UASs with differing formats. Furthermore, a framework to utilize parsed UASs to estimate the vulnerability scores for large sections of publicly visible IT networks or regions is also discussed. The methodology presented here can also be easily extended or deployed for real-time parsing of logs in enterprise settings.
Submitted 6 June, 2023;
originally announced June 2023.
-
SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology
Authors:
Dinkar Juyal,
Siddhant Shingi,
Syed Ashar Javed,
Harshith Padigela,
Chintan Shah,
Anand Sampat,
Archit Khosla,
John Abel,
Amaro Taylor-Weiner
Abstract:
Multiple Instance learning (MIL) models have been extensively used in pathology to predict biomarkers and risk-stratify patients from gigapixel-sized images. Machine learning problems in medical imaging often deal with rare diseases, making it important for these models to work in a label-imbalanced setting. In pathology images, there is another level of imbalance, where given a positively labeled Whole Slide Image (WSI), only a fraction of pixels within it contribute to the positive label. This compounds the severity of imbalance and makes imbalanced classification in pathology challenging. Furthermore, these imbalances can occur in out-of-distribution (OOD) datasets when the models are deployed in the real-world. We leverage the idea that decoupling feature and classifier learning can lead to improved decision boundaries for label imbalanced datasets. To this end, we investigate the integration of supervised contrastive learning with multiple instance learning (SC-MIL). Specifically, we propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning. We perform experiments with different imbalance settings for two well-studied problems in cancer pathology: subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma. SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalanced settings.
Submitted 9 September, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Accurate Trajectory Prediction for Autonomous Vehicles
Authors:
Michael Diodato,
Yu Li,
Antonia Lovjer,
Minsu Yeom,
Albert Song,
Yiyang Zeng,
Abhay Khosla,
Benedikt Schifferer,
Manik Goyal,
Iddo Drori
Abstract:
Predicting vehicle trajectories, angle and speed is important for safe and comfortable driving. We demonstrate the best predicted angle, speed, and best performance overall winning the top three places of the ICCV 2019 Learning to Drive challenge. Our key contributions are (i) a general neural network system architecture which embeds and fuses together multiple inputs by encoding, and decodes multiple outputs using neural networks, (ii) using pre-trained neural networks for augmenting the given input data with segmentation maps and semantic information, and (iii) leveraging the form and distribution of the expected output in the model.
Submitted 18 November, 2019;
originally announced November 2019.
-
Spin molecular-orbit coupling and magnetic properties of the decorated honeycomb layers of Mo3S7(dmit)3 crystals
Authors:
J. Merino,
A. C. Jacko,
A. L. Khosla,
A. Ralko,
B. J. Powell
Abstract:
We explore the magnetic properties of isolated a-b planes of trinuclear organometallic crystals, Mo3S7(dmit)3, in which an interplay of strong electronic correlations and spin molecular-orbital coupling (SMOC) occurs. The magnetic properties can be described by a XXZ+120, S=1 Heisenberg model on a honeycomb lattice with single-spin anisotropy, D, which depends strongly on SMOC. Based on ab initio estimates of SMOC in Mo3S7(dmit)3 crystals, we predict that the honeycomb layers of Mo3S7(dmit)3 are Neel ordered. However, in materials with a greater degree of magnetic frustration, Neel order can give way to the large-D phase.
Submitted 31 October, 2018;
originally announced October 2018.
-
Mining Procedures from Technical Support Documents
Authors:
Abhirut Gupta,
Abhay Khosla,
Gautam Singh,
Gargi Dasgupta
Abstract:
Guided troubleshooting is an inherent task in the domain of technical support services. When a customer experiences an issue with the functioning of a technical service or a product, an expert user helps guide the customer through a set of steps comprising a troubleshooting procedure. The objective is to identify the source of the problem through a set of diagnostic steps and observations, and arrive at a resolution. Procedures containing these sets of diagnostic steps and observations in response to different problems are common artifacts in the body of technical support documentation. The ability to use machine learning and linguistics to understand and leverage these procedures for applications like intelligent chatbots or robotic process automation is crucial. Existing research on question answering or intelligent chatbots does not look within procedures or deep-understand them. In this paper, we outline a system for mining procedures from technical support documents. We create models for solving important subproblems like extraction of procedures, identifying decision points within procedures, identifying blocks of instructions corresponding to these decision points and mapping instructions within a decision block. We also release a dataset containing our manual annotations on publicly available support documents, to promote further research on the problem.
Submitted 24 May, 2018;
originally announced May 2018.
-
Network Dissection: Quantifying Interpretability of Deep Visual Representations
Authors:
David Bau,
Bolei Zhou,
Aditya Khosla,
Aude Oliva,
Antonio Torralba
Abstract:
We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
Submitted 19 April, 2017;
originally announced April 2017.
-
Effects of anisotropy in spin molecular-orbital coupling on effective spin models of trinuclear organometallic complexes
Authors:
J. Merino,
A. C. Jacko,
A. L. Khosla,
B. J. Powell
Abstract:
We consider layered decorated honeycomb lattices at two-thirds filling, as realized in some trinuclear organometallic complexes. Localized $S=1$ moments with a single-spin anisotropy emerge from the interplay of Coulomb repulsion and spin molecular-orbit coupling (SMOC). Magnetic anisotropies with bond dependent exchange couplings occur in the honeycomb layers when the direct intracluster exchange and the spin molecular-orbital coupling are both present. We find that the effective spin exchange model within the layers is an XXZ + 120$^\circ$ honeycomb quantum compass model. The intrinsic non-spherical symmetry of the multinuclear complexes leads to very different transverse and longitudinal spin molecular-orbital couplings, which greatly enhances the single-spin and exchange coupling anisotropies. The interlayer coupling is described by a XXZ model with anisotropic biquadratic terms. As the correlation strength increases the system becomes increasingly one-dimensional. Thus, if the ratio of SMOC to the interlayer hopping is small this stabilizes the Haldane phase. However, as the ratio increases there is a quantum phase transition to the topologically trivial `$D$-phase'. We also predict a quantum phase transition from a Haldane phase to a magnetically ordered phase at sufficiently strong external magnetic fields.
Submitted 9 November, 2017; v1 submitted 24 March, 2017;
originally announced March 2017.
-
Heisenberg and Dzyaloshinskii-Moriya interactions controlled by molecular packing in tri-nuclear organometallic clusters
Authors:
B. J. Powell,
J. Merino,
A. L. Khosla,
A. C. Jacko
Abstract:
Motivated by recent synthetic and theoretical progress we consider magnetism in crystals of multi-nuclear organometallic complexes. We calculate the Heisenberg symmetric exchange and the Dzyaloshinskii-Moriya antisymmetric exchange. We show how, in the absence of spin-orbit coupling, the interplay of electronic correlations and quantum interference leads to a quasi-one dimensional effective spin model in a typical tri-nuclear complex, Mo$_3$S$_7$(dmit)$_3$, despite its underlying three dimensional band structure. We show that both intra- and inter-molecular spin-orbit coupling can cause an effective Dzyaloshinskii-Moriya interaction. Furthermore, we show that, even for an isolated pair of molecules the relative orientation of the molecules controls the nature of the Dzyaloshinskii-Moriya coupling. We show that interference effects also play a crucial role in determining the Dzyaloshinskii-Moriya interaction. Thus, we argue, that multi-nuclear organometallic complexes represent an ideal platform to investigate the effects of Dzyaloshinskii-Moriya interactions on quantum magnets.
Submitted 2 August, 2017; v1 submitted 14 December, 2016;
originally announced December 2016.
-
Following Gaze Across Views
Authors:
Adrià Recasens,
Carl Vondrick,
Aditya Khosla,
Antonio Torralba
Abstract:
Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one view with a person in it and a second view of the scene, our model estimates a density for gaze location in the second view. A key aspect of our approach is an end-to-end model that solves the following sub-problems: saliency, gaze pose, and geometric relationships between views. Although our model is supervised only with gaze, we show that the model learns to solve these subproblems automatically without supervision. Experiments suggest that our approach follows gaze better than standard baselines and produces plausible results for everyday situations.
Submitted 9 December, 2016;
originally announced December 2016.
-
Spin-orbit coupling in {Mo$_3$S$_7$(dmit)$_3$}
Authors:
A. C. Jacko,
A. L. Khosla,
J. Merino,
B. J. Powell
Abstract:
Spin-orbit coupling in crystals is known to lead to unusual direction dependent exchange interactions, however understanding of the consequences of such effects in molecular crystals is incomplete. Here we perform four component relativistic density functional theory computations on the multi-nuclear molecular crystal {Mo$_3$S$_7$(dmit)$_3$} and show that both intra- and inter-molecular spin-orbit coupling are significant. We determine a long-range relativistic single electron Hamiltonian from first principles by constructing Wannier spin-orbitals. We analyse the various contributions through the lens of group theory. Intermolecular spin-orbit couplings like those found here are known to lead to quantum spin-Hall and topological insulator phases on the 2D lattice formed by the tight-binding model predicted for a single layer of {Mo$_3$S$_7$(dmit)$_3$}.
Submitted 9 December, 2016;
originally announced December 2016.
-
Places: An Image Database for Deep Scene Understanding
Authors:
Bolei Zhou,
Aditya Khosla,
Agata Lapedriza,
Antonio Torralba,
Aude Oliva
Abstract:
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state of the art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high-coverage and high-diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.
Submitted 6 October, 2016;
originally announced October 2016.
-
Topological quantum phase transition driven by anisotropic spin-orbit coupling in trinuclear organometallic coordination crystals
Authors:
J. Merino,
A. C. Jacko,
A. L. Khosla,
B. J. Powell
Abstract:
We show how quasi-one-dimensional correlated insulating states arise at two-thirds filling in organometallic multinuclear coordination complexes described by layered decorated honeycomb lattices. The interplay of spin-orbit coupling and electronic correlations leads to pseudospin-1 moments arranged in weakly coupled chains with highly anisotropic exchange and a large trigonal splitting. This leads to a quantum phase transition from a Haldane phase to a topologically trivial phase as the relative strength of the spin-orbit coupling increases.
Submitted 24 June, 2016;
originally announced June 2016.
-
Eye Tracking for Everyone
Authors:
Kyle Krafka,
Aditya Khosla,
Petr Kellnhofer,
Harini Kannan,
Suchendra Bhandarkar,
Wojciech Matusik,
Antonio Torralba
Abstract:
From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device. Our model achieves a prediction error of 1.71cm and 2.53cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.34cm and 2.12cm. Further, we demonstrate that the features learned by iTracker generalize well to other datasets, achieving state-of-the-art results. The code, data, and models are available at http://gazecapture.csail.mit.edu.
Submitted 18 June, 2016;
originally announced June 2016.
-
Deep Learning for Identifying Metastatic Breast Cancer
Authors:
Dayong Wang,
Aditya Khosla,
Rishab Gargeya,
Humayun Irshad,
Andrew H. Beck
Abstract:
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
Submitted 18 June, 2016;
originally announced June 2016.
-
Spin-orbit coupling and strong electronic correlations in cyclic molecules
Authors:
A. L. Khosla,
A. C. Jacko,
J. Merino,
B. J. Powell
Abstract:
In atoms spin-orbit coupling (SOC) cannot raise the angular momentum above a maximum value or lower it below a minimum. Here we show that this need not be the case in materials built from nanoscale structures including multi-nuclear coordination complexes, materials with decorated lattices, or atoms on surfaces. In such cyclic molecules the electronic spin couples to currents running around the molecule. For odd-fold symmetric molecules (e.g., odd membered rings) the SOC is highly analogous to the atomic case; but for even-fold symmetric molecules every angular momentum state can be both raised and lowered. These differences arise because for odd-fold symmetric molecules the maximum and minimum molecular orbital angular momentum states are time reversal conjugates, whereas for even-fold symmetric molecules they are aliases of the same single state. We show, from first principles calculations, that in suitable molecules this molecular SOC is large, compared to the energy differences between frontier molecular orbitals. Finally, we show that, when electronic correlations are strong, molecular SOC can cause highly anisotropic exchange interactions and discuss how this can lead to effective spin models with compass Hamiltonians.
Submitted 1 March, 2017; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition
Authors:
Radoslaw M. Cichy,
Aditya Khosla,
Dimitrios Pantazis,
Antonio Torralba,
Aude Oliva
Abstract:
The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.
Submitted 12 January, 2016;
originally announced January 2016.
-
Learning Deep Features for Discriminative Localization
Authors:
Bolei Zhou,
Aditya Khosla,
Agata Lapedriza,
Aude Oliva,
Antonio Torralba
Abstract:
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
Submitted 13 December, 2015;
originally announced December 2015.
-
Visualizing Object Detection Features
Authors:
Carl Vondrick,
Aditya Khosla,
Hamed Pirsiavash,
Tomasz Malisiewicz,
Antonio Torralba
Abstract:
We introduce algorithms to visualize feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they do look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and supports that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.
Submitted 18 February, 2015;
originally announced February 2015.
-
Object Detectors Emerge in Deep Scene CNNs
Authors:
Bolei Zhou,
Aditya Khosla,
Agata Lapedriza,
Aude Oliva,
Antonio Torralba
Abstract:
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene classification. As scenes are composed of objects, the CNN for scene classification automatically discovers meaningful object detectors, representative of the learned scene categories. With object detectors emerging as a result of learning to recognize scenes, our work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.
Submitted 15 April, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
-
ImageNet Large Scale Visual Recognition Challenge
Authors:
Olga Russakovsky,
Jia Deng,
Hao Su,
Jonathan Krause,
Sanjeev Satheesh,
Sean Ma,
Zhiheng Huang,
Andrej Karpathy,
Aditya Khosla,
Michael Bernstein,
Alexander C. Berg,
Li Fei-Fei
Abstract:
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
Submitted 29 January, 2015; v1 submitted 1 September, 2014;
originally announced September 2014.
-
3D ShapeNets: A Deep Representation for Volumetric Shapes
Authors:
Zhirong Wu,
Shuran Song,
Aditya Khosla,
Fisher Yu,
Linguang Zhang,
Xiaoou Tang,
Jianxiong Xiao
Abstract:
3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.
Submitted 15 April, 2015; v1 submitted 21 June, 2014;
originally announced June 2014.
-
Inverting and Visualizing Features for Object Detection
Authors:
Carl Vondrick,
Aditya Khosla,
Tomasz Malisiewicz,
Antonio Torralba
Abstract:
We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on `HOG goggles' and perceive the visual world as a HOG based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they do look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.
Submitted 5 May, 2013; v1 submitted 10 December, 2012;
originally announced December 2012.
-
Particle Swarm Optimization Framework for Low Power Testing of VLSI Circuits
Authors:
Balwnder Singh,
Sukhleen Bindra Narang,
Arun Khosla
Abstract:
Power dissipation in sequential circuits is due to an increased toggle count of the circuit under test, which depends on the test vectors applied. If successive test vector sequences toggle more, the toggling rate of the flip-flops is higher, and higher flip-flop toggling results in more power dissipation. One way to overcome this problem is to use a genetic algorithm (GA) to obtain test vectors with high fault coverage in a short interval, followed by Hamming distance management on the test patterns. This approach is time-consuming and requires more effort. The alternative proposed in this paper is a particle swarm optimization (PSO) based framework to optimize power dissipation. The goal is to place the test vectors in a frame for time period 'T', so that the frame contains the vector strings that provide high fault coverage and arranges them to produce minimum toggling.
Submitted 7 November, 2011;
originally announced November 2011.
-
Towards Real-time Classification of Astronomical Transients
Authors:
A. Mahabal,
S. G. Djorgovski,
R. Williams,
A. Drake,
C. Donalek,
M. Graham,
B. Moghaddam,
M. Turmon,
J. Jewell,
A. Khosla,
B. Hensley
Abstract:
Exploration of time domain is now a vibrant area of research in astronomy, driven by the advent of digital synoptic sky surveys. While panoramic surveys can detect variable or transient events, typically some follow-up observations are needed; for short-lived phenomena, a rapid response is essential. Ability to automatically classify and prioritize transient events for follow-up studies becomes critical as the data rates increase. We have been developing such methods using the data streams from the Palomar-Quest survey, the Catalina Sky Survey and others, using the VOEventNet framework. The goal is to automatically classify transient events, using the new measurements, combined with archival data (previous and multi-wavelength measurements), and contextual information (e.g., Galactic or ecliptic latitude, presence of a possible host galaxy nearby, etc.); and to iterate them dynamically as the follow-up data come in (e.g., light curves or colors). We have been investigating Bayesian methodologies for classification, as well as discriminated follow-up to optimize the use of available resources, including Naive Bayesian approach, and the non-parametric Gaussian process regression. We will also be deploying variants of the traditional machine learning techniques such as Neural Nets and Support Vector Machines on datasets of reliably classified transients as they build up.
Submitted 24 October, 2008;
originally announced October 2008.