Search | arXiv e-print repository

A*HAR: A New Benchmark towards Semi-supervised learning for Class-imbalanced Human Activity Recognition

Authors: Govind Narasimman, Kangkang Lu, Arun Raja, Chuan Sheng Foo, Mohamed Sabry Aly, Jie Lin, Vijay Chandrasekhar

Abstract: Despite the vast literature on Human Activity Recognition (HAR) with wearable inertial sensor data, it is perhaps surprising that there are few studies investigating semi-supervised learning for HAR, particularly in a challenging scenario with a class imbalance problem. In this work, we present a new benchmark, called A*HAR, towards semi-supervised learning for class-imbalanced HAR. We evaluate a state-of-the-art semi-supervised learning method on A*HAR by combining Mean Teacher and a Convolutional Neural Network. Interestingly, we find that Mean Teacher boosts the overall performance when training the classifier with fewer labelled samples and a large amount of unlabeled samples, but the classifier falls short in handling unbalanced activities. These findings lead to an interesting open problem, i.e., development of semi-supervised HAR algorithms that are class-imbalance aware without any prior knowledge on the class distribution for unlabeled samples. The dataset and benchmark evaluation are released at https://github.com/I2RDL2/ASTAR-HAR for future research.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 5 pages, 3 figures
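
For readers unfamiliar with Mean Teacher, named in the abstract above: the method keeps a teacher model whose weights are an exponential moving average (EMA) of the student's weights. The following is only a minimal sketch of that update. The function name `ema_update`, the decay value, and the use of plain Python lists as weights are illustrative assumptions, not taken from the A*HAR release.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the student weight by one EMA step.

    Hypothetical helper: `teacher` and `student` are flat lists of floats.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights; a real model would hold tensors per layer.
teacher = [0.0, 0.0]
student = [1.0, -2.0]

# With a fixed student, the teacher drifts toward it over repeated steps.
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
```

A decay close to 1 makes the teacher change slowly, which is the usual reason its predictions are used as the more stable consistency target.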

arXiv:2007.04756 [pdf, other]

Learning to Prune Deep Neural Networks via Reinforcement Learning

Authors: Manas Gupta, Siddharth Aravindan, Aleksandra Kalisz, Vijay Chandrasekhar, Lin Jie

Abstract: This paper proposes PuRL - a deep reinforcement learning (RL) based algorithm for pruning neural networks. Unlike current RL based model compression approaches where feedback is given only at the end of each episode to the agent, PuRL provides rewards at every pruning step. This enables PuRL to achieve sparsity and accuracy comparable to current state-of-the-art methods, while having a much shorter training cycle. PuRL achieves more than 80% sparsity on the ResNet-50 model while retaining a Top-1 accuracy of 75.37% on the ImageNet dataset. Through our experiments we show that PuRL is also able to sparsify already efficient architectures like MobileNet-V2. In addition to performance characterisation experiments, we also provide a discussion and analysis of the various RL design choices that went into the tuning of the Markov Decision Process underlying PuRL. Lastly, we point out that PuRL is simple to use and can be easily adapted for various architectures.

Submitted 9 July, 2020; originally announced July 2020.

Comments: Accepted at the ICML 2020 Workshop on Automated Machine Learning (AutoML 2020)

arXiv:2006.14265 [pdf, other]

Empirical Analysis of Overfitting and Mode Drop in GAN Training

Authors: Yasin Yazici, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Vijay Chandrasekhar

Abstract: We examine two key questions in GAN training, namely overfitting and mode drop, from an empirical perspective. We show that when stochasticity is removed from the training procedure, GANs can overfit and exhibit almost no mode drop. Our results shed light on important characteristics of the GAN training procedure. They also provide evidence against prevailing intuitions that GANs do not memorize the training set, and that mode dropping is mainly due to properties of the GAN objective rather than how it is optimized during training.

Submitted 25 June, 2020; originally announced June 2020.

Comments: To appear in ICIP2020

arXiv:2004.07543 [pdf, other]

doi 10.1016/j.neucom.2021.10.090

Classify and Generate: Using Classification Latent Space Representations for Image Generations

Authors: Saisubramaniam Gopalakrishnan, Pranshu Ranjan Singh, Yasin Yazici, Chuan-Sheng Foo, Vijay Chandrasekhar, ArulMurugan Ambikapathi

Abstract: Utilization of classification latent space information for downstream reconstruction and generation is an intriguing and a relatively unexplored area. In general, discriminative representations are rich in class-specific features but are too sparse for reconstruction, whereas, in autoencoders the representations are dense but have limited indistinguishable class-specific features, making them less suitable for classification. In this work, we propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class. Unlike generative modeling approaches such as GANs and VAEs that aim to model the data manifold distribution, Representation based Generations (ReGene) directly represent the given data manifold in the classification space. Such supervised representations, under certain constraints, allow for reconstructions and controlled generations using an appropriate decoder without enforcing any prior distribution. Theoretically, given a class, we show that these representations when smartly manipulated using convex combinations retain the same class label. Furthermore, they also lead to the novel generation of visually realistic images. Extensive experiments on datasets of varying resolutions demonstrate that ReGene has higher classification accuracy than existing conditional generative models while being competitive in terms of FID.

Submitted 14 December, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

Journal ref: Saisubramaniam Gopalakrishnan, Pranshu Ranjan Singh et al., Classify and generate: Using classification latent space representations for image generations, Neurocomputing, Volume 471, 2022, Pages 296-334, ISSN 0925-2312

arXiv:1912.04219 [pdf, other]

FaultNet: Faulty Rail-Valves Detection using Deep Learning and Computer Vision

Authors: Ramanpreet Singh Pahwa, Jin Chao, Jestine Paul, Yiqun Li, Ma Tin Lay Nwe, Shudong Xie, Ashish James, Arulmurugan Ambikapathi, Zeng Zeng, Vijay Ramaseshan Chandrasekhar

Abstract: Regular inspection of rail valves and engines is an important task to ensure the safety and efficiency of railway networks around the globe. Over the past decade, computer vision and pattern recognition based techniques have gained traction for such inspection and defect detection tasks. An automated end-to-end trained system can potentially provide a low-cost, high throughput, and cheap alternative to manual visual inspection of these components. However, such systems require a huge amount of defective images for networks to understand complex defects. In this paper, a multi-phase deep learning based technique is proposed to perform accurate fault detection of rail-valves. Our approach uses a two-step method to perform high precision image segmentation of rail-valves resulting in pixel-wise accurate segmentation. Thereafter, a computer vision technique is used to identify faulty valves. We demonstrate that the proposed approach results in improved detection performance when compared to current state-of-the-art techniques used in fault detection.

Submitted 8 November, 2019; originally announced December 2019.

Comments: 8 pages, 8 figures, ITSC 2019

Journal ref: IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE - ITSC 2019

arXiv:1909.07541 [pdf, other]

A*3D Dataset: Towards Autonomous Driving in Challenging Environments

Authors: Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, Jie Lin

Abstract: With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection. Existing datasets either represent simple scenarios or provide only day-time data. In this paper, we introduce a new challenging A*3D dataset which consists of RGB images and LiDAR data with significant diversity of scene, time, and weather. The dataset consists of high-density images ($\approx~10$ times more than the pioneering KITTI dataset), heavy occlusions, a large number of night-time frames ($\approx~3$ times the nuScenes dataset), addressing the gaps in the existing datasets to push the boundaries of tasks in autonomous driving research to more challenging highly diverse environments. The dataset contains $39\text{K}$ frames, $7$ classes, and $230\text{K}$ 3D object annotations. An extensive 3D object detection benchmark evaluation on the A*3D dataset for various attributes such as high density, day-time/night-time, gives interesting insights into the advantages and limitations of training and testing 3D object detection in real-world setting.

Submitted 16 September, 2019; originally announced September 2019.

Comments: A new 3D dataset by I2R, A*STAR for autonomous driving

arXiv:1907.07862 [pdf, other]

Artificial Intelligence-Enabled Cellular Networks: A Critical Path to Beyond-5G and 6G

Authors: Rubayet Shafin, Lingjia Liu, Vikram Chandrasekhar, Hao Chen, Jeffrey Reed, Jianzhong Zhang

Abstract: Mobile Network Operators (MNOs) are in the process of overlaying their conventional macro cellular networks with shorter range cells such as outdoor pico cells. The resultant increase in network complexity creates substantial overhead in terms of operating expenses, time, and labor for their planning and management. Artificial intelligence (AI) offers the potential for MNOs to operate their networks in a more organic and cost-efficient manner. We argue that deploying AI in 5G and Beyond will require surmounting significant technical barriers in terms of robustness, performance, and complexity. We outline future research directions, identify the top 5 challenges, and present a possible roadmap to realize the vision of AI-enabled cellular networks for Beyond-5G and 6G.

Submitted 17 July, 2019; originally announced July 2019.

Comments: 7 pages, 3 figures, 1 table

arXiv:1902.03444 [pdf, other]

Venn GAN: Discovering Commonalities and Particularities of Multiple Distributions

Authors: Yasin Yazıcı, Bruno Lecouat, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar

Abstract: We propose a GAN design which models multiple distributions effectively and discovers their commonalities and particularities. Each data distribution is modeled with a mixture of $K$ generator distributions. As the generators are partially shared between the modeling of different true data distributions, shared ones capture the commonality of the distributions, while non-shared ones capture unique aspects of them. We show the effectiveness of our method on various datasets (MNIST, Fashion MNIST, CIFAR-10, Omniglot, CelebA) with compelling results.

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1901.10074 [pdf, other]

CaRENets: Compact and Resource-Efficient CNN for Homomorphic Inference on Encrypted Medical Images

Authors: Jin Chao, Ahmad Al Badawi, Balagopal Unnikrishnan, Jie Lin, Chan Fook Mun, James M. Brown, J. Peter Campbell, Michael Chiang, Jayashree Kalpathy-Cramer, Vijay Ramaseshan Chandrasekhar, Pavitra Krishnaswamy, Khin Mi Mi Aung

Abstract: Convolutional neural networks (CNNs) have enabled significant performance leaps in medical image classification tasks. However, translating neural network models for clinical applications remains challenging due to data privacy issues. Fully Homomorphic Encryption (FHE) has the potential to address this challenge as it enables the use of CNNs on encrypted images. However, current HE technology poses immense computational and memory overheads, particularly for high-resolution images such as those seen in the clinical context. We present CaRENets: Compact and Resource-Efficient CNNs for high performance and resource-efficient inference on high-resolution encrypted images in practical applications. At the core, CaRENets comprises a new FHE compact packing scheme that is tightly integrated with CNN functions. CaRENets offers dual advantages of memory efficiency (due to compact packing of images and CNN activations) and inference speed (due to the reduction in the number of ciphertexts created and the associated mathematical operations) over standard interleaved packing schemes. We apply CaRENets to perform homomorphic abnormality detection with 80-bit security level in two clinical conditions - Retinopathy of Prematurity (ROP) and Diabetic Retinopathy (DR). The ROP dataset comprises 96 x 96 grayscale images, while the DR dataset comprises 256 x 256 RGB images. We demonstrate over 45x improvement in memory efficiency and 4-5x speedup in inference over the interleaved packing schemes. As our approach enables memory-efficient low-latency HE inference without imposing additional communication burden, it has implications for practical and secure deep learning inference in clinical imaging.

Submitted 28 January, 2019; originally announced January 2019.

arXiv:1901.02064 [pdf, other]

Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

Authors: Xue Geng, Jie Fu, Bin Zhao, Jie Lin, Mohamed M. Sabry Aly, Christopher Pal, Vijay Chandrasekhar

Abstract: This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens, we propose a novel dataflow-based joint quantization approach with the hypothesis that fewer quantization operations would incur less information loss and thus improve the final performance. It first introduces a quantization scheme with efficient bit-shifting and rounding operations to represent network parameters and activations in low precision. Then it restructures the network architectures to form unified modules for optimization on the quantized model. Extensive experiments on ImageNet and KITTI validate the effectiveness of our model, demonstrating that state-of-the-art results for various tasks can be achieved by this quantized model. Besides, we designed and synthesized an RTL model to measure the hardware costs among various quantization methods. For each quantization operation, it reduces area cost by about 15 times and energy consumption by about 9 times, compared to a strong baseline.

Submitted 4 January, 2019; originally announced January 2019.

Journal ref: Data Compression Conference 2019

arXiv:1812.07832 [pdf, other]

Semi-Supervised Deep Learning for Abnormality Classification in Retinal Images

Authors: Bruno Lecouat, Ken Chang, Chuan-Sheng Foo, Balagopal Unnikrishnan, James M. Brown, Houssam Zenati, Andrew Beers, Vijay Chandrasekhar, Jayashree Kalpathy-Cramer, Pavitra Krishnaswamy

Abstract: Supervised deep learning algorithms have enabled significant performance gains in medical image classification tasks. But these methods rely on large labeled datasets that require resource-intensive expert annotation. Semi-supervised generative adversarial network (GAN) approaches offer a means to learn from limited labeled data alongside larger unlabeled datasets, but have not been applied to discern fine-scale, sparse or localized features that define medical abnormalities. To overcome these limitations, we propose a patch-based semi-supervised learning approach and evaluate performance on classification of diabetic retinopathy from funduscopic images. Our semi-supervised approach achieves high AUC with just 10-20 labeled training images, and outperforms the supervised baselines by up to 15% when less than 30% of the training dataset is labeled. Further, our method implicitly enables interpretation of the SSL predictions. As this approach enables good accuracy, resolution and interpretability with lower annotation burden, it sets the pathway for scalable applications of deep learning in clinical imaging.

Submitted 19 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/227

arXiv:1812.02288 [pdf, other]

Adversarially Learned Anomaly Detection

Authors: Houssam Zenati, Manon Romain, Chuan Sheng Foo, Bruno Lecouat, Vijay Ramaseshan Chandrasekhar

Abstract: Anomaly detection is a significant and hence well-studied problem. However, developing effective anomaly detection methods for complex and high-dimensional data remains a challenge. As Generative Adversarial Networks (GANs) are able to model the complex high-dimensional distributions of real-world data, they offer a promising approach to address this challenge. In this work, we propose an anomaly detection method, Adversarially Learned Anomaly Detection (ALAD) based on bi-directional GANs, that derives adversarially learned features for the anomaly detection task. ALAD then uses reconstruction errors based on these adversarially learned features to determine if a data sample is anomalous. ALAD builds on recent advances to ensure data-space and latent-space cycle-consistencies and stabilize GAN training, which results in significantly improved anomaly detection performance. ALAD achieves state-of-the-art performance on a range of image and tabular datasets while being several hundred-fold faster at test time than the only published GAN-based method.

Submitted 5 December, 2018; originally announced December 2018.

Comments: In the Proceedings of the 20th IEEE International Conference on Data Mining (ICDM), 2018

arXiv:1811.12065 [pdf, other]

TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks

Authors: Lile Cai, Anne-Maelle Barneche, Arthur Herbout, Chuan Sheng Foo, Jie Lin, Vijay Ramaseshan Chandrasekhar, Mohamed M. Sabry

Abstract: Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing hardware accelerators for CNNs that provide improved inference performance and energy consumption compared to GPUs. Such embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values.

Submitted 21 October, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

Comments: Accepted by ISLPED2019

arXiv:1811.06231 [pdf, other]

Graph Convolutional Neural Networks for Polymers Property Prediction

Authors: Minggang Zeng, Jatin Nitin Kumar, Zeng Zeng, Ramasamy Savitha, Vijay Ramaseshan Chandrasekhar, Kedar Hippalgaonkar

Abstract: A fast and accurate predictive tool for polymer properties is in demand and will pave the way to iterative inverse design. In this work, we apply graph convolutional neural networks (GCNN) to predict the dielectric constant and energy bandgap of polymers. Using density functional theory (DFT) calculated properties as the ground truth, GCNN can achieve remarkable agreement with DFT results. Moreover, we show that GCNN outperforms other machine learning algorithms. Our work proves that GCNN relies only on morphological data of polymers and removes the requirement for complicated hand-crafted descriptors, while still offering accuracy in fast predictions.

Submitted 15 November, 2018; originally announced November 2018.

Comments: Accepted for NIPS 2018 Workshop on Machine Learning for Molecules and Materials

arXiv:1811.06219 [pdf, other]

Predicting thermoelectric properties from crystal graphs and material descriptors - first application for functional materials

Authors: Leo Laugier, Daniil Bash, Jose Recatala, Hong Kuan Ng, Savitha Ramasamy, Chuan-Sheng Foo, Vijay R Chandrasekhar, Kedar Hippalgaonkar

Abstract: We introduce the use of Crystal Graph Convolutional Neural Networks (CGCNN), Fully Connected Neural Networks (FCNN) and XGBoost to predict thermoelectric properties. The dataset for the CGCNN is independent of Density Functional Theory (DFT) and only relies on the crystal and atomic information, while that for the FCNN is based on a rich attribute list mined from Materialsproject.org. The results show that the optimized FCNN is three layers deep and is able to predict the scattering-time independent thermoelectric powerfactor much better than the CGCNN (or XGBoost), suggesting that bonding and density of states descriptors informed from materials science knowledge obtained partially from DFT are vital to predict functional properties.

Submitted 15 November, 2018; originally announced November 2018.

arXiv:1811.04595 [pdf, other]

Holistic Multi-modal Memory Network for Movie Question Answering

Authors: Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar

Abstract: Answering questions according to multi-modal context is a challenging problem as it requires a deep integration of different data sources. Existing approaches only employ partial interactions among data sources in one attention hop. In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop. In addition, it takes answer choices into consideration during the context retrieval stage. Therefore, the proposed framework effectively integrates multi-modal context, question, and answer information, which leads to more informative context retrieved for question answering. Our HMMN framework achieves state-of-the-art accuracy on the MovieQA dataset. Extensive ablation studies show the importance of holistic reasoning and contributions of different attention strategies.

Submitted 12 November, 2018; originally announced November 2018.

arXiv:1811.00778 [pdf, other]

Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs

Authors: Ahmad Al Badawi, Jin Chao, Jie Lin, Chan Fook Mun, Jun Jie Sim, Benjamin Hong Meng Tan, Xiao Nan, Khin Mi Mi Aung, Vijay Ramaseshan Chandrasekhar

Abstract: Deep Learning as a Service (DLaaS) stands as a promising solution for cloud-based inference applications. In this setting, the cloud has a pre-learned model whereas the user has samples on which she wants to run the model. The biggest concern with DLaaS is user privacy if the input samples are sensitive data. We provide here an efficient privacy-preserving system by employing high-end technologies such as Fully Homomorphic Encryption (FHE), Convolutional Neural Networks (CNNs) and Graphics Processing Units (GPUs). FHE, with its widely-known feature of computing on encrypted data, empowers a wide range of privacy-concerned applications. This comes at high cost as it requires enormous computing power. In this paper, we show how to accelerate the performance of running CNNs on encrypted data with GPUs. We evaluated two CNNs to classify homomorphically the MNIST and CIFAR-10 datasets. Our solution achieved a sufficient security level (> 80 bit) and reasonable classification accuracy (99% and 77.55%) for MNIST and CIFAR-10, respectively. In terms of latency, we could classify an image in 5.16 seconds and 304.43 seconds for MNIST and CIFAR-10, respectively. Our system can also classify a batch of images (> 8,000) without extra overhead.

Submitted 18 August, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

arXiv:1808.07272 [pdf, other]

doi 10.1145/3240508.3240713

Deep Adaptive Temporal Pooling for Activity Recognition

Authors: Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal

Abstract: Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and determining the activity relevance of different temporal segments in a video. To address this problem, we propose a learnable and differentiable module: Deep Adaptive Temporal Pooling (DATP). DATP applies a self-attention mechanism to adaptively pool the classification scores of different video segments. Specifically, using frame-level features, DATP regresses importance of different temporal segments and generates weights for them. Remarkably, DATP is trained using only the video-level label. There is no need of additional supervision except video-level activity class label. We conduct extensive experiments to investigate various input features and different weight models. Experimental results show that DATP can learn to assign large weights to key video segments. More importantly, DATP can improve training of frame-level feature extractor. This is because relevant temporal segments are assigned large weights during back-propagation. Overall, we achieve state-of-the-art performance on UCF101, HMDB51 and Kinetics datasets.

Submitted 22 August, 2018; originally announced August 2018.

Comments: Accepted by ACM Multimedia 2018
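
The pooling idea in the DATP abstract above (weighting per-segment classification scores by a learned importance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the importance values are passed in directly rather than regressed from frame-level features, and softmax normalization of the weights is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temporal_pool(segment_scores, importance_logits):
    """Pool per-segment class scores with importance weights.

    segment_scores: one list of class scores per video segment.
    importance_logits: one unnormalized importance value per segment
        (hypothetical input; in DATP these would come from a learned model).
    Returns the pooled video-level scores and the normalized weights.
    """
    weights = softmax(importance_logits)
    num_classes = len(segment_scores[0])
    pooled = [
        sum(w * scores[c] for w, scores in zip(weights, segment_scores))
        for c in range(num_classes)
    ]
    return pooled, weights
```

With equal importance values the result reduces to plain average pooling; a segment with a much larger importance value dominates the video-level scores.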

arXiv:1807.04307 [pdf, other]

Manifold regularization with GANs for semi-supervised learning

Authors: Bruno Lecouat, Chuan-Sheng Foo, Houssam Zenati, Vijay Chandrasekhar

Abstract: Generative Adversarial Networks are powerful generative models that are able to model the manifold of natural images. We leverage this property to perform manifold regularization by approximating a variant of the Laplacian norm using a Monte Carlo approximation that is easily computed with the GAN. When incorporated into the semi-supervised feature-matching GAN we achieve state-of-the-art results for GAN-based semi-supervised learning on CIFAR-10 and SVHN benchmarks, with a method that is significantly easier to implement than competing methods. We also find that manifold regularization improves the quality of generated images, and is affected by the quality of the GAN used to approximate the regularizer.

Submitted 11 July, 2018; originally announced July 2018.

arXiv:1807.02629 [pdf, other]

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Authors: Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, Georgios Piliouras

Abstract: Owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around convex-concave (or even linear) problems; however, making theoretical inroads towards efficient GAN training depends crucially on moving beyond this classic framework. To make piecemeal progress along these lines, we analyze the behavior of mirror descent (MD) in a class of non-monotone problems whose solutions coincide with those of a naturally associated variational inequality - a property which we call coherence. We first show that ordinary, "vanilla" MD converges under a strict version of this condition, but not otherwise; in particular, it may fail to converge even in bilinear models with a unique solution. We then show that this deficiency is mitigated by optimism: by taking an "extra-gradient" step, optimistic mirror descent (OMD) converges in all coherent problems. Our analysis generalizes and extends the results of Daskalakis et al. (2018) for optimistic gradient descent (OGD) in bilinear problems, and makes concrete headway for establishing convergence beyond convex-concave games. We also provide stochastic analogues of these results, and we validate our analysis by numerical experiments in a wide array of GAN models (including Gaussian mixture models, as well as the CelebA and CIFAR-10 datasets).

Submitted 1 October, 2018; v1 submitted 7 July, 2018; originally announced July 2018.

Comments: 26 pages, 14 figures
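
The claim in the abstract above, that plain descent can fail on a bilinear problem with a unique solution while an extra-gradient step fixes this, can be checked on the smallest possible case. The sketch below is an illustration under stated assumptions, not the paper's method: it uses the Euclidean special case (where mirror descent reduces to gradient descent/ascent), the toy objective f(x, y) = x * y, and hypothetical function names and step size.

```python
import math


def gradient_step(x, y, eta):
    """Simultaneous gradient descent on x and ascent on y for f(x, y) = x * y."""
    return x - eta * y, y + eta * x


def extra_gradient_step(x, y, eta):
    """Take a look-ahead step, then update the original point
    using the gradients evaluated at the look-ahead point."""
    x_half, y_half = x - eta * y, y + eta * x
    return x - eta * y_half, y + eta * x_half


def distance_after(step, x0=1.0, y0=1.0, eta=0.1, iters=200):
    """Distance from the unique saddle point (0, 0) after `iters` updates."""
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)
```

For this toy problem each plain step multiplies the squared distance to the solution by 1 + eta^2, so the iterates spiral outward, whereas each extra-gradient step multiplies it by 1 - eta^2 + eta^4, which is below 1 for 0 < eta < 1.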

arXiv:1806.04498 [pdf, other]

The Unusual Effectiveness of Averaging in GAN Training

Authors: Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar

Abstract: We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide the -- to our knowledge -- first theoretical arguments in support of EMA. We show that EMA converges to limit cycles around the equilibrium with vanishing amplitude as the discount parameter approaches one for simple bilinear games and also enhances the stability of general GAN training. We establish experimentally that both techniques are strikingly effective in the non-convex-concave GAN setting as well. Both improve inception and FID scores on different architectures and for different GAN objectives. We provide comprehensive experimental results across a range of datasets -- mixture of Gaussians, CIFAR-10, STL-10, CelebA and ImageNet -- to demonstrate its effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images. The code is available at https://github.com/yasinyazici/EMA_GAN

Submitted 26 February, 2019; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: Published as a conference paper at ICLR 2019
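
The two averaging schemes named in the abstract above can be written in a few lines. This is a minimal sketch rather than the authors' code (which is at the repository linked in the abstract): parameters are represented as flat lists of floats, and the function names and the default discount value `beta` are assumptions made for illustration.

```python
def moving_average(history):
    """Uniform time-average (MA) of a list of parameter vectors."""
    count = len(history)
    dim = len(history[0])
    return [sum(params[i] for params in history) / count for i in range(dim)]


def ema_update(ema, params, beta=0.999):
    """One exponential moving average (EMA) update.

    beta is the discount parameter; values closer to one give older
    iterates more influence on the average.
    """
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema, params)]
```

In a training loop, `ema_update` would be called once per optimizer step on a copy of the generator parameters, and the averaged copy (not the raw parameters) would be used for evaluation.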

arXiv:1805.08957 [pdf, other]

Semi-Supervised Learning with GANs: Revisiting Manifold Regularization

Authors: Bruno Lecouat, Chuan-Sheng Foo, Houssam Zenati, Vijay R. Chandrasekhar

Abstract: GANs are powerful generative models that are able to model the manifold of natural images. We leverage this property to perform manifold regularization by approximating the Laplacian norm using a Monte Carlo approximation that is easily computed with the GAN. When incorporated into the feature-matching GAN of Improved GAN, we achieve state-of-the-art results for GAN-based semi-supervised learning on the CIFAR-10 dataset, with a method that is significantly easier to implement than competing methods.

Submitted 23 May, 2018; originally announced May 2018.

Comments: Accepted paper

Journal ref: Workshop track - ICLR 2018

arXiv:1803.11246 [pdf]

doi 10.1016/j.joule.2018.05.009

Accelerating Materials Development via Automation, Machine Learning, and High-Performance Computing

Authors: Juan Pablo Correa-Baena, Kedar Hippalgaonkar, Jeroen van Duren, Shaffiq Jaffer, Vijay R. Chandrasekhar, Vladan Stevanovic, Cyrus Wadia, Supratik Guha, Tonio Buonassisi

Abstract: Successful materials innovations can transform society. However, materials research often involves long timelines and low success probabilities, dissuading investors who have expectations of shorter times from bench to business. A combination of emergent technologies could accelerate the pace of novel materials development by 10x or more, aligning the timelines of stakeholders (investors and researchers), markets, and the environment, while increasing return-on-investment. First, tool automation enables rapid experimental testing of candidate materials. Second, high-throughput computing (HPC) concentrates experimental bandwidth on promising compounds by predicting and inferring bulk, interface, and defect-related properties. Third, machine learning connects the former two, where experimental outputs automatically refine theory and help define next experiments. We describe state-of-the-art attempts to realize this vision and identify resource gaps. We posit that over the coming decade, this combination of tools will transform the way we perform materials research. There are considerable first-mover advantages at stake, especially for grand challenges in energy and related fields, including computing, healthcare, urbanization, water, food, and the environment.

Submitted 20 March, 2018; originally announced March 2018.

Comments: 22 pages, 3 figures

Journal ref: Joule 2 (2018) 1410-1420

arXiv:1803.02043 [pdf, other]

Online Deep Learning: Growing RBM on the fly

Authors: Savitha Ramasamy, Kanagasabai Rajaraman, Pavitra Krishnaswamy, Vijay Chandrasekhar

Abstract: We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representation at the hidden layer and (2) a discriminative phase for classification. The online generative training begins with zero neurons in the hidden layer, adds and updates the neurons to adapt to statistics of streaming data in a single pass unsupervised manner, resulting in a feature representation best suited to the data. The discriminative phase is based on stochastic gradient descent and associates the represented features to the class labels. We demonstrate the OGD-RBM on a set of multi-category and binary classification problems for data sets having varying degrees of class-imbalance. We first apply the OGD-RBM algorithm on the multi-class MNIST dataset to characterize the network evolution. We demonstrate that the online generative phase converges to a stable, concise network architecture, wherein individual neurons are inherently discriminative to the class labels despite unsupervised training. We then benchmark OGD-RBM performance to other machine learning, neural network and ClassRBM techniques for credit scoring applications using 3 public non-stationary two-class credit datasets with varying degrees of class-imbalance. We report that OGD-RBM improves accuracy by 2.5-3% over batch learning techniques while requiring at least 24%-70% fewer neurons and fewer training samples. This online generative training approach can be extended greedily to multiple layers for training Deep Belief Networks in non-stationary data mining applications without the need for a priori fixed architectures.

Submitted 6 March, 2018; originally announced March 2018.

Comments: 14 pages, 4 figures, 2 tables

arXiv:1802.06222 [pdf, ps, other]

Efficient GAN-Based Anomaly Detection

Authors: Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, Vijay Ramaseshan Chandrasekhar

Abstract: Generative adversarial networks (GANs) are able to model the complex high-dimensional distributions of real-world data, which suggests they could be effective for anomaly detection. However, few works have explored the use of GANs for the anomaly detection task. We leverage recently developed GAN models for anomaly detection, and achieve state-of-the-art performance on image and network intrusion datasets, while being several hundred-fold faster at test time than the only published GAN-based method.

Submitted 1 May, 2019; v1 submitted 17 February, 2018; originally announced February 2018.

Comments: Updated version of this work is published at ICDM 2018, see arXiv:1812.02288. Submitted to the ICLR Workshop 2018

arXiv:1711.01714 [pdf, other]

End-to-End Video Classification with Knowledge Graphs

Authors: Fang Yuan, Zhe Wang, Jie Lin, Luis Fernando D'Haro, Kim Jung Jae, Zeng Zeng, Vijay Chandrasekhar

Abstract: Video understanding has attracted much research attention especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning approaches including deep neural networks largely focus on the representations of the given data, humans often look beyond the data at hand and leverage external knowledge to make better decisions. Towards narrowing the gap, we propose to incorporate external knowledge graphs into video classification. In particular, we unify traditional "knowledgeless" machine learning models and knowledge graphs in a novel end-to-end framework. The framework is flexible to work with most existing video classification algorithms including state-of-the-art deep models. Finally, we conduct extensive experiments on the largest public video dataset YouTube-8M. The results are promising across the board, improving mean average precision by up to 2.9%.

Submitted 5 November, 2017; originally announced November 2017.

Comments: 9 pages, 5 figures

arXiv:1707.05455 [pdf, ps, other]

Pruning Convolutional Neural Networks for Image Instance Retrieval

Authors: Gaurav Manek, Jie Lin, Vijay Chandrasekhar, Lingyu Duan, Sateesh Giduthuri, Xiaoli Li, Tomaso Poggio

Abstract: In this work, we focus on the problem of image instance retrieval with deep descriptors extracted from pruned Convolutional Neural Networks (CNN). The objective is to heavily prune convolutional edges while maintaining retrieval performance. To this end, we introduce both data-independent and data-dependent heuristics to prune convolutional edges, and evaluate their performance across various compression rates with different deep descriptors over several benchmark datasets. Further, we present an end-to-end framework to fine-tune the pruned network, with a triplet loss function specially designed for the retrieval task. We show that the combination of heuristic pruning and fine-tuning offers a 5x compression rate without considerable loss in retrieval performance.

Submitted 17 July, 2017; originally announced July 2017.

Comments: 5 pages

arXiv:1706.05461 [pdf, other]

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

Authors: Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar

Abstract: The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text. The newly introduced text data is termed YouTube-8M-Text. We present a classification framework for the joint use of text, visual and audio features, and conduct an extensive set of experiments to quantify the benefit that this additional mode brings. The inclusion of text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text validation dataset.

Submitted 9 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

Comments: 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding

arXiv:1705.09435 [pdf, other]

Deep Learning for Lung Cancer Detection: Tackling the Kaggle Data Science Bowl 2017 Challenge

Authors: Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Huiling Chen, Jie Lin, Babar Nazir, Cen Chen, Tse Chiang Howe, Zeng Zeng, Vijay Chandrasekhar

Abstract: We present a deep learning framework for computer-aided lung cancer diagnosis. Our multi-stage framework detects nodules in 3D lung CAT scans, determines if each nodule is malignant, and finally assigns a cancer probability based on these results. We discuss the challenges and advantages of our framework. In the Kaggle Data Science Bowl 2017, our framework ranked 41st out of 1972 teams.

Submitted 26 May, 2017; originally announced May 2017.

arXiv:1704.08141 [pdf, other]

Compact Descriptors for Video Analysis: the Emerging MPEG Standard

Authors: Ling-Yu Duan, Vijay Chandrasekhar, Shiqi Wang, Yihang Lou, Jie Lin, Yan Bai, Tiejun Huang, Alex Chichung Kot, Wen Gao

Abstract: This paper provides an overview of the on-going compact descriptors for video analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG). MPEG-CDVA aims to define a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the development of MPEG-CDVA, a series of techniques aiming to reduce the descriptor size and improve the video representation ability have been proposed. This article describes the new standard that is being developed and reports the performance of these key technical contributions.

Submitted 26 April, 2017; originally announced April 2017.

Comments: 4 figures, 4 tables

arXiv:1701.04923 [pdf, other]

Compression of Deep Neural Networks for Image Instance Retrieval

Authors: Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Lingyu Duan, Tomaso Poggio

Abstract: Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompressed deep neural network models require hundreds of megabytes of storage making them inconvenient to deploy in mobile applications or in custom hardware. In this work, we study the problem of neural network model compression focusing on the image instance retrieval task. We study quantization, coding, pruning and weight sharing techniques for reducing model size for the instance retrieval problem. We provide extensive experimental results on the trade-off between retrieval performance and model size for different types of networks on several data sets providing the most comprehensive study on this topic. We compress models to the order of a few MBs: two orders of magnitude smaller than the uncompressed models while achieving negligible loss in retrieval performance.

Submitted 17 January, 2017; originally announced January 2017.

Comments: 10 pages, accepted by DCC 2017

arXiv:1603.04595 [pdf, other]

Nested Invariance Pooling and RBM Hashing for Image Instance Retrieval

Authors: Olivier Morère, Jie Lin, Antoine Veillard, Vijay Chandrasekhar, Tomaso Poggio

Abstract: The goal of this work is the computation of very compact binary hashes for image instance retrieval. Our approach has two novel contributions. The first one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a mathematical theory for computing group invariant transformations with feed-forward neural networks. NIP is able to produce compact and well-performing descriptors with visual representations extracted from convolutional neural networks. We specifically incorporate scale, translation and rotation invariances but the scheme can be extended to any arbitrary sets of transformations. We also show that using moments of increasing order throughout nesting is important. The NIP descriptors are then hashed to the target code size (32-256 bits) with a Restricted Boltzmann Machine with a novel batch-level regularization scheme specifically designed for the purpose of hashing (RBMH). A thorough empirical evaluation against state-of-the-art methods shows that the results obtained both with the NIP descriptors and the NIP+RBMH hashes are consistently outstanding across a wide range of datasets.

Submitted 14 April, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

Comments: Image Instance Retrieval, CNN, Invariant Representation, Hashing, Unsupervised Learning, Regularization. arXiv admin note: text overlap with arXiv:1601.02093

arXiv:1601.06603 [pdf, other]

Egocentric Activity Recognition with Multimodal Fisher Vector

Authors: Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal, Jie Lin

Abstract: With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data of 20 fine-grained and diverse activity categories. We present a novel strategy to extract temporal trajectory-like features from sensor data. We propose to apply the Fisher Kernel framework to fuse video and temporal enhanced sensor features. Experimental results show that with careful design of feature extraction and fusion algorithm, sensor data can enhance information-rich video data. We make publicly available the Multimodal Egocentric Activity dataset to facilitate future research.

Submitted 25 January, 2016; originally announced January 2016.

Comments: 5 pages, 4 figures, ICASSP 2016 accepted

arXiv:1601.02093 [pdf, other]

Group Invariant Deep Representations for Image Instance Retrieval

Authors: Olivier Morère, Antoine Veillard, Jie Lin, Julie Petta, Vijay Chandrasekhar, Tomaso Poggio

Abstract: Most image instance retrieval pipelines are based on comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large scale image classification, representations extracted from Convolutional Neural Networks (CNN) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors for image instance retrieval. While CNN-based descriptors are generally noted for good retrieval performance at lower bitrates, they nevertheless present a number of drawbacks including the lack of robustness to common object transformations such as rotations compared with their interest point based FV counterparts. In this paper, we propose a method for computing invariant global descriptors from CNNs. Our method implements a recently proposed mathematical theory for invariance in a sensory cortex modeled as a feedforward neural network. The resulting global descriptors can be made invariant to multiple arbitrary transformation groups while retaining good discriminativeness. Based on a thorough empirical evaluation using several publicly available datasets, we show that our method is able to significantly and consistently improve retrieval results every time a new type of invariance is incorporated. We also show that our method which has few parameters is not prone to overfitting: improvements generalize well across datasets with different properties with regard to invariances. Finally, we show that our descriptors are able to compare favourably to other state-of-the-art compact descriptors in similar bitranges, exceeding the highest retrieval results reported in the literature on some datasets. A dedicated dimensionality reduction step --quantization or hashing-- may be able to further improve the competitiveness of the descriptors.

Submitted 13 January, 2016; v1 submitted 9 January, 2016; originally announced January 2016.

arXiv:1511.03055 [pdf, other]

Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing

Authors: Jie Lin, Olivier Morère, Julie Petta, Vijay Chandrasekhar, Antoine Veillard

Abstract: A typical image retrieval pipeline starts with the comparison of global descriptors from a large database to find a short list of candidate matches. A good image descriptor is key to the retrieval pipeline and should reconcile two contradictory requirements: providing recall rates as high as possible and being as compact as possible for fast matching. Following the recent successes of Deep Convolutional Neural Networks (DCNN) for large scale image classification, descriptors extracted from DCNNs are increasingly used in place of the traditional hand crafted descriptors such as Fisher Vectors (FV) with better retrieval performances. Nevertheless, the dimensionality of a typical DCNN descriptor --extracted either from the visual feature pyramid or the fully-connected layers-- remains quite high at several thousands of scalar values. In this paper, we propose Unsupervised Triplet Hashing (UTH), a fully unsupervised method to compute extremely compact binary hashes --in the 32-256 bits range-- from high-dimensional global descriptors. UTH consists of two successive deep learning steps. First, Stacked Restricted Boltzmann Machines (SRBM), a type of unsupervised deep neural nets, are used to learn binary embedding functions able to bring the descriptor size down to the desired bitrate. SRBMs are typically able to ensure a very high compression rate at the expense of losing some desirable metric properties of the original DCNN descriptor space. Then, triplet networks, a rank learning scheme based on weight sharing nets, are used to fine-tune the binary embedding functions to retain as much as possible of the useful metric properties of the original space. A thorough empirical evaluation conducted on multiple publicly available datasets using DCNN descriptors shows that our method is able to significantly outperform state-of-the-art unsupervised schemes in the target bit range.

Submitted 10 November, 2015; originally announced November 2015.

MSC Class: 68P20 ACM Class: H.3.3; I.2.6

arXiv:1508.02496 [pdf, other]

A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval

Authors: Vijay Chandrasekhar, Jie Lin, Olivier Morère, Hanlin Goh, Antoine Veillard

Abstract: With deep learning becoming the dominant approach in computer vision, the use of representations extracted from Convolutional Neural Nets (CNNs) is quickly gaining ground on Fisher Vectors (FVs) as favoured state-of-the-art global image descriptors for image instance retrieval. While the good performance of CNNs for image classification is unambiguously recognised, which of the two has the upper hand in the image retrieval context is not entirely clear yet. In this work, we propose a comprehensive study that systematically evaluates FVs and CNNs for image retrieval. The first part compares the performances of FVs and CNNs on multiple publicly available data sets. We investigate a number of details specific to each method. For FVs, we compare sparse descriptors based on interest point detectors with dense single-scale and multi-scale variants. For CNNs, we focus on understanding the impact of depth, architecture and training data on retrieval results. Our study shows that no descriptor is systematically better than the other and that performance gains can usually be obtained by using both types together. The second part of the study focuses on the impact of geometrical transformations such as rotations and scale changes. FVs based on interest point detectors are intrinsically resilient to such transformations while CNNs do not have a built-in mechanism to ensure such invariance. We show that performance of CNNs can quickly degrade in presence of rotations while they are far less affected by changes in scale. We then propose a number of ways to incorporate the required invariances in the CNN pipeline. Overall, our work is intended as a reference guide offering practically useful and simply implementable guidelines to anyone looking for state-of-the-art global descriptors best suited to their specific image instance retrieval problem.

Submitted 25 August, 2015; v1 submitted 11 August, 2015; originally announced August 2015.

Comments: Deep Convolutional Neural Networks for instance retrieval, Fisher Vectors, instance retrieval

arXiv:1501.07738 [pdf, other]

Co-Regularized Deep Representations for Video Summarization

Authors: Olivier Morère, Hanlin Goh, Antoine Veillard, Vijay Chandrasekhar, Jie Lin

Abstract: Compact keyframe-based video summaries are a popular way of generating viewership on video sharing platforms. Yet, creating relevant and compelling summaries for arbitrarily long videos with a small number of keyframes is a challenging task. We propose a comprehensive keyframe-based summarization framework combining deep convolutional neural networks and restricted Boltzmann machines. An original co-regularization scheme is used to discover meaningful subject-scene associations. The resulting multimodal representations are then used to select highly-relevant keyframes. A comprehensive user study is conducted comparing our proposed method to a variety of schemes, including the summarization currently in use by one of the most popular video sharing websites. The results show that our method consistently outperforms the baseline schemes for any given amount of keyframes both in terms of attractiveness and informativeness. The lead is even more significant for smaller summaries.

Submitted 30 January, 2015; originally announced January 2015.

Comments: Video summarization, deep convolutional neural networks, co-regularized restricted Boltzmann machines

arXiv:1501.04711 [pdf, other]

DeepHash: Getting Regularization, Depth and Fine-Tuning Right

Authors: Jie Lin, Olivier Morere, Vijay Chandrasekhar, Antoine Veillard, Hanlin Goh

Abstract: This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each requiring solutions specific to the hashing problem. In-depth evaluation shows that our scheme consistently outperforms state-of-the-art methods across all data sets for both Fisher Vectors and Deep Convolutional Neural Network features, by up to 20 percent over other schemes. The retrieval performance with 256-bit hashes is close to that of the uncompressed floating point features -- a remarkable 512 times compression.

Submitted 19 January, 2015; originally announced January 2015.

arXiv:1112.1344 [pdf]

Enhanced Inter-cell Interference Coordination for Heterogeneous Networks in LTE-Advanced: A Survey

Authors: Lars Lindbom, Robert Love, Sandeep Krishnamurthy, Chunhai Yao, Nobuhiko Miki, Vikram Chandrasekhar

Abstract: Heterogeneous networks (het-nets) - comprising conventional macrocell base stations overlaid with femtocells, picocells and wireless relays - offer cellular operators burgeoning traffic demands through cell-splitting gains obtained by bringing users closer to their access points. However, the often random and unplanned location of these access points can cause severe near-far problems, typically solved by coordinating base-station transmissions to minimize interference. Towards this direction, the 3rd generation partnership project Long Term Evolution-Advanced (3GPP-LTE or Rel-10) standard introduces time-domain inter-cell interference coordination (ICIC) for facilitating a seamless deployment of a het-net overlay. This article surveys the key features encompassing the physical layer, network layer and back-hauling aspects of time-domain ICIC in Rel-10.

Submitted 7 December, 2011; v1 submitted 6 December, 2011; originally announced December 2011.

Comments: This is a working document describing the Enhanced Inter-cell Interference Coordination (E-ICIC) introduced in LTE-Advanced

arXiv:1002.2964 [pdf, ps, other]

doi 10.1109/TWC.2010.101310.100231

Open vs Closed Access Femtocells in the Uplink

Authors: Ping Xia, Vikram Chandrasekhar, Jeffrey G. Andrews

Abstract: Femtocells are assuming an increasingly important role in the coverage and capacity of cellular networks. In contrast to existing cellular systems, femtocells are end-user deployed and controlled, randomly located, and rely on third party backhaul (e.g. DSL or cable modem). Femtocells can be configured to be either open access or closed access. Open access allows an arbitrary nearby cellular user to use the femtocell, whereas closed access restricts the use of the femtocell to users explicitly approved by the owner. Seemingly, the network operator would prefer an open access deployment since this provides an inexpensive way to expand their network capabilities, whereas the femtocell owner would prefer closed access, in order to keep the femtocell's capacity and backhaul to himself. We show mathematically and through simulations that the reality is more complicated for both parties, and that the best approach depends heavily on whether the multiple access scheme is orthogonal (TDMA or OFDMA, per subband) or non-orthogonal (CDMA). In a TDMA/OFDMA network, closed-access is typically preferable at high user densities, whereas in CDMA, open access can provide gains of more than 200% for the home user by reducing the near-far problem experienced by the femtocell. The results of this paper suggest that the interests of the femtocell owner and the network operator are more compatible than typically believed, and that CDMA femtocells should be configured for open access whereas OFDMA or TDMA femtocells should adapt to the cellular user density.

Submitted 15 February, 2010; originally announced February 2010.

Comments: 21 pages, 8 figures, 2 tables, submitted to IEEE Trans. on Wireless Communications

arXiv:0902.3210 [pdf, ps, other]

doi 10.1109/TWC.2009.090241

Coverage in Multi-Antenna Two-Tier Networks

Authors: Vikram Chandrasekhar, Marios Kountouris, Jeffrey G. Andrews

Abstract: In two-tier networks -- comprising a conventional cellular network overlaid with shorter range hotspots (e.g. femtocells, distributed antennas, or wired relays) -- with universal frequency reuse, the near-far effect from cross-tier interference creates dead spots where reliable coverage cannot be guaranteed to users in either tier. Equipping the macrocell and femtocells with multiple antennas enhances robustness against the near-far problem. This work derives the maximum number of simultaneously transmitting multiple antenna femtocells meeting a per-tier outage probability constraint. Coverage dead zones are presented wherein cross-tier interference bottlenecks cellular and hotspot coverage. Two operating regimes are shown, namely 1) a cellular-limited regime in which femtocell users experience unacceptable cross-tier interference and 2) a hotspot-limited regime wherein both femtocell users and cellular users are limited by hotspot interference. Our analysis accounts for the per-tier transmit powers, the number of transmit antennas (single antenna transmission being a special case) and terrestrial propagation such as the Rayleigh fading and the path loss exponents. Single-user (SU) multiple antenna transmission at each tier is shown to provide significantly superior coverage and spatial reuse relative to multiuser (MU) transmission. We propose a decentralized carrier-sensing approach to regulate femtocell transmission powers based on their location. Considering a worst-case cell-edge location, simulations using typical path loss scenarios show that our interference management strategy provides reliable cellular coverage with about 60 femtocells per cellsite.

Submitted 4 May, 2009; v1 submitted 18 February, 2009; originally announced February 2009.

Comments: 30 Pages, 11 figures, Revised and Resubmitted to IEEE Transactions on Wireless Communications

arXiv:0810.3869 [pdf, ps, other]

doi 10.1109/TWC.2009.081386

Power Control in Two-Tier Femtocell Networks

Authors: Vikram Chandrasekhar, Jeffrey G. Andrews, Tarik Muharemovic, Zukang Shen, Alan Gatherer

Abstract: In a two tier cellular network -- comprised of a central macrocell underlaid with shorter range femtocell hotspots -- cross-tier interference limits overall capacity with universal frequency reuse. To quantify near-far effects with universal frequency reuse, this paper derives a fundamental relation providing the largest feasible cellular Signal-to-Interference-Plus-Noise Ratio (SINR), given any set of feasible femtocell SINRs. We provide a link budget analysis which enables simple and accurate performance insights in a two-tier network. A distributed utility-based SINR adaptation at femtocells is proposed in order to alleviate cross-tier interference at the macrocell from cochannel femtocells. The Foschini-Miljanic (FM) algorithm is a special case of the adaptation. Each femtocell maximizes its individual utility consisting of a SINR based reward less an incurred cost (interference to the macrocell). Numerical results show greater than 30% improvement in mean femtocell SINRs relative to FM. In the event that cross-tier interference prevents a cellular user from obtaining its SINR target, an algorithm is proposed that reduces transmission powers of the strongest femtocell interferers. The algorithm ensures that a cellular user achieves its SINR target even with 100 femtocells/cell-site, and requires a worst case SINR reduction of only 16% at femtocells. These results motivate design of power control schemes requiring minimal network overhead in two-tier networks with shared spectrum.

Submitted 13 May, 2009; v1 submitted 21 October, 2008; originally announced October 2008.

Comments: 29 pages, 10 figures, Revised and resubmitted to the IEEE Transactions on Wireless Communications

arXiv:0805.1226 [pdf, ps, other]

Spectrum Allocation in Two-Tier Networks

Authors: Vikram Chandrasekhar, Jeffrey G. Andrews

Abstract: Two-tier networks, comprising a conventional cellular network overlaid with shorter range hotspots (e.g. femtocells, distributed antennas, or wired relays), offer an economically viable way to improve cellular system capacity. The capacity-limiting factor in such networks is interference. The cross-tier interference between macrocells and femtocells can suffocate the capacity due to the near-far problem, so in practice hotspots should use a different frequency channel than the potentially nearby high-power macrocell users. Centralized or coordinated frequency planning, which is difficult and inefficient even in conventional cellular networks, is all but impossible in a two-tier network. This paper proposes and analyzes an optimum decentralized spectrum allocation policy for two-tier networks that employ frequency division multiple access (including OFDMA). The proposed allocation is optimal in terms of Area Spectral Efficiency (ASE), and is subjected to a sensible Quality of Service (QoS) requirement, which guarantees that both macrocell and femtocell users attain at least a prescribed data rate. Results show the dependence of this allocation on the QoS requirement, hotspot density and the co-channel interference from the macrocell and surrounding femtocells. Design interpretations of this result are provided.

Submitted 24 November, 2008; v1 submitted 8 May, 2008; originally announced May 2008.

Comments: 25 pages, Revised and submitted to IEEE Transactions on Communications

arXiv:0803.0952 [pdf]

doi 10.1109/MCOM.2008.4623708

Femtocell Networks: A Survey

Authors: Vikram Chandrasekhar, Jeffrey Andrews, Alan Gatherer

Abstract: The surest way to increase the system capacity of a wireless link is by getting the transmitter and receiver closer to each other, which creates the dual benefits of higher quality links and more spatial reuse. In a network with nomadic users, this inevitably involves deploying more infrastructure, typically in the form of microcells, hotspots, distributed antennas, or relays. A less expensive alternative is the recent concept of femtocells, also called home base-stations, which are data access points installed by home users to get better indoor voice and data coverage. In this article, we overview the technical and business arguments for femtocells, and describe the state-of-the-art on each front. We also describe the technical challenges facing femtocell networks, and give some preliminary ideas for how to overcome them.

Submitted 20 September, 2008; v1 submitted 6 March, 2008; originally announced March 2008.

Comments: IEEE Communications Magazine, vol. 46, no.9, pp. 59-67, Sept. 2008

arXiv:cs/0702132 [pdf, ps, other]

Uplink Capacity and Interference Avoidance for Two-Tier Femtocell Networks

Authors: Vikram Chandrasekhar, Jeffrey G. Andrews

Abstract: Two-tier femtocell networks -- comprising a conventional macrocellular network plus embedded femtocell hotspots -- offer an economically viable solution to achieving high cellular user capacity and improved coverage. With universal frequency reuse and DS-CDMA transmission however, the ensuing cross-tier cochannel interference (CCI) causes unacceptable outage probability. This paper develops an uplink capacity analysis and interference avoidance strategy in such a two-tier CDMA network. We evaluate a network-wide area spectral efficiency metric called the operating contour (OC) defined as the feasible combinations of the average number of active macrocell users and femtocell base stations (BS) per cell-site that satisfy a target outage constraint. The capacity analysis provides an accurate characterization of the uplink outage probability, accounting for power control, path-loss and shadowing effects. Considering worst case CCI at a corner femtocell, results reveal that interference avoidance through a time-hopped CDMA physical layer and sectorized antennas allows about a 7x higher femtocell density, relative to a split spectrum two-tier network with omnidirectional femtocell antennas. A femtocell exclusion region and a tier selection based handoff policy offer modest improvements in the OCs. These results provide guidelines for the design of robust shared spectrum two-tier networks.

Submitted 5 February, 2009; v1 submitted 22 February, 2007; originally announced February 2007.

Comments: To be published in the IEEE Transactions on Wireless Communications

Showing 1–45 of 45 results for author: Chandrasekhar, V