Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks

Authors: Hongjun Wang, Sagar Vaze, Kai Han

Abstract: Detecting test-time distribution shift has emerged as a key capability for safely deployed machine learning models, with the question being tackled under various guises in recent years. In this paper, we aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR). In particular, we aim to provide rigorous empirical analysis of different methods across settings and provide actionable takeaways for practitioners and researchers. Concretely, we make the following contributions: (i) We perform rigorous cross-evaluation between state-of-the-art methods in the OOD detection and OSR settings and identify a strong correlation between the performances of methods for them; (ii) We propose a new, large-scale benchmark setting which we suggest better disentangles the problem tackled by OOD detection and OSR, re-evaluating state-of-the-art OOD detection and OSR methods in this setting; (iii) We surprisingly find that the best performing method on standard benchmarks (Outlier Exposure) struggles when tested at scale, while scoring rules which are sensitive to the deep feature magnitude consistently show promise; and (iv) We conduct empirical analysis to explain these phenomena and highlight directions for future research. Code: https://github.com/Visual-AI/Dissect-OOD-OSR

Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted to IJCV, preprint version; v2: add supplementary

arXiv:2408.04591 [pdf, other]

HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts

Authors: Hongjun Wang, Sagar Vaze, Kai Han

Abstract: Generalized Category Discovery (GCD) is a challenging task in which, given a partially labelled dataset, models must categorize all unlabelled instances, regardless of whether they come from labelled categories or from new ones. In this paper, we challenge a remaining assumption in this task: that all images share the same domain. Specifically, we introduce a new task and method to handle GCD when the unlabelled data also contains images from different domains to the labelled set. Our proposed 'HiLo' networks extract High-level semantic and Low-level domain features, before minimizing the mutual information between the representations. Our intuition is that the clusterings based on domain information and semantic information should be independent. We further extend our method with a specialized domain augmentation tailored for the GCD task, as well as a curriculum learning approach. Finally, we construct a benchmark from corrupted fine-grained datasets as well as a large-scale evaluation on DomainNet with real-world domain shifts, reimplementing a number of GCD baselines in this setting. We demonstrate that HiLo outperforms SoTA category discovery models by a large margin on all evaluations.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 39 pages, 9 figures, 26 tables

arXiv:2403.13684 [pdf, other]

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

Authors: Hongjun Wang, Sagar Vaze, Kai Han

Abstract: Generalized Category Discovery (GCD) aims to classify unlabelled images from both 'seen' and 'unseen' classes by transferring knowledge from a set of labelled 'seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.

Submitted 20 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted as a conference paper at ICLR 2024; Project page: https://visual-ai.github.io/sptnet

arXiv:2311.17055 [pdf, other]

No Representation Rules Them All in Category Discovery

Authors: Sagar Vaze, Andrea Vedaldi, Andrew Zisserman

Abstract: In this paper we tackle the problem of Generalized Category Discovery (GCD). Specifically, given a dataset with labelled and unlabelled images, the task is to cluster all images in the unlabelled subset, whether or not they belong to the labelled categories. Our first contribution is to recognize that most existing GCD benchmarks only contain labels for a single clustering of the data, making it difficult to ascertain whether models are using the available labels to solve the GCD task, or simply solving an unsupervised clustering problem. As such, we present a synthetic dataset, named 'Clevr-4', for category discovery. Clevr-4 contains four equally valid partitions of the data, i.e., based on object shape, texture, color or count. To solve the task, models are required to extrapolate the taxonomy specified by the labelled set, rather than simply latching onto a single natural grouping of the data. We use this dataset to demonstrate the limitations of unsupervised clustering in the GCD setting, showing that even very strong unsupervised models fail on Clevr-4. We further use Clevr-4 to examine the weaknesses of existing GCD algorithms, and propose a new method which addresses these shortcomings, leveraging consistent findings from the representation learning literature to do so. Our simple solution, which is based on 'mean teachers' and termed $μ$GCD, substantially outperforms implemented baselines on Clevr-4. Finally, when we transfer these findings to real data on the challenging Semantic Shift Benchmark (SSB), we find that $μ$GCD outperforms all prior work, setting a new state-of-the-art. For the project webpage, see https://www.robots.ox.ac.uk/~vgg/data/clevr4/

Submitted 28 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2306.07969 [pdf, other]

GeneCIS: A Benchmark for General Conditional Image Similarity

Authors: Sagar Vaze, Nicolas Carion, Ishan Misra

Abstract: We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open-set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find our method offers a substantial boost over the baselines on GeneCIS, and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States. Project page at https://sgvaze.github.io/genecis/.

Submitted 13 June, 2023; originally announced June 2023.

Comments: CVPR 2023 (Highlighted Paper). Project page at https://sgvaze.github.io/genecis/

arXiv:2304.02364 [pdf, other]

What's in a Name? Beyond Class Indices for Image Recognition

Authors: Kai Han, Xiaohu Huang, Yandong Li, Sagar Vaze, Jie Li, Xuhui Jia

Abstract: Existing machine learning models demonstrate excellent performance in image object recognition after training on a large-scale dataset under full supervision. However, these models only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. In contrast, vision-language models like CLIP are able to assign semantic class names to unseen objects in a 'zero-shot' manner, though they are once again provided a pre-defined set of candidate names at test-time. In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information. We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names. Our proposed approach entails iteratively clustering the data and employing a voting mechanism to determine the most suitable class names. Additionally, we investigate the potential of incorporating additional textual features to enhance clustering performance. To achieve this, we employ the CLIP vision and text encoders to retrieve relevant texts from an external database, which can provide supplementary semantic information to inform the clustering process. Furthermore, we tackle this problem both in unsupervised and partially supervised settings, as well as with a coarse-grained and fine-grained search space as the unconstrained dictionary. Remarkably, our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.

Submitted 27 July, 2024; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: CVPR 2024 Workshop on Computer Vision in the Wild

arXiv:2204.03635 [pdf, other]

Zero-Shot Category-Level Object Pose Estimation

Authors: Walter Goodwin, Sagar Vaze, Ioannis Havoutis, Ingmar Posner

Abstract: Object pose estimation is an important component of most vision pipelines for embodied agents, as well as in 3D vision more generally. In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference. Specifically, we make the following contributions. First, we formalise the zero-shot, category-level pose estimation problem and frame it in a way that is most applicable to real-world embodied agents. Secondly, we propose a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem. We further re-purpose the recent CO3D dataset to present a controlled and realistic test setting. Finally, we demonstrate that all baselines for our proposed task perform poorly, and show that our method provides a six-fold improvement in average rotation accuracy at 30 degrees. Our code is available at https://github.com/applied-ai-lab/zero-shot-pose.

Submitted 2 October, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: 28 pages, 6 figures

Journal ref: ECCV 2022

arXiv:2201.02609 [pdf, other]

Generalized Category Discovery

Authors: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

Abstract: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from novel ones. Existing recognition methods are not able to deal with this setting, because they make several restrictive assumptions, such as the unlabelled instances only coming from known - or unknown - classes, and the number of unknown classes being known a-priori. We address the more unconstrained setting, naming it 'Generalized Category Discovery', and challenge all these assumptions. We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task. Next, we propose the use of vision transformers with contrastive representation learning for this open-world setting. We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes automatically, substantially outperforming the baselines. Finally, we also propose a new approach to estimate the number of classes in the unlabelled data. We thoroughly evaluate our approach on public datasets for generic object classification and on fine-grained datasets, leveraging the recent Semantic Shift Benchmark suite. Project page at https://www.robots.ox.ac.uk/~vgg/research/gcd

Submitted 18 June, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: CVPR 22. Changes from pre-print highlighted in GitHub repo
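
The semi-supervised $k$-means idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `semi_supervised_kmeans`, the farthest-point initialisation (a deterministic stand-in for k-means++), the toy data, and all parameters are assumptions chosen for illustration. The only idea taken from the abstract is that labelled points constrain the clustering of the unlabelled ones; here, labelled points are pinned to the cluster of their class.

```python
import numpy as np

def semi_supervised_kmeans(x_lab, y_lab, x_unlab, k, n_iter=50):
    """Cluster x_unlab into k groups while pinning labelled points to their class.

    Assumes y_lab holds integer labels 0..n_seen-1 and k >= n_seen.
    Returns (centroids, cluster assignments for x_unlab).
    """
    n_seen = int(y_lab.max()) + 1
    # Seen-class centroids start at the labelled class means.
    centroids = [x_lab[y_lab == c].mean(axis=0) for c in range(n_seen)]
    # Remaining centroids: repeatedly take the unlabelled point farthest from
    # all current centroids (assumed initialisation, not from the source).
    for _ in range(k - n_seen):
        diff = x_unlab[:, None, :] - np.array(centroids)[None, :, :]
        nearest = np.linalg.norm(diff, axis=2).min(axis=1)
        centroids.append(x_unlab[nearest.argmax()])
    centroids = np.vstack(centroids)

    x_all = np.vstack([x_lab, x_unlab])
    for _ in range(n_iter):
        # Unlabelled points go to the nearest centroid; labelled points keep
        # the cluster index given by their label.
        dists = np.linalg.norm(x_unlab[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        a_all = np.concatenate([y_lab, assign])
        new = centroids.copy()
        for c in range(k):
            members = x_all[a_all == c]
            if len(members):  # keep the old centroid if a cluster is empty
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign

# Toy data: three well-separated 2-D blobs; only blobs 0 and 1 have labels.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x_lab = np.vstack([centers[c] + 0.1 * rng.standard_normal((5, 2)) for c in (0, 1)])
y_lab = np.repeat([0, 1], 5)
x_unlab = np.vstack([centers[c] + 0.1 * rng.standard_normal((20, 2)) for c in (0, 1, 2)])

centroids, assign = semi_supervised_kmeans(x_lab, y_lab, x_unlab, k=3)
print(np.bincount(assign, minlength=3))
```

In this toy setup the first two clusters inherit the identities of the labelled classes, and the third cluster collects the blob that has no labels, which mirrors the seen/unseen split described in the abstract.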

arXiv:2111.07975 [pdf, other]

Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement

Authors: Walter Goodwin, Sagar Vaze, Ioannis Havoutis, Ingmar Posner

Abstract: Object rearrangement has recently emerged as a key competency in robot manipulation, with practical solutions generally involving object detection, recognition, grasping and high-level planning. Goal-images describing a desired scene configuration are a promising and increasingly used mode of instruction. A key outstanding challenge is the accurate inference of matches between objects in front of a robot, and those seen in a provided goal image, where recent works have struggled in the absence of object-specific training data. In this work, we explore the deterioration of existing methods' ability to infer matches between objects as the visual shift between observed and goal scenes increases. We find that a fundamental limitation of the current setting is that source and target images must contain the same $\textit{instance}$ of every object, which restricts practical deployment. We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting by leveraging semantics together with visual features as a more robust, and much more general, measure of similarity. We demonstrate that this provides considerably improved matching performance in cross-instance settings, and can be used to guide multi-object rearrangement with a robot manipulator from an image that shares no object $\textit{instances}$ with the robot's scene.

Submitted 15 November, 2021; originally announced November 2021.

Comments: 8 pages, 5 figures

arXiv:2110.06207 [pdf, other]

Open-Set Recognition: a Good Closed-Set Classifier is All You Need?

Authors: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

Abstract: The ability to identify whether or not a test sample belongs to one of the semantic classes in a classifier's training set is critical to practical deployment of the model. This task is termed open-set recognition (OSR) and has received significant attention in recent years. In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of a maximum logit score OSR 'baseline' by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks. Similarly, we boost the performance of the existing state-of-the-art method by improving its closed-set accuracy, but the resulting discrepancy with the strong baseline is marginal. Our third contribution is to present the 'Semantic Shift Benchmark' (SSB), which better respects the task of detecting semantic novelty, in contrast to other forms of distribution shift also considered in related sub-fields, such as out-of-distribution detection. On this new evaluation, we again demonstrate that there is negligible difference between the strong baseline and the existing state-of-the-art. Project Page: https://www.robots.ox.ac.uk/~vgg/research/osr/

Submitted 13 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: ICLR 22 Oral. Changes from pre-print highlighted on Github page
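
The maximum logit score baseline named in the abstract above can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the paper's code: the helper names `max_logit_score` and `auroc`, and the toy logits for a hypothetical 3-class classifier, are made up. It scores each sample by its largest raw logit and measures how well that score separates known-class from unknown-class samples.

```python
import numpy as np

def max_logit_score(logits):
    """Open-set score: the largest raw logit per sample.

    A higher score is read as 'more likely to belong to a known class'.
    """
    return np.asarray(logits, dtype=float).max(axis=1)

def auroc(known_scores, unknown_scores):
    """AUROC as the fraction of (known, unknown) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    k = np.asarray(known_scores, dtype=float)[:, None]
    u = np.asarray(unknown_scores, dtype=float)[None, :]
    return float(((k > u) + 0.5 * (k == u)).mean())

# Made-up logits: rows are samples, columns are the three known classes.
known_logits = np.array([[6.0, 0.5, -1.0], [0.2, 4.5, 0.1], [1.0, 0.0, 3.5]])
unknown_logits = np.array([[1.2, 1.0, 0.8], [0.3, 2.0, 1.9], [4.0, 3.9, 3.8]])

known = max_logit_score(known_logits)
unknown = max_logit_score(unknown_logits)
print(auroc(known, unknown))
```

The third unknown sample has uniformly large logits and so outscores one known sample. Large logits on a sample from an unseen class are exactly the failure case this kind of score cannot rule out.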

arXiv:2009.07000 [pdf, other]

Optimal Use of Multi-spectral Satellite Data with Convolutional Neural Networks

Authors: Sagar Vaze, James Foley, Mohamed Seddiq, Alexey Unagaev, Natalia Efremova

Abstract: The analysis of satellite imagery will prove a crucial tool in the pursuit of sustainable development. While Convolutional Neural Networks (CNNs) have made large gains in natural image analysis, their application to multi-spectral satellite images (wherein input images have a large number of channels) remains relatively unexplored. In this paper, we compare different methods of leveraging multi-band information with CNNs, demonstrating the performance of all compared methods on the task of semantic segmentation of agricultural vegetation (vineyards). We show that standard industry practice of using bands selected by a domain expert leads to a significantly worse test accuracy than the other methods compared. Specifically, we compare: using bands specified by an expert; using all available bands; learning attention maps over the input bands; and leveraging Bayesian optimisation to dictate band choice. We show that simply using all available band information already increases test time performance, and show that the Bayesian optimisation, first applied to band selection in this work, can be used to further boost accuracy.

Submitted 15 September, 2020; originally announced September 2020.

Comments: AI for Social Good workshop - Harvard CRCS

arXiv:2003.10823 [pdf, other]

SMArtCast: Predicting soil moisture interpolations into the future using Earth observation data in a deep learning framework

Authors: Conrad James Foley, Sagar Vaze, Mohamed El Amine Seddiq, Alexey Unagaev, Natalia Efremova

Abstract: Soil moisture is a critical component of crop health, and monitoring it can enable further actions for increasing yield or preventing catastrophic die-off. As climate change increases the likelihood of extreme weather events and reduces the predictability of weather, non-optimal soil moistures for crops may become more likely. In this work, we use a series of LSTM architectures to analyze measurements of soil moisture and vegetation indices derived from satellite imagery. The system learns to predict the future values of these measurements. These spatially sparse values and indices are used as input features to an interpolation method that infers a spatially dense moisture map for a future time point. This has the potential to provide advance warning for soil moistures that may be inhospitable to crops across an area with limited monitoring capacity.

Submitted 24 April, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: Climate change AI workshop

Journal ref: ICLR 2020

arXiv:1209.1291 [pdf, other]

The degrees of freedom of MIMO networks with full-duplex receiver cooperation but no CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The question of whether the degrees of freedom (DoF) of multi-user networks can be enhanced even under isotropic fading and no channel state information (or output feedback) at the transmitters (CSIT) is investigated. Toward this end, the two-user MIMO (multiple-input, multiple-output) broadcast and interference channels are studied with no side-information whatsoever at the transmitters and with receivers equipped with full-duplex radios. The full-duplex feature allows for receiver cooperation because each receiver, in addition to receiving the signals sent by the transmitters, can also simultaneously transmit a signal in the same band to the other receiver. Unlike the case of MIMO networks with CSIT and full-duplex receivers, for which DoF are known, it is shown that for MIMO networks with no CSIT, full-duplex receiver cooperation is beneficial to such an extent that even the DoF region is enhanced. Indeed, for important classes of two-user MIMO broadcast and interference channels, defined by certain relationships on numbers of antennas at different terminals, the exact DoF regions are established. Key to achieving DoF-optimal performance for such networks are new retro-cooperative interference alignment schemes. Their optimality is established via the DoF analysis of certain genie-aided or enhanced versions of those networks.

Submitted 6 September, 2012; originally announced September 2012.

Comments: This work was presented at the Workshop on Interference in Wireless Networks, Boston University, June 2012

arXiv:1209.0047 [pdf, other]

The Degrees of Freedom Region of the MIMO Interference Channel with Hybrid CSIT

Authors: Kaniska Mohanty, Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The degrees of freedom (DoF) region of the two-user MIMO (multiple-input multiple-output) interference channel is established under a new model termed hybrid CSIT. In this model, one transmitter has delayed channel state information (CSI) and the other transmitter has instantaneous CSIT, of incoming channel matrices at the respective unpaired receivers, and neither transmitter has any knowledge of the incoming channel matrices of its respective paired receiver. The DoF region for hybrid CSIT, and consequently that of $2\times2\times3^{5}$ CSIT models, is completely characterized, and a new achievable scheme based on a combination of transmit beamforming and retrospective interference alignment is developed. Conditions are obtained on the numbers of antennas at each of the four terminals such that the DoF region under hybrid CSIT is equal to that under (a) global and instantaneous CSIT and (b) global and delayed CSIT, with the remaining cases resulting in a DoF region with hybrid CSIT that lies somewhere in between the DoF regions under the instantaneous and delayed CSIT settings. Further synergistic benefits accruing from switching between the two hybrid CSIT models are also explored.

Submitted 2 December, 2013; v1 submitted 31 August, 2012; originally announced September 2012.

arXiv:1202.6658 [pdf, other]

Independent signaling achieves the capacity region of the Gaussian interference channel with common information to within one bit

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The interference channel with common information (IC-CI) consists of two transmit-receive pairs that communicate over a common noisy medium. Each transmitter has an individual message for its paired receiver, and additionally, both transmitters have a common message to deliver to both receivers. In this paper, through explicit inner and outer bounds on the capacity region, we establish the capacity region of the Gaussian IC-CI to within a bounded gap of one bit, independently of the values of all channel parameters. Using this constant-gap characterization, the generalized degrees of freedom (GDoF) region is determined. It is shown that the introduction of the common message leads to an increase in the GDoF over that achievable over the Gaussian interference channel without a common message, and hence to an unbounded improvement in the achievable rate. A surprising feature of the capacity-within-one-bit result is that most of the available benefit (i.e., to within one bit of capacity) due to the common message is achieved through a simple and explicit coding scheme that involves independent signaling at the two transmitters so that, in effect, this scheme forgoes the opportunity for transmitter cooperation that is inherently available due to shared knowledge of the common message at both transmitters.

Submitted 29 February, 2012; originally announced February 2012.

Comments: Submitted to IEEE Trans. on Information Theory, Feb. 2012

arXiv:1109.5790 [pdf, other]

The Degrees of Freedom of the 2-Hop, 2-User Interference Channel with Feedback

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The layered two-hop, two-flow interference network is considered that consists of two sources, two relays and two destinations with the first hop network between the sources and the relays and the second hop network between relays and destinations both being i.i.d. Rayleigh fading Gaussian interference channels. Two feedback models are studied. In the first one, called the delayed channel state information at the sources (delayed CSI-S) model, the sources know all channel coefficients with a finite delay but the relays have no side information whatsoever. In the second feedback model, referred to as the limited Shannon feedback model, the relays know first hop channel coefficients instantaneously and the second hop channel with a finite delay and one relay knows the received signal of one of the destinations with a finite delay and the other relay knows the received signal of the other destination with a finite delay but there is no side information at the sources whatsoever. It is shown in this paper that under both these settings, the layered two-hop, two-flow interference channel has 4/3 degrees of freedom. The result is obtained by developing a broadcast-channel-type upper-bound and new achievability schemes based on the ideas of retrospective interference alignment and retro-cooperative interference alignment, respectively.

Submitted 27 September, 2011; originally announced September 2011.

Comments: Submitted, July 10, 2011 to the 2011 Allerton Conf. Commun., Control, Comput., Monticello, IL; accepted August 04, 2011

arXiv:1109.5779 [pdf, other]

The Degrees of Freedom Region of the MIMO Interference Channel with Shannon Feedback

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The two-user multiple-input multiple-output (MIMO) fast-fading interference channel (IC) with an arbitrary number of antennas at each of the four terminals is studied under the settings of Shannon feedback, limited Shannon feedback, and output feedback, wherein all or certain channel matrices and outputs, or just the channel outputs, respectively, are available to the transmitters with a finite delay. While for most numbers of antennas at the four terminals, it is shown that the DoF regions with Shannon feedback and for the limited Shannon feedback settings considered here are identical, and equal to the DoF region with just delayed channel state information (CSIT), it is shown that this is not always the case. For a specific class of MIMO ICs characterized by a certain relationship between the numbers of antennas at the four nodes, the DoF regions with Shannon and the limited Shannon feedback settings, while again being identical, are strictly bigger than the DoF region with just delayed CSIT. To realize these DoF gains with Shannon or limited Shannon feedback, a new retrospective interference alignment scheme is developed wherein transmitter cooperation made possible by output feedback in addition to delayed CSIT is employed to effect a more efficient form of interference alignment than is feasible with previously known schemes that use just delayed CSIT. The DoF region for just output feedback, in which each transmitter has delayed knowledge of only the receivers' outputs, is also obtained for all but a class of MIMO ICs that satisfy one of two inequalities involving the numbers of antennas.

Submitted 31 October, 2011; v1 submitted 27 September, 2011; originally announced September 2011.

Comments: 30 pages, 3 tables, 9 figures. This paper was submitted to the IEEE Trans. Inform. Th. Oct. 2011. It was presented in part at the 49th Annual Allerton Conference on Communications, Control and Computing in Sept. 2011

arXiv:1105.6033 [pdf, other]

A New Outer-Bound via Interference Localization and the Degrees of Freedom Regions of MIMO Interference Networks with no CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The two-user multi-input, multi-output (MIMO) interference and cognitive radio channels are studied under the assumption of no channel state information at the transmitter (CSIT) from the degrees of freedom (DoF) region perspective. With $M_i$ and $N_i$ denoting the number of antennas at transmitter $i$ and receiver $i$ respectively, the DoF regions of the MIMO interference channel were recently characterized by Huang et al., Zhu and Guo, and by the authors of this paper for all values of numbers of antennas except when $\min(M_1,N_1) > N_2 > M_2$ (or $\min(M_2,N_2) > N_1 > M_1$). This latter case was solved more recently by Zhu and Guo who provided a tight outer-bound. Here, a simpler and more widely applicable proof of that outer-bound is given based on the idea of interference localization. Using it, the DoF region is also established for the class of MIMO cognitive radio channels when $\min(M_1+M_2,N_1) > N_2 > M_2$ (with the second transmitter cognitive), the only class for which the inner and outer bounds previously obtained by the authors were not tight, thereby completing the DoF region characterization of the general 2-user MIMO cognitive radio channel as well.

Submitted 30 May, 2011; originally announced May 2011.

Comments: Submitted to IEEE Trans. Information Theory, May 2011. Material in this paper will be presented in part at the IEEE Intern. Symp. Information Theory (ISIT), Aug. 2011

arXiv:1101.5809 [pdf, other]

The Degrees of Freedom Region and Interference Alignment for the MIMO Interference Channel with Delayed CSI

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The degrees of freedom (DoF) region of the 2-user multiple-antenna or MIMO (multiple-input, multiple-output) interference channel (IC) is studied under fast fading and the assumption of delayed channel state information (CSI) wherein all terminals know all (or certain) channel matrices perfectly, but with a delay, and each receiver in addition knows its own incoming channels instantaneously. The general MIMO IC is considered with an arbitrary number of antennas at each of the four terminals. Dividing it into several classes depending on the relation between the numbers of antennas at the four terminals, the fundamental DoF regions are characterized under the delayed CSI assumption for all possible values of number of antennas at the four terminals. In particular, an outer bound on the DoF region of the general MIMO IC is derived. This bound is then shown to be tight for all MIMO ICs by developing interference alignment based achievability schemes for each class. A comparison of these DoF regions under the delayed CSI assumption is made with those of the idealistic 'perfect CSI' assumption where perfect and instantaneous CSI is available at all terminals on the one hand and with the DoF regions of the conservative 'no CSI' assumption on the other, where CSI is available at the receivers but not at all at the transmitters.

Submitted 11 March, 2011; v1 submitted 30 January, 2011; originally announced January 2011.

Comments: New results are added. 57 pages, 6 figures, 2 tables, submitted to IEEE Trans. Inform. Theory

arXiv:1101.0306 [pdf, other]

The Degrees of Freedom Regions of Two-User and Certain Three-User MIMO Broadcast Channels with Delayed CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The degrees of freedom (DoF) region of the fast-fading MIMO (multiple-input multiple-output) Gaussian broadcast channel (BC) is studied when there is delayed channel state information at the transmitter (CSIT). In this setting, the channel matrices are assumed to vary independently across time and the transmitter is assumed to know the channel matrices with some arbitrary finite delay. An outer-bound to the DoF region of the general $K$-user MIMO BC (with an arbitrary number of antennas at each terminal) is derived. This outer-bound is then shown to be tight for two classes of MIMO BCs, namely, (a) the two-user MIMO BC with arbitrary number of antennas at all terminals, and (b) certain three-user MIMO BCs where all three receivers have an equal number of antennas and the transmitter has no more than twice the number of antennas present at each receiver. The achievability results are obtained by developing an interference alignment scheme that optimally accounts for multiple, and possibly distinct, numbers of antennas at the receivers.

Submitted 22 December, 2011; v1 submitted 31 December, 2010; originally announced January 2011.

Comments: 27 pages, 6 figures; submitted to IEEE Trans. on Information Theory, Dec. 2011

arXiv:1002.1532

On the scaling of feedback bits to achieve the full multiplexing gain over the Gaussian broadcast channel using DPC

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: This paper has been withdrawn by the author(s) for revision.

Submitted 9 February, 2010; v1 submitted 8 February, 2010; originally announced February 2010.

Comments: This paper has been withdrawn

arXiv:1002.1531 [pdf, ps, other]

A Large-System Analysis of the Imperfect-CSIT Gaussian Broadcast Channel with a DPC-based Transmission Strategy

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The Gaussian broadcast channel (GBC) with $K$ transmit antennas and $K$ single-antenna users is considered for the case in which the channel state information is obtained at the transmitter via a finite-rate feedback link of capacity $r$ bits per user. The throughput (i.e., the sum-rate normalized by $K$) of the GBC is analyzed in the limit as $K \to \infty$ with $\frac{r}{K} \to \bar{r}$. Considering the transmission strategy of zeroforcing dirty paper coding (ZFDPC), a closed-form expression for the asymptotic throughput is derived. It is observed that, even under the finite-rate feedback setting, ZFDPC achieves a significantly higher throughput than zeroforcing beamforming. Using the asymptotic throughput expression, the problem of obtaining the number of users to be selected in order to maximize the throughput is solved.

Submitted 18 February, 2010; v1 submitted 8 February, 2010; originally announced February 2010.

Comments: Submitted to ISIT 2010

arXiv:1002.1530

The Degrees of Freedom Region of the MIMO Cognitive Interference Channel with No CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: This paper has been withdrawn by the author(s) for revision.

Submitted 9 February, 2010; v1 submitted 8 February, 2010; originally announced February 2010.

Comments: This paper has been withdrawn

arXiv:0909.5424 [pdf, ps, other]

The Degrees of Freedom Regions of MIMO Broadcast, Interference, and Cognitive Radio Channels with No CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The degrees of freedom (DoF) regions are characterized for the multiple-input multiple-output (MIMO) broadcast channel (BC), interference channels (IC) (including X and multi-hop interference channels) and the cognitive radio channel (CRC), when there is perfect and no channel state information at the receivers and the transmitter(s) (CSIR and CSIT), respectively. For the $K$-user MIMO BC, the exact characterization of the DoF region is obtained, which shows that a simple time-division-based transmission scheme is DoF-region optimal. Using the techniques developed for the MIMO BC, the corresponding problems for the two-user MIMO IC and the CRC are addressed. For both of these channels, inner and outer bounds to the DoF region are obtained and are seen to coincide for a vast majority of the relative numbers of antennas at the four terminals, thereby characterizing DoF regions for all but a few cases. Finally, the DoF regions of the $K$-user MIMO IC, the CRC, and X networks are derived for certain classes of these networks, including the one where all transmitters have an equal number of antennas and so do all receivers. The results of this paper are derived for distributions of fading channel matrices and additive noises that are more general than those considered in other simultaneous related works. The DoF regions with and without CSIT are compared and conditions on the relative numbers of antennas at the terminals under which a lack of CSIT does, or does not, result in the loss of DoF are identified, thereby providing, on the one hand, simple and robust communication schemes that don't require CSIT but have the same DoF performance as their previously found CSIT counterparts, and on the other hand, identifying situations where CSI feedback to transmitters would provide gains that are significant enough that even the DoF performance could be improved.

Submitted 23 January, 2011; v1 submitted 29 September, 2009; originally announced September 2009.

Comments: 49 pages, 11 figures, under review, IEEE Trans. Inform. Th. Submitted Sept. 2009, Revised Jan. 2011

arXiv:0906.2252 [pdf, ps, other]

Dirty Paper Coding for the MIMO Cognitive Radio Channel with Imperfect CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: A Dirty Paper Coding (DPC) based transmission scheme for the Gaussian multiple-input multiple-output (MIMO) cognitive radio channel (CRC) is studied when there is imperfect and perfect channel knowledge at the transmitters (CSIT) and the receivers, respectively. In particular, the problem of optimizing the sum-rate of the MIMO CRC over the transmit covariance matrices is dealt with. Such an optimization, under the DPC-based transmission strategy, needs to be performed jointly with an optimization over the inflation factor. To this end, first the problem of determination of inflation factor over the MIMO channel $Y=H_1 X + H_2 S + Z$ with imperfect CSIT is investigated. For this problem, two iterative algorithms, which generalize the corresponding algorithms proposed for the channel $Y=H(X+S)+Z$, are developed. Later, the necessary conditions for maximizing the sum-rate of the MIMO CRC over the transmit covariances for a given choice of inflation factor are derived. Using these necessary conditions and the algorithms for the determination of the inflation factor, an iterative, numerical algorithm for the joint optimization is proposed. Some interesting observations are made from the numerical results obtained from the algorithm. Furthermore, the high-SNR sum-rate scaling factor achievable over the CRC with imperfect CSIT is obtained.

Submitted 12 June, 2009; originally announced June 2009.

Comments: To be presented at ISIT 2009, Seoul, S. Korea

arXiv:0903.4526 [pdf, ps, other]

On the Achievable Rate of the Fading Dirty Paper Channel with Imperfect CSIT

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The problem of dirty paper coding (DPC) over the (multi-antenna) fading dirty paper channel (FDPC) Y = H(X + S) + Z is considered when there is imperfect knowledge of the channel state information H at the transmitter (CSIT). The case of FDPC with positive definite (p.d.) input covariance matrix was studied by the authors in a recent paper, and here the more general case of positive semi-definite (p.s.d.) input covariance is dealt with. Towards this end, the choice of auxiliary random variable is modified. The algorithms for determination of inflation factor proposed in the p.d. case are then generalized to the case of p.s.d. input covariance. Subsequently, the largest DPC-achievable high-SNR (signal-to-noise ratio) scaling factor over the no-CSIT FDPC with p.s.d. input covariance matrix is derived. This scaling factor is seen to be a non-trivial generalization of the one achieved for the p.d. case. Next, in the limit of low SNR, it is proved that the choice of all-zero inflation factor (thus treating interference as noise) is optimal in the 'ratio' sense, regardless of the covariance matrix used. Further, in the p.d. covariance case, the inflation factor optimal at high SNR is obtained when the number of transmit antennas is greater than the number of receive antennas, with the other case having been already considered in the earlier paper. Finally, the problem of joint optimization of the input covariance matrix and the inflation factor is dealt with, and an iterative numerical algorithm is developed.

Submitted 26 March, 2009; originally announced March 2009.

Comments: Presented at the 43rd Annual Conference on Information Sciences and Systems, Johns Hopkins University, March 2009

arXiv:0901.2764 [pdf, ps, other]

Dirty Paper Coding for Fading Channels with Partial Transmitter Side Information

Authors: Chinmay S. Vaze, Mahesh K. Varanasi

Abstract: The problem of Dirty Paper Coding (DPC) over the Fading Dirty Paper Channel (FDPC) Y = H(X + S) + Z, a more general version of Costa's channel, is studied for the case in which there is partial and perfect knowledge of the fading process H at the transmitter (CSIT) and the receiver (CSIR), respectively. A key step in this problem is to determine the optimal inflation factor (under Costa's choice of auxiliary random variable) when there is only partial CSIT. Towards this end, two iterative numerical algorithms are proposed. Both of these algorithms are seen to yield a good choice for the inflation factor. Finally, the high-SNR (signal-to-noise ratio) behavior of the achievable rate over the FDPC is dealt with. It is proved that FDPC (with t transmit and r receive antennas) achieves the largest possible scaling factor of min(t,r) log SNR even with no CSIT. Furthermore, in the high SNR regime, the optimality of Costa's choice of auxiliary random variable is established even when there is partial (or no) CSIT in the special case of FDPC with t <= r. Using the high-SNR scaling-law result of the FDPC (mentioned before), it is shown that a DPC-based multi-user transmission strategy, unlike other beamforming-based multi-user strategies, can achieve a single-user sum-rate scaling factor over the multiple-input multiple-output Gaussian Broadcast Channel with partial (or no) CSIT.

Submitted 19 January, 2009; originally announced January 2009.

Comments: 5 pages with 2 figures, presented at 42nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, USA, Oct. 2008

Showing 1–27 of 27 results for author: Vaze, S