Proceedings Cvpr Ieee Computer Society Conference on Computer Vision and Pattern Recognition Ieee Computer Society Conference on Computer Vision and Pattern Recognition, Jun 13, 2010
Active learning methods aim to select the most informa-tive unlabeled instances to label first, a... more Active learning methods aim to select the most informa-tive unlabeled instances to label first, and can help to fo-cus image or video annotations on the examples that will most improve a recognition system. However, most exist-ing methods only make myopic queries for a ...
In this paper, we study the generalization properties of online learning based stochastic methods... more In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al (COLT 2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, in order to complement our generalization bounds, we propose a novel memory efficient online learning algorithm for higher order learning problems with bounded regret guarantees.
Despite its good practical performance, very little is known about Wolfe's minimum norm algorithm... more Despite its good practical performance, very little is known about Wolfe's minimum norm algorithm theoretically. To our knowledge, the only result is an exponential time analysis due to Wolfe himself. In this paper we give a maiden convergence analysis of Wolfe's algorithm. We prove that in $t$ iterations, Wolfe's algorithm returns an $O(1/t)$-approximate solution to the min-norm point on {\em any} polytope. We also prove a robust version of Fujishige's theorem which shows that an $O(1/n^2)$-approximate solution to the min-norm point on the base polytope implies {\em exact} submodular minimization. As a corollary, we get the first pseudo-polynomial time guarantee for the Fujishige-Wolfe minimum norm algorithm for unconstrained submodular function minimization.
We present a framework for efficiently solving Approximate Traveling Salesman Problem (Approximat... more We present a framework for efficiently solving Approximate Traveling Salesman Problem (Approximate TSP) for Quantum Computing Models. Existing representations of TSP introduce extra states which do not correspond to any permutation. We present an efficient and intuitive encoding for TSP in quantum computing paradigm. Using this representation and assuming a Gaussian distribution on tour-lengths, we give an algorithm to solve Approximate TSP (Euclidean) within BQP resource bounds. Generalizing this strategy for any distribution, we present an oracle based Quantum Algorithm to solve Approximate TSP. We present a realization of the oracle in the quantum counterpart of PP.
Proceedings of the 22nd international conference on World Wide Web - WWW '13, 2013
ABSTRACT A typical problem for a search engine (hosting sponsored search service) is to provide t... more ABSTRACT A typical problem for a search engine (hosting sponsored search service) is to provide the advertisers with a forecast of the number of impressions his/her ad is likely to obtain for a given bid. Accurate forecasts have high business value, since they enable advertisers to select bids that lead to better returns on their investment. They also play an important role in services such as automatic campaign optimization. Despite its importance the problem has remained relatively unexplored in literature. Existing methods typically overfit to the training data, leading to inconsistent performance. Furthermore, some of the existing methods cannot provide predictions for new ads, i.e., for ads that are not present in the logs. In this paper, we develop a generative model based approach that addresses these drawbacks. We design a Bayes net to capture inter-dependencies between the query traffic features and the competitors in an auction. Furthermore, we account for variability in the volume of query traffic by using a dynamic linear model. Finally, we implement our approach on a production grade MapReduce framework and conduct extensive large scale experiments on substantial volumes of sponsored search data from Bing. Our experimental results demonstrate significant advantages over existing methods as measured using several accuracy/error criteria, improved ability to provide estimates for new ads and more consistent performance with smaller variance in accuracies. Our method can also be adapted to several other related forecasting problems such as predicting average position of ads or the number of clicks under budget constraints.
We consider the problem of classification using similarity/distance functions over data. Specific... more We consider the problem of classification using similarity/distance functions over data. Specifically, we propose a framework for defining the goodness of a (dis)similarity function with respect to a given learning task and propose algorithms that have guaranteed generalization properties when working with such good functions. Our framework unifies and generalizes the frameworks proposed by [Balcan-Blum ICML 2006] and [Wang et al ICML 2007]. An attractive feature of our framework is its adaptability to data - we do not promote a fixed notion of goodness but rather let data dictate it. We show, by giving theoretical guarantees that the goodness criterion best suited to a problem can itself be learned which makes our approach applicable to a variety of domains and problems. We propose a landmarking-based approach to obtaining a classifier from such learned goodness criteria. We then provide a novel diversity based heuristic to perform task-driven selection of landmark points instead of random selection. We demonstrate the effectiveness of our goodness criteria learning method as well as the landmark selection heuristic on a variety of similarity-based learning datasets and benchmark UCI datasets on which our method consistently outperforms existing approaches by a significant margin.
Abstract. Designing web sites is a complex problem. Adaptive sites are those which improve themse... more Abstract. Designing web sites is a complex problem. Adaptive sites are those which improve themselves by learning from user access patterns. In this paper we have considered a problem of index page synthesis for an adaptive website and framed it in a new type of Multi-...
Empirically, we demonstrate that alternating minimization performs similar to recently proposed c... more Empirically, we demonstrate that alternating minimization performs similar to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise. However, it is much more efficient and can scale to large problems. Analytically, for a resampling version of alternating minimization, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the first theoretical guarantee for alternating minimization (albeit with resampling) for any variant of phase retrieval problems in the non-convex setting.
We address the problem of general supervised learning when data can only be accessed through an (... more We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important super-vised learning problems: a) real-valued regression, b) ordinal regression and c) ranking where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.
Metric and kernel learning are important in several machine learning applications. However, most ... more Metric and kernel learning are important in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study metric learning as a problem of learning a linear transformation of the input data. We show that for high-dimensional data, a particular framework for learning a linear transformation of the data based on the LogDet divergence can be efficiently kernelized to learn a metric (or equivalently, a kernel function) over an arbitrarily high dimensional space. We further demonstrate that a wide class of convex loss functions for learning linear transformations can similarly be kernelized, thereby considerably expanding the potential applications of metric learning. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision and text mining.
Proceedings Cvpr Ieee Computer Society Conference on Computer Vision and Pattern Recognition Ieee Computer Society Conference on Computer Vision and Pattern Recognition, Jun 13, 2010
Active learning methods aim to select the most informa-tive unlabeled instances to label first, a... more Active learning methods aim to select the most informa-tive unlabeled instances to label first, and can help to fo-cus image or video annotations on the examples that will most improve a recognition system. However, most exist-ing methods only make myopic queries for a ...
In this paper, we study the generalization properties of online learning based stochastic methods... more In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al (COLT 2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, in order to complement our generalization bounds, we propose a novel memory efficient online learning algorithm for higher order learning problems with bounded regret guarantees.
Despite its good practical performance, very little is known about Wolfe's minimum norm algorithm... more Despite its good practical performance, very little is known about Wolfe's minimum norm algorithm theoretically. To our knowledge, the only result is an exponential time analysis due to Wolfe himself. In this paper we give a maiden convergence analysis of Wolfe's algorithm. We prove that in $t$ iterations, Wolfe's algorithm returns an $O(1/t)$-approximate solution to the min-norm point on {\em any} polytope. We also prove a robust version of Fujishige's theorem which shows that an $O(1/n^2)$-approximate solution to the min-norm point on the base polytope implies {\em exact} submodular minimization. As a corollary, we get the first pseudo-polynomial time guarantee for the Fujishige-Wolfe minimum norm algorithm for unconstrained submodular function minimization.
We present a framework for efficiently solving Approximate Traveling Salesman Problem (Approximat... more We present a framework for efficiently solving Approximate Traveling Salesman Problem (Approximate TSP) for Quantum Computing Models. Existing representations of TSP introduce extra states which do not correspond to any permutation. We present an efficient and intuitive encoding for TSP in quantum computing paradigm. Using this representation and assuming a Gaussian distribution on tour-lengths, we give an algorithm to solve Approximate TSP (Euclidean) within BQP resource bounds. Generalizing this strategy for any distribution, we present an oracle based Quantum Algorithm to solve Approximate TSP. We present a realization of the oracle in the quantum counterpart of PP.
Proceedings of the 22nd international conference on World Wide Web - WWW '13, 2013
ABSTRACT A typical problem for a search engine (hosting sponsored search service) is to provide t... more ABSTRACT A typical problem for a search engine (hosting sponsored search service) is to provide the advertisers with a forecast of the number of impressions his/her ad is likely to obtain for a given bid. Accurate forecasts have high business value, since they enable advertisers to select bids that lead to better returns on their investment. They also play an important role in services such as automatic campaign optimization. Despite its importance the problem has remained relatively unexplored in literature. Existing methods typically overfit to the training data, leading to inconsistent performance. Furthermore, some of the existing methods cannot provide predictions for new ads, i.e., for ads that are not present in the logs. In this paper, we develop a generative model based approach that addresses these drawbacks. We design a Bayes net to capture inter-dependencies between the query traffic features and the competitors in an auction. Furthermore, we account for variability in the volume of query traffic by using a dynamic linear model. Finally, we implement our approach on a production grade MapReduce framework and conduct extensive large scale experiments on substantial volumes of sponsored search data from Bing. Our experimental results demonstrate significant advantages over existing methods as measured using several accuracy/error criteria, improved ability to provide estimates for new ads and more consistent performance with smaller variance in accuracies. Our method can also be adapted to several other related forecasting problems such as predicting average position of ads or the number of clicks under budget constraints.
We consider the problem of classification using similarity/distance functions over data. Specific... more We consider the problem of classification using similarity/distance functions over data. Specifically, we propose a framework for defining the goodness of a (dis)similarity function with respect to a given learning task and propose algorithms that have guaranteed generalization properties when working with such good functions. Our framework unifies and generalizes the frameworks proposed by [Balcan-Blum ICML 2006] and [Wang et al ICML 2007]. An attractive feature of our framework is its adaptability to data - we do not promote a fixed notion of goodness but rather let data dictate it. We show, by giving theoretical guarantees that the goodness criterion best suited to a problem can itself be learned which makes our approach applicable to a variety of domains and problems. We propose a landmarking-based approach to obtaining a classifier from such learned goodness criteria. We then provide a novel diversity based heuristic to perform task-driven selection of landmark points instead of random selection. We demonstrate the effectiveness of our goodness criteria learning method as well as the landmark selection heuristic on a variety of similarity-based learning datasets and benchmark UCI datasets on which our method consistently outperforms existing approaches by a significant margin.
Abstract. Designing web sites is a complex problem. Adaptive sites are those which improve themse... more Abstract. Designing web sites is a complex problem. Adaptive sites are those which improve themselves by learning from user access patterns. In this paper we have considered a problem of index page synthesis for an adaptive website and framed it in a new type of Multi-...
Empirically, we demonstrate that alternating minimization performs similar to recently proposed c... more Empirically, we demonstrate that alternating minimization performs similar to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise. However, it is much more efficient and can scale to large problems. Analytically, for a resampling version of alternating minimization, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the first theoretical guarantee for alternating minimization (albeit with resampling) for any variant of phase retrieval problems in the non-convex setting.
We address the problem of general supervised learning when data can only be accessed through an (... more We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important super-vised learning problems: a) real-valued regression, b) ordinal regression and c) ranking where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.
Metric and kernel learning are important in several machine learning applications. However, most ... more Metric and kernel learning are important in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study metric learning as a problem of learning a linear transformation of the input data. We show that for high-dimensional data, a particular framework for learning a linear transformation of the data based on the LogDet divergence can be efficiently kernelized to learn a metric (or equivalently, a kernel function) over an arbitrarily high dimensional space. We further demonstrate that a wide class of convex loss functions for learning linear transformations can similarly be kernelized, thereby considerably expanding the potential applications of metric learning. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision and text mining.
Uploads
Papers by Prateek Jain