Search | arXiv e-print repository

On Bellman equations for continuous-time policy evaluation I: discretization and approximation

Abstract: We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as the approximation error guarantees… ▽ More We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as the approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structures, even if the effective horizon diverges to infinity. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: WM and YZ contributed equally to this work

arXiv:2403.07877 [pdf, other]

Generating Future Observations to Estimate Grasp Success in Cluttered Environments

Authors: Daniel Fernandes Gomes, Wenxuan Mou, Paolo Paoletti, Shan Luo

Abstract: End-to-end self-supervised models have been proposed for estimating the success of future candidate grasps and video predictive models for generating future observations. However, none have yet studied these two strategies side-by-side for addressing the aforementioned grasping problem. We investigate and compare a model-free approach, to estimate the success of a candidate grasp, against a model-… ▽ More End-to-end self-supervised models have been proposed for estimating the success of future candidate grasps and video predictive models for generating future observations. However, none have yet studied these two strategies side-by-side for addressing the aforementioned grasping problem. We investigate and compare a model-free approach, to estimate the success of a candidate grasp, against a model-based alternative that exploits a self-supervised learnt predictive model that generates a future observation of the gripper about to grasp an object. Our experiments demonstrate that despite the end-to-end model-free model obtaining a best accuracy of 72%, the proposed model-based pipeline yields a significantly higher accuracy of 82%. △ Less

Submitted 18 December, 2023; originally announced March 2024.

Journal ref: 5th UK Robot Manipulation Workshop 2024

arXiv:2312.10074 [pdf]

STAGER checklist: Standardized Testing and Assessment Guidelines for Evaluating Generative AI Reliability

Authors: Jinghong Chen, Lingxuan Zhu, Weiming Mou, Zaoqu Liu, Quan Cheng, Anqi Lin, Jian Zhang, Peng Luo

Abstract: Generative Artificial Intelligence (AI) holds immense potential in medical applications. Numerous studies have explored the efficacy of various generative AI models within healthcare contexts, but there is a lack of a comprehensive and systematic evaluation framework. Given that some studies evaluating the ability of generative AI for medical applications have deficiencies in their methodological… ▽ More Generative Artificial Intelligence (AI) holds immense potential in medical applications. Numerous studies have explored the efficacy of various generative AI models within healthcare contexts, but there is a lack of a comprehensive and systematic evaluation framework. Given that some studies evaluating the ability of generative AI for medical applications have deficiencies in their methodological design, standardized guidelines for their evaluation are also currently lacking. In response, our objective is to devise standardized assessment guidelines tailored for evaluating the performance of generative AI systems in medical contexts. To this end, we conducted a thorough literature review using the PubMed and Google Scholar databases, focusing on research that tests generative AI capabilities in medicine. Our multidisciplinary team, comprising experts in life sciences, clinical medicine, medical engineering, and generative AI users, conducted several discussion sessions and developed a checklist of 23 items. The checklist is designed to encompass the critical evaluation aspects of generative AI in medical applications comprehensively. This checklist, and the broader assessment framework it anchors, address several key dimensions, including question collection, querying methodologies, and assessment techniques. We aim to provide a holistic evaluation of AI systems. The checklist delineates a clear pathway from question gathering to result assessment, offering researchers guidance through potential challenges and pitfalls. Our framework furnishes a standardized, systematic approach for research involving the testing of generative AI's applicability in medicine. It enhances the quality of research reporting and aids in the evolution of generative AI in medicine and life sciences. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 11 pages, 0 figure, 2 tables

arXiv:2209.13075 [pdf, other]

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

Authors: Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

Abstract: The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds re… ▽ More The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size. △ Less

Submitted 26 September, 2022; originally announced September 2022.

Comments: 56 pages, 6 figures

arXiv:2201.08518 [pdf, ps, other]

Optimal variance-reduced stochastic approximation in Banach spaces

Authors: Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

Abstract: We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contras… ▽ More We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes. △ Less

Submitted 29 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.12770 [pdf, ps, other]

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

Abstract: We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a… ▽ More We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($λ$) family of algorithms for all $λ\in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $λ$ when running the TD($λ$) algorithm). △ Less

Submitted 11 May, 2024; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: Published at Mathematical Statistics and Learning

arXiv:2101.10819 [pdf, other]

When Would You Trust a Robot? A Study on Trust and Theory of Mind in Human-Robot Interactions

Authors: Wenxuan Mou, Martina Ruocco, Debora Zanatto, Angelo Cangelosi

Abstract: Trust is a critical issue in Human Robot Interactions as it is the core of human desire to accept and use a non human agent. Theory of Mind has been defined as the ability to understand the beliefs and intentions of others that may differ from one's own. Evidences in psychology and HRI suggest that trust and Theory of Mind are interconnected and interdependent concepts, as the decision to trust an… ▽ More Trust is a critical issue in Human Robot Interactions as it is the core of human desire to accept and use a non human agent. Theory of Mind has been defined as the ability to understand the beliefs and intentions of others that may differ from one's own. Evidences in psychology and HRI suggest that trust and Theory of Mind are interconnected and interdependent concepts, as the decision to trust another agent must depend on our own representation of this entity's actions, beliefs and intentions. However, very few works take Theory of Mind of the robot into consideration while studying trust in HRI. In this paper, we investigated whether the exposure to the Theory of Mind abilities of a robot could affect humans' trust towards the robot. To this end, participants played a Price Game with a humanoid robot that was presented having either low level Theory of Mind or high level Theory of Mind. Specifically, the participants were asked to accept the price evaluations on common objects presented by the robot. The willingness of the participants to change their own price judgement of the objects (i.e., accept the price the robot suggested) was used as the main measurement of the trust towards the robot. Our experimental results showed that robots possessing a high level of Theory of Mind abilities were trusted more than the robots presented with low level Theory of Mind skills. △ Less

Submitted 26 January, 2021; originally announced January 2021.

Comments: 7 pages, 4 figures, conference

arXiv:2012.05299 [pdf, other]

Optimal oracle inequalities for solving projected fixed-point equations

Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright

Abstract: Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper… ▽ More Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality. △ Less

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2008.07353 [pdf, ps, other]

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

Authors: Wenlong Mou, Zheng Wen, Xi Chen

Abstract: We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which a… ▽ More We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator. △ Less

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2004.04719 [pdf, ps, other]

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

Authors: Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

Abstract: We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} θ= \bar{b}$. When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asym… ▽ More We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} θ= \bar{b}$. When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asymptotic covariance matrix, which is the sum of the classical Polyak-Ruppert covariance and a correction term that scales with the step size. Under assumptions on the tail of the noise distribution, we prove a non-asymptotic concentration inequality whose main term matches the covariance in CLT in any direction, up to universal constants. When the matrix $\bar{A}$ is not Hurwitz but only has non-negative real parts in its eigenvalues, we prove that the averaged LSA procedure actually achieves an $O(1/T)$ rate in mean-squared error. Our results provide a more refined understanding of linear stochastic approximation in both the asymptotic and non-asymptotic settings. We also show various applications of the main results, including the study of momentum-based stochastic gradient methods as well as temporal difference algorithms in reinforcement learning. △ Less

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2004.02980 [pdf, other]

LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

Authors: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng

Abstract: Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We… ▽ More Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method's estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks. △ Less

Submitted 6 April, 2020; originally announced April 2020.

Comments: Accepted to CVPR 2020

arXiv:1912.05153 [pdf, other]

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

Authors: Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

Abstract: We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For… ▽ More We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert θ_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert θ_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities. △ Less

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1910.00551 [pdf, ps, other]

An Efficient Sampling Algorithm for Non-smooth Composite Potentials

Authors: Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

Abstract: We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of… ▽ More We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of the target density in at most $O(d \log (d/\varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$. △ Less

Submitted 1 October, 2019; originally announced October 2019.

arXiv:1908.10859 [pdf, ps, other]

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

Authors: Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

Abstract: We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generali… ▽ More We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $\varepsilon > 0$ in Wasserstein distance from the target distribution in $O\left(\frac{d^{1/4}}{ \varepsilon^{1/2}} \right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $α$-th order smoothness, we prove that the mixing time scales as $O \left(\frac{d^{1/4}}{\varepsilon^{1/2}} + \frac{d^{1/2}}{\varepsilon^{1/(α- 1)}} \right)$. △ Less

Submitted 26 May, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: Changes from v1: improved algorithm with $O (d^{1/4} / \varepsilon^{1/2})$ mixing time

arXiv:1806.07507 [pdf, other]

iCLAP: Shape Recognition by Combining Proprioception and Touch Sensing

Authors: Shan Luo, Wenxuan Mou, Kaspar Althoefer, Hongbin Liu

Abstract: For humans, both the proprioception and touch sensing are highly utilized when performing haptic perception. However, most approaches in robotics use only either proprioceptive data or touch data in haptic object recognition. In this paper, we present a novel method named Iterative Closest Labeled Point (iCLAP) to link the kinesthetic cues and tactile patterns fundamentally and also introduce its… ▽ More For humans, both the proprioception and touch sensing are highly utilized when performing haptic perception. However, most approaches in robotics use only either proprioceptive data or touch data in haptic object recognition. In this paper, we present a novel method named Iterative Closest Labeled Point (iCLAP) to link the kinesthetic cues and tactile patterns fundamentally and also introduce its extensions to recognize object shapes. In the training phase, the iCLAP first clusters the features of tactile readings into a codebook and assigns these features with distinct label numbers. A 4D point cloud of the object is then formed by taking the label numbers of the tactile features as an additional dimension to the 3D sensor positions; hence, the two sensing modalities are merged to achieve a synthesized perception of the touched object. Furthermore, we developed and validated hybrid fusion strategies, product based and weighted sum based, to combine decisions obtained from iCLAP and single sensing modalities. Extensive experimentation demonstrates a dramatic improvement of object recognition using the proposed methods and it shows great potential to enhance robot perception ability. △ Less

Submitted 19 June, 2018; originally announced June 2018.

Comments: 10 pages, 12 figures, accepted to Autonomous Robots

arXiv:1708.04441 [pdf]

doi 10.1109/ICRA.2015.7139743

Localizing the Object Contact through Matching Tactile Features with Visual Map

Authors: Shan Luo, Wenxuan Mou, Kaspar Althoefer, Hongbin Liu

Abstract: This paper presents a novel framework for integration of vision and tactile sensing by localizing tactile readings in a visual object map. Intuitively, there are some correspondences, e.g., prominent features, between visual and tactile object identification. To apply it in robotics, we propose to localize tactile readings in visual images by sharing same sets of feature descriptors through two se… ▽ More This paper presents a novel framework for integration of vision and tactile sensing by localizing tactile readings in a visual object map. Intuitively, there are some correspondences, e.g., prominent features, between visual and tactile object identification. To apply it in robotics, we propose to localize tactile readings in visual images by sharing same sets of feature descriptors through two sensing modalities. It is then treated as a probabilistic estimation problem solved in a framework of recursive Bayesian filtering. Feature-based measurement model and Gaussian based motion model are thus built. In our tests, a tactile array sensor is utilized to generate tactile images during interaction with objects and the results have proven the feasibility of our proposed framework. △ Less

Submitted 15 August, 2017; originally announced August 2017.

Comments: 6 pages, 8 figures, ICRA 2015

Journal ref: ICRA 2015

arXiv:1708.04436 [pdf]

doi 10.1109/IROS.2016.7759485

Iterative Closest Labeled Point for Tactile Object Shape Recognition

Authors: Shan Luo, Wenxuan Mou, Kaspar Althoefer, Hongbin Liu

Abstract: Tactile data and kinesthetic cues are two important sensing sources in robot object recognition and are complementary to each other. In this paper, we propose a novel algorithm named Iterative Closest Labeled Point (iCLAP) to recognize objects using both tactile and kinesthetic information.The iCLAP first assigns different local tactile features with distinct label numbers. The label numbers of th… ▽ More Tactile data and kinesthetic cues are two important sensing sources in robot object recognition and are complementary to each other. In this paper, we propose a novel algorithm named Iterative Closest Labeled Point (iCLAP) to recognize objects using both tactile and kinesthetic information.The iCLAP first assigns different local tactile features with distinct label numbers. The label numbers of the tactile features together with their associated 3D positions form a 4D point cloud of the object. In this manner, the two sensing modalities are merged to form a synthesized perception of the touched object. To recognize an object, the partial 4D point cloud obtained from a number of touches iteratively matches with all the reference cloud models to identify the best fit. An extensive evaluation study with 20 real objects shows that our proposed iCLAP approach outperforms those using either of the separate sensing modalities, with a substantial recognition rate improvement of up to 18%. △ Less

Submitted 15 August, 2017; originally announced August 2017.

Comments: 6 pages, 8 figures, IROS 2016

arXiv:1707.05947 [pdf, other]

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Authors: Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng

Abstract: Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also… ▽ More Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also critical to generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed with non-asymptotic discrete-time analysis, using Stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n}L\sqrt{βT_k}\right)$, where $L$ is uniform Lipschitz parameter, $β$ is inverse temperature, and $T_k$ is aggregated step sizes. For PAC-Bayesian theory, though the bound has a slower $O(1/\sqrt{n})$ rate, the contribution of each step is shown with an exponentially decaying factor by imposing $\ell^2$ regularization, and the uniform Lipschitz constant is also replaced by actual norms of gradients along trajectory. Our bounds have no implicit dependence on dimensions, norms or other capacity measures of parameter, which elegantly characterizes the phenomenon of "Fast Training Guarantees Generalization" in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning. △ Less

Submitted 19 July, 2017; originally announced July 2017.

arXiv:1706.03316 [pdf, ps, other]

Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible

Authors: Kai Zheng, Wenlong Mou, Liwei Wang

Abstract: Non-interactive Local Differential Privacy (LDP) requires data analysts to collect data from users through noisy channel at once. In this paper, we extend the frontiers of Non-interactive LDP learning and estimation from several aspects. For learning with smooth generalized linear losses, we propose an approximate stochastic gradient oracle estimated from non-interactive LDP channel, using Chebysh… ▽ More Non-interactive Local Differential Privacy (LDP) requires data analysts to collect data from users through noisy channel at once. In this paper, we extend the frontiers of Non-interactive LDP learning and estimation from several aspects. For learning with smooth generalized linear losses, we propose an approximate stochastic gradient oracle estimated from non-interactive LDP channel, using Chebyshev expansion. Combined with inexact gradient methods, we obtain an efficient algorithm with quasi-polynomial sample complexity bound. For the high-dimensional world, we discover that under $\ell_2$-norm assumption on data points, high-dimensional sparse linear regression and mean estimation can be achieved with logarithmic dependence on dimension, using random projection and approximate recovery. We also extend our methods to Kernel Ridge Regression. Our work is the first one that makes learning and estimation possible for a broad range of learning tasks under non-interactive LDP model. △ Less

Submitted 11 June, 2017; originally announced June 2017.

arXiv:1703.09947 [pdf, ps, other]

Efficient Private ERM for Smooth Objectives

Authors: Jiaqi Zhang, Kai Zheng, Wenlong Mou, Liwei Wang

Abstract: In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $ε$-… ▽ More In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $ε$-DP and $(ε, δ)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high probability form. Experiments demonstrate that our algorithm consistently outperforms existing method in both utility and running time. △ Less

Submitted 24 May, 2017; v1 submitted 29 March, 2017; originally announced March 2017.

arXiv:1612.04659 [pdf, other]

Stable Memory Allocation in the Hippocampus: Fundamental Limits and Neural Realization

Authors: Wenlong Mou, Zhi Wang, Liwei Wang

Abstract: It is believed that hippocampus functions as a memory allocator in brain, the mechanism of which remains unrevealed. In Valiant's neuroidal model, the hippocampus was described as a randomly connected graph, the computation on which maps input to a set of activated neuroids with stable size. Valiant proposed three requirements for the hippocampal circuit to become a stable memory allocator (SMA):… ▽ More It is believed that hippocampus functions as a memory allocator in brain, the mechanism of which remains unrevealed. In Valiant's neuroidal model, the hippocampus was described as a randomly connected graph, the computation on which maps input to a set of activated neuroids with stable size. Valiant proposed three requirements for the hippocampal circuit to become a stable memory allocator (SMA): stability, continuity and orthogonality. The functionality of SMA in hippocampus is essential in further computation within cortex, according to Valiant's model. In this paper, we put these requirements for memorization functions into rigorous mathematical formulation and introduce the concept of capacity, based on the probability of erroneous allocation. We prove fundamental limits for the capacity and error probability of SMA, in both data-independent and data-dependent settings. We also establish an example of stable memory allocator that can be implemented via neuroidal circuits. Both theoretical bounds and simulation results show that the neural SMA functions well. △ Less

Submitted 14 December, 2016; originally announced December 2016.

arXiv:1612.04571 [pdf, ps, other]

A Refined Analysis of LSH for Well-dispersed Data Points

Authors: Wenlong Mou, Liwei Wang

Abstract: Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve a-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low… ▽ More Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve a-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analyses for running time of approximating near neighbors queries via LSH. We characterize dispersion of data using N_b, the number of b*r-near pairs among the data points. Combined with optimal data-oblivious LSH scheme, we get a new query time bound depending on N_b and doubling dimension. For many natural scenarios where points are well-dispersed or lying in a low-doubling-dimension space, our result leads to sharper performance than existing worst-case analysis. This paper not only present first rigorous proof on how LSHs make use of the structure of data points, but also provide important insights into parameter setting in the practice of LSH beyond worst case. Besides, the techniques in our analysis involve a generalized version of sphere packing problem, which might be of some independent interest. △ Less

Submitted 14 December, 2016; originally announced December 2016.

Comments: Paper accepted to SIAM Conference on Analytic Algorithmics and Combinatorics (ANALCO) 2017

arXiv:1507.03148 [pdf, other]

Face Alignment Assisted by Head Pose Estimation

Authors: Heng Yang, Wenxuan Mou, Yichi Zhang, Ioannis Patras, Hatice Gunes, Peter Robinson

Abstract: In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation. We first investigate the failure cases of most state of the art face alignment approaches and observe that these failures often share one common global property, i.e. the head pose variation is usually large. Inspired by this, we propose a deep convolutional network model… ▽ More In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation. We first investigate the failure cases of most state of the art face alignment approaches and observe that these failures often share one common global property, i.e. the head pose variation is usually large. Inspired by this, we propose a deep convolutional network model for reliable and accurate head pose estimation. Instead of using a mean face shape, or randomly selected shapes for cascaded face alignment initialisation, we propose two schemes for generating initialisation: the first one relies on projecting a mean 3D face shape (represented by 3D facial landmarks) onto 2D image under the estimated head pose; the second one searches nearest neighbour shapes from the training set according to head pose distance. By doing so, the initialisation gets closer to the actual shape, which enhances the possibility of convergence and in turn improves the face alignment performance. We demonstrate the proposed method on the benchmark 300W dataset and show very competitive performance in both head pose estimation and face alignment. △ Less

Submitted 18 July, 2015; v1 submitted 11 July, 2015; originally announced July 2015.

Comments: Accepted by BMVC2015

Showing 1–23 of 23 results for author: Mou, W