-
Boosting uniformity in quasirandom groups: fast and simple
Authors:
Harm Derksen,
Chin Ho Lee,
Emanuele Viola
Abstract:
We study the communication complexity of multiplying $k\times t$ elements from the group $H=\text{SL}(2,q)$ in the number-on-forehead model with $k$ parties. We prove a lower bound of $(t\log H)/c^{k}$. This is an exponential improvement over previous work, and matches the state-of-the-art in the area.
Relatedly, we show that the convolution of $k^{c}$ independent copies of a 3-uniform distribution over $H^{m}$ is close to a $k$-uniform distribution. This is again an exponential improvement over previous work which needed $c^{k}$ copies. The proofs are remarkably simple; the results extend to other quasirandom groups.
We also show that for any group $H$, any distribution over $H^{m}$ whose weight-$k$ Fourier coefficients are small is close to a $k$-uniform distribution. This generalizes previous work in the abelian setting, and the proof is simpler.
Submitted 10 September, 2024;
originally announced September 2024.
-
Change Point Detection in Pairwise Comparison Data with Covariates
Authors:
Yi Han,
Thomas C. M. Lee
Abstract:
This paper introduces the novel piecewise stationary covariate-assisted ranking estimation (PS-CARE) model for analyzing time-evolving pairwise comparison data, enhancing item ranking accuracy through the integration of covariate information. By partitioning the data into distinct, stationary segments, the PS-CARE model adeptly detects temporal shifts in item rankings, known as change points, whose number and positions are initially unknown. Leveraging the minimum description length (MDL) principle, this paper establishes a statistically consistent model selection criterion to estimate these unknowns. The practical optimization of this MDL criterion is done with the pruned exact linear time (PELT) algorithm. Empirical evaluations reveal the method's promising performance in accurately locating change points across various simulated scenarios. An application to an NBA dataset yielded meaningful insights that aligned with significant historical events, highlighting the method's practical utility and the MDL criterion's effectiveness in capturing temporal ranking changes. To the best of the authors' knowledge, this research pioneers change point detection in pairwise comparison data with covariate information, representing a significant leap forward in the field of dynamic ranking analysis.
Submitted 24 August, 2024;
originally announced August 2024.
-
The Vizier Gaussian Process Bandit Algorithm
Authors:
Xingyou Song,
Qiuyi Zhang,
Chansoo Lee,
Emily Fertig,
Tzu-Kuo Huang,
Lior Belenki,
Greg Kochanski,
Setareh Ariafar,
Srinivas Vasudevan,
Sagi Perel,
Daniel Golovin
Abstract:
Google Vizier has performed millions of optimizations and accelerated numerous research and production systems at Google, demonstrating the success of Bayesian optimization as a large-scale service. Over multiple years, its algorithm has been improved considerably, through the collective experiences of numerous research efforts and user feedback. In this technical report, we discuss the implementation details and design choices of the current default algorithm provided by Open Source Vizier. Our experiments on standardized benchmarks reveal its robustness and versatility against well-established industry baselines on multiple practical modes.
Submitted 21 August, 2024;
originally announced August 2024.
-
Scalable Multilevel Monte Carlo Methods Exploiting Parallel Redistribution on Coarse Levels
Authors:
Hillary R. Fairbanks,
Delyan Z. Kalchev,
Chak Shing Lee,
Panayot S. Vassilevski
Abstract:
We study an element agglomeration coarsening strategy that requires data redistribution at coarse levels when the number of coarse elements becomes smaller than the number of computational units (cores) used. The overall procedure generates coarse elements (general unstructured unions of fine grid elements) within the framework of element-based algebraic multigrid methods (or AMGe) studied previously. The AMGe generated coarse spaces have the ability to exhibit approximation properties of the same order as the fine-level ones since by construction they contain the piecewise polynomials of the same order as the fine-level ones. These approximation properties are key for the successful use of AMGe in multilevel solvers for nonlinear partial differential equations as well as for multilevel Monte Carlo (MLMC) simulations. The ability to coarsen without being constrained by the number of available cores, as described in the present paper, makes it possible to improve the scalability of these solvers as well as of the overall MLMC method. The paper illustrates this latter fact with a detailed scalability study of MLMC simulations applied to model Darcy equations with a stochastic log-normal permeability field.
Submitted 5 August, 2024;
originally announced August 2024.
-
Chip-firing on the Platonic solids: a primer for studying graph gonality
Authors:
Marchelle Beougher,
Kexin Ding,
Max Everett,
Robin Huang,
Chan Lee,
Ralph Morrison,
Ben Weber
Abstract:
This paper provides a friendly introduction to chip-firing games and graph gonality. We use graphs coming from the five Platonic solids to illustrate different tools and techniques for studying these games, including independent sets, treewidth, scramble number, and Dhar's burning algorithm. In addition to showcasing some previously known results, we present the first proofs that the dodecahedron graph has gonality $6$, and that the icosahedron graph has gonality $9$.
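As a rough illustration of the chip-firing moves and of Dhar's burning algorithm mentioned in this abstract, the sketch below works on the tetrahedron graph (the complete graph on four vertices). It is not taken from the paper: the function names `fire`, `fire_set`, and `dhar_unburnt`, the dictionary-based graph encoding, and the example divisor are all assumptions made for illustration, and the code handles simple graphs only.

```python
def fire(graph, divisor, v):
    """Fire vertex v: it sends one chip along each incident edge."""
    new = dict(divisor)
    new[v] -= len(graph[v])
    for u in graph[v]:
        new[u] += 1
    return new


def fire_set(graph, divisor, vertices):
    """Fire every vertex of a set once (the order does not matter)."""
    for v in vertices:
        divisor = fire(graph, divisor, v)
    return divisor


def dhar_unburnt(graph, divisor, q):
    """Dhar's burning algorithm on a simple graph.

    A fire starts at q. A vertex burns once the number of its burnt
    neighbours exceeds the number of chips it holds. The unburnt set is
    returned; if it is empty, the divisor is q-reduced.
    Assumes divisor[v] >= 0 for every v other than q.
    """
    burnt = {q}
    changed = True
    while changed:
        changed = False
        for v in graph:
            if v in burnt:
                continue
            burning_edges = sum(1 for u in graph[v] if u in burnt)
            if burning_edges > divisor[v]:
                burnt.add(v)
                changed = True
    return set(graph) - burnt


# Tetrahedron graph: every vertex is adjacent to the other three.
K4 = {v: {u for u in range(4) if u != v} for v in range(4)}

# Hypothetical starting divisor with one chip on each vertex except 0.
D = {0: 0, 1: 1, 2: 1, 3: 1}

unburnt = dhar_unburnt(K4, D, 0)   # vertices the fire cannot reach
reduced = fire_set(K4, D, unburnt)  # fire the unburnt set toward vertex 0
```

Firing the unburnt set never puts a vertex outside `q` into debt, which is why repeating the burn-then-fire step is a standard way to reach the `q`-reduced representative of a divisor.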
Submitted 6 July, 2024;
originally announced July 2024.
-
Uniform Transformation: Refining Latent Representation in Variational Autoencoders
Authors:
Ye Shi,
C. S. George Lee
Abstract:
Irregular distribution in latent space causes posterior collapse, misalignment between posterior and prior, and ill-sampling problems in Variational Autoencoders (VAEs). In this paper, we introduce a novel adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) Modeling, and Probability Integral Transform (PIT) -- to address irregular latent distributions. By reconfiguring irregular distributions into a uniform distribution in the latent space, our approach significantly enhances the disentanglement and interpretability of latent representations, overcoming the limitation of traditional VAE models in capturing complex data structures. Empirical evaluations demonstrated the efficacy of our proposed UT module in improving disentanglement metrics across benchmark datasets -- dSprites and MNIST. Our findings suggest a promising direction for advancing representation learning techniques, with implications for future research in extending this framework to more sophisticated datasets and downstream tasks.
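The last stage named in this abstract, the probability integral transform, can be illustrated in one dimension: if a value is drawn from a distribution with continuous CDF F, then F applied to that value is uniform on [0, 1]. The sketch below applies the CDF of a known two-component Gaussian mixture to samples from that same mixture. The mixture parameters and the helper names are assumptions chosen for illustration and do not reproduce the paper's UT module.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, mus, sigmas):
    """CDF of a Gaussian mixture: the weighted sum of component CDFs."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def sample_mixture(rng, weights, mus, sigmas):
    """Draw one sample: pick a component by weight, then sample from it."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(mus[i], sigmas[i])


# Hypothetical bimodal "latent" distribution.
weights, mus, sigmas = [0.3, 0.7], [-2.0, 1.5], [0.5, 1.0]

rng = random.Random(0)
latent = [sample_mixture(rng, weights, mus, sigmas) for _ in range(2000)]

# Probability integral transform: map each sample through the mixture CDF.
uniformized = [mixture_cdf(z, weights, mus, sigmas) for z in latent]
mean_u = sum(uniformized) / len(uniformized)
```

In practice the mixture would have to be fitted to the latent codes first; here the true parameters are reused so that only the transform itself is on display.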
Submitted 2 July, 2024;
originally announced July 2024.
-
A four-operator splitting algorithm for nonconvex and nonsmooth optimization
Authors:
Jan Harold Alcantara,
Ching-pei Lee,
Akiko Takeda
Abstract:
In this work, we address a class of nonconvex nonsmooth optimization problems where the objective function is the sum of two smooth functions (one of which is proximable) and two nonsmooth functions (one proper, closed and proximable, and the other continuous and weakly concave). We introduce a new splitting algorithm that extends the Davis-Yin splitting (DYS) algorithm to handle such four-term nonconvex nonsmooth problems. We prove that with appropriately chosen step sizes, our algorithm exhibits global subsequential convergence to stationary points with a stationarity measure converging at a rate of $1/k$. When specialized to the setting of the DYS algorithm, our results allow for larger stepsizes compared to existing bounds in the literature. Experimental results demonstrate the practical applicability and effectiveness of our proposed algorithm.
Submitted 1 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems
Authors:
Chang-Ock Lee,
Youngkyu Lee,
Byungeun Ryoo
Abstract:
Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network. It presets the weight/bias coefficients in the hidden layer with random values, which remain fixed throughout the computation, and uses a linear least squares method for training the parameters of the output layer of the neural network. It is known to be much faster than physics-informed neural networks. However, classical ELM is still computationally expensive when a high level of representation is desired in the solution as this requires solving a large least squares system. In this paper, we propose a nonoverlapping domain decomposition method (DDM) for ELMs that not only reduces the training time of ELMs, but is also suitable for parallel computation. In numerical analysis, DDMs have been widely studied to reduce the time to obtain finite element solutions for elliptic PDEs through parallel computation. Among these approaches, nonoverlapping DDMs are attracting the most attention. Motivated by these methods, we introduce local neural networks, which are valid only at corresponding subdomains, and an auxiliary variable at the interface. We construct a system on the variable and the parameters of local neural networks. A Schur complement system on the interface can be derived by eliminating the parameters of the output layer. The auxiliary variable is then directly obtained by solving the reduced system, after which the parameters for each local neural network are solved in parallel. A method for initializing the hidden layer parameters suitable for high approximation quality in large systems is also proposed. Numerical results that verify the acceleration performance of the proposed method with respect to the number of subdomains are presented.
Submitted 22 June, 2024;
originally announced June 2024.
-
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms
Authors:
Yang Cai,
Gabriele Farina,
Julien Grand-Clément,
Christian Kroer,
Chung-Wei Lee,
Haipeng Luo,
Weiqiang Zheng
Abstract:
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This begs the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $δ>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/δ$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
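For readers unfamiliar with OMWU, the sketch below shows one common way to write the update for a two-player zero-sum matrix game, together with the duality gap that the abstract uses to measure last-iterate quality. It is an illustration under stated assumptions, not the paper's construction: the payoff matrix, the step size `eta`, the convention that the row player minimizes, and the function names are all hypothetical choices.

```python
import math


def normalize(weights):
    """Rescale positive weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def duality_gap(A, x, y):
    """Gap of (x, y) when the row player minimizes x^T A y."""
    n, m = len(A), len(A[0])
    col_payoffs = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    row_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    return max(col_payoffs) - min(row_payoffs)


def omwu(A, eta=0.1, rounds=1000):
    """Optimistic multiplicative weights update for both players.

    Each player reweights action i by exp(-eta * (2 * loss_t[i] - loss_{t-1}[i])),
    i.e. the usual MWU step with the most recent loss counted twice and the
    previous loss subtracted as a prediction correction.
    """
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    prev_lx, prev_ly = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        # Row player's losses are A y; column player's losses are -A^T x.
        lx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        ly = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        x = normalize([x[i] * math.exp(-eta * (2 * lx[i] - prev_lx[i])) for i in range(n)])
        y = normalize([y[j] * math.exp(-eta * (2 * ly[j] - prev_ly[j])) for j in range(m)])
        prev_lx, prev_ly = lx, ly
    return x, y


# Hypothetical 2x2 payoff matrix, chosen only to exercise the code.
A = [[2.0, -1.0], [-1.0, 1.0]]
x_last, y_last = omwu(A, eta=0.1, rounds=2000)
gap_last = duality_gap(A, x_last, y_last)
```

Because the weights are multiplied rather than reset, every past loss keeps influencing the current iterate, which is the "not forgetting" behaviour the abstract refers to.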
Submitted 15 June, 2024;
originally announced June 2024.
-
A topological model for the HOMFLY-PT polynomial
Authors:
Cristina Ana-Maria Anghel,
Christine Ruey Shan Lee
Abstract:
We give the first known topological model for the HOMFLY-PT polynomial. More precisely, we prove that this invariant is given by a set of graded intersections between explicit Lagrangian submanifolds in a fixed configuration space on a Heegaard surface for the link exterior. The submanifolds are supported on arcs and ovals on the surface.
The construction also leads to a topological model for the Jones polynomial constructed from Heegaard surfaces associated directly to the link diagram. In particular, it does not rely on a choice of a braid representative for the link. This opens up new avenues for investigation of the geometry of these invariants, as well as categorifications of geometric nature.
Submitted 6 May, 2024;
originally announced May 2024.
-
Connections between Reachability and Time Optimality
Authors:
Juho Bae,
Ji Hoon Bai,
Byung-Yoon Lee,
Jun-Yong Lee,
Chang-Hun Lee
Abstract:
This paper presents the concept of an equivalence relation on the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. A more general equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of optimal control problems to address problems in corresponding equivalent classes. As a byproduct, we state and prove the construction methods of the reachability sets of three-dimensional curves with prescribed curvature bound. The findings are twofold: Firstly, we prove that any boundary point of the reachability set, with the terminal direction taken into account, can be accessed via curves of H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Secondly, we show that any boundary point of the reachability set, without considering the terminal direction, can be accessed by curves of CC, CS, or their respective subsegments. These findings extend the developments presented in the literature regarding planar curves, or Dubins car dynamics, to spatial curves in $\mathbb{R}^3$. For higher dimensions, we confirm that the problem of identifying the reachability set of curvature bounded paths subsumes the well-known Markov-Dubins problem. These advancements in understanding the reachability of curvature bounded paths in $\mathbb{R}^3$ hold significant practical implications, particularly in the contexts of mission planning problems and time optimal guidance.
Submitted 27 March, 2024;
originally announced March 2024.
-
Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network
Authors:
Zih-Syuan Huang,
Ching-pei Lee
Abstract:
We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks. Similar to existing regularized adaptive methods, the subproblem for computing the update direction of RAMDA involves a nonsmooth regularizer and a diagonal preconditioner, and therefore does not possess a closed-form solution in general. We thus also carefully devise an implementable inexactness condition that retains convergence guarantees similar to the exact versions, and propose a companion efficient solver for the subproblems of both RAMDA and existing methods to make them practically feasible. We leverage the theory of manifold identification in variational analysis to show that, even in the presence of such inexactness, the iterates of RAMDA attain the ideal structure induced by the regularizer at the stationary point of asymptotic convergence. This structure is locally optimal near the point of convergence, so RAMDA is guaranteed to obtain the best structure possible among all methods converging to the same point, making it the first regularized adaptive method outputting models that possess outstanding predictive performance while being (locally) optimally structured. Extensive numerical experiments in large-scale modern computer vision, language modeling, and speech tasks show that the proposed RAMDA is efficient and consistently outperforms the state of the art for training structured neural networks. Implementation of our algorithm is available at http://www.github.com/ismoptgroup/RAMDA/.
Submitted 21 March, 2024;
originally announced March 2024.
-
Totally Geodesic Surfaces in Hyperbolic 3-Manifolds: Algorithms and Examples
Authors:
Brannon Basilio,
Chaeryn Lee,
Joseph Malionek
Abstract:
Finding a totally geodesic surface, an embedded surface where the geodesics in the surface are also geodesics in the surrounding manifold, has been a problem of interest in the study of 3-manifolds. This has especially been of interest in hyperbolic 3-manifolds and knot complements, complements of piecewise-linearly embedded circles in the 3-sphere. This is due to Menasco-Reid's conjecture stating that hyperbolic knot complements do not contain such surfaces. Here, we present an algorithm that determines whether a given surface is totally geodesic and an algorithm that checks whether a given 3-manifold contains a totally geodesic surface. We applied our algorithm to over 150,000 3-manifolds and discovered nine 3-manifolds with totally geodesic surfaces. Additionally, we verified Menasco-Reid's conjecture for knots up to 12 crossings.
Submitted 18 March, 2024;
originally announced March 2024.
-
The gonality of chess graphs
Authors:
Nila Cibu,
Kexin Ding,
Steven DiSilvio,
Sasha Kononova,
Chan Lee,
Ralph Morrison,
Krish Singal
Abstract:
Chess graphs encode the moves that a particular chess piece can make on an $m\times n$ chessboard. We study these graphs through the lens of chip-firing games and graph gonality. We provide upper and lower bounds for the gonality of king's, bishop's, and knight's graphs, as well as for the toroidal versions of these graphs. We also prove that among all chess graphs, there exists an upper bound on gonality solely in terms of $\min\{m,n\}$, except for queen's, toroidal queen's, rook's, and toroidal bishop's graphs.
Submitted 6 March, 2024;
originally announced March 2024.
-
Designing Problems for Improved Instruction and Learning -- Linear Algebra
Authors:
Ryan H. Allaire,
Margaret Reynolds,
Andrew C. Lee
Abstract:
One of the grand challenges of Mathematics instruction is to provide students with problems that are both accessible and have a reasonably elegant solution. Instructors commonly resort to resources like course textbooks, online-learning platforms, or other automated problem-generating software to select problems for exams and assignments. However, reliance on such tools may result in limited control over problem parameters, potentially yielding intricate solutions that impede students' understanding. This article centers on Linear Algebra, wherein we devise algorithms for reverse engineering matrices of integers with integer outcomes through operations such as the inverse, LU decomposition, and QR decomposition. The focus is on empowering instructors to manipulate matrix properties deliberately, ensuring the creation of problems that enrich instruction and foster student confidence. The intellectual endeavor of reverse engineering such problems, grounded in both theory and matrix properties, proves mutually beneficial for both students and instructors alike.
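One simple way to obtain an integer matrix whose inverse is also an integer matrix is to multiply a unit lower-triangular integer matrix by a unit upper-triangular integer matrix: the product has determinant 1, so its inverse has integer entries, and the two factors are its LU decomposition. The sketch below checks this with exact rational arithmetic. It is an illustration of that general fact and is not claimed to be the algorithm of the article; the factor matrices and the helper names `matmul` and `inverse` are assumptions.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Exact Gauss-Jordan inverse using Fractions.

    Raises StopIteration if A is singular (no nonzero pivot is found).
    """
    n = len(A)
    # Augment A with the identity matrix.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


# Hypothetical unit-triangular integer factors chosen by the instructor.
L = [[1, 0, 0], [2, 1, 0], [-1, 3, 1]]
U = [[1, 2, -1], [0, 1, 4], [0, 0, 1]]

A = matmul(L, U)    # the matrix handed to students
A_inv = inverse(A)  # exact inverse; every entry has denominator 1
```

Choosing the off-diagonal entries of `L` and `U` gives direct control over the size of the numbers that appear in the problem and in its solution.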
Submitted 2 February, 2024;
originally announced February 2024.
-
Generators for the cohomology of the moduli space of irregular parabolic Higgs bundles
Authors:
Jia Choon Lee,
Sukjoo Lee
Abstract:
We prove that the pure part of the cohomology ring of the moduli space of irregular $\underlineξ$-parabolic Higgs bundles is generated by the Künneth components of the Chern classes of a universal bundle and the Chern classes of the successive quotients of a universal flag of subbundles. As an application, in the regular full-flag case, we demonstrate a similar result for the cohomology ring of the moduli spaces of parabolic and strongly parabolic Higgs bundles.
Submitted 13 August, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Minimal grid diagrams of the prime knots with crossing number 13 and arc index 13
Authors:
Hwa Jeong Lee,
Yoonsang Lee,
Chanmin Lee,
Yeseo Park,
Hun Kim,
Gyo Taek Jin
Abstract:
We give a list of minimal grid diagrams of the 13 crossing prime nonalternating knots which have arc index 13. There are 9,988 prime knots with crossing number 13. Among them 4,878 are alternating and have arc index 15. Among the other nonalternating knots, 49, 399, 1,412 and 3,250 have arc index 10, 11, 12, and 13, respectively. We used the Dowker-Thistlethwaite code of the 3,250 knots provided by the program Knotscape to generate spanning trees of the corresponding knot diagrams to obtain minimal arc presentations in the form of grid diagrams.
Submitted 4 February, 2024;
originally announced February 2024.
-
Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Thanasis Pittas
Abstract:
We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i \ge α$ for some known parameter $α$, and each $P_i$ has unknown covariance $Σ_i \preceq σ^2_i \cdot I_d$ for some unknown $σ_i$, the goal is to cluster the samples assuming a pairwise mean separation in the order of $(σ_i+σ_j)/\sqrtα$ between every pair of components $P_i$ and $P_j$. Our contributions are as follows:
For the special case of nearly uniform mixtures, we give the first poly-time algorithm for this clustering task. Prior work either required separation scaling with the maximum cluster standard deviation (i.e. $\max_i σ_i$) [DKK+22b] or required both additional structural assumptions and mean separation scaling as a large degree polynomial in $1/α$ [BKK22].
For general-weight mixtures, we point out that accurate clustering is information-theoretically impossible under our fine-grained mean separation assumptions. We introduce the notion of a clustering refinement -- a list of not-too-small subsets satisfying a similar separation, and which can be merged into a clustering approximating the ground truth -- and show that it is possible to efficiently compute an accurate clustering refinement of the samples. Furthermore, under a variant of the "no large sub-cluster" condition from prior work [BKK22], we show that our algorithm outputs an accurate clustering, not just a refinement, even for general-weight mixtures. As a corollary, we obtain efficient clustering algorithms for mixtures of well-conditioned high-dimensional log-concave distributions.
Moreover, our algorithm is robust to $Ω(α)$-fraction of adversarial outliers.
Submitted 18 December, 2023;
originally announced December 2023.
-
Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments
Authors:
Trung Dang,
Jasper C. H. Lee,
Maoyuan Song,
Paul Valiant
Abstract:
There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13] and a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+α$ moment exists for $α\in (0,1)$. Both results, however, are optimal only in the worst case. We initiate the fine-grained study of the mean estimation problem: Can algorithms leverage useful features of the input distribution to beat the sub-Gaussian rate, without explicit knowledge of such features?
We resolve this question with an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". For any distribution $p$ with a finite mean, we construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$ and $q$ are not distinguishable with high probability, and $q$ further preserves $p$'s moments up to constants. The main consequence is that no reasonable estimator can asymptotically achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. More generally, we introduce a new definitional framework to analyze the fine-grained optimality of algorithms, which we call "neighborhood optimality", interpolating between the unattainably strong "instance optimality" and the trivially weak "admissibility" definitions. Applying the new framework, we show that median-of-means is neighborhood optimal, up to constant factors. It is open to find a neighborhood-optimal estimator without constant factor slackness.
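The median-of-means estimator analyzed in this abstract is short enough to sketch: split the samples into blocks, average each block, and return the median of the block averages. The version below is a minimal illustration, not the paper's analysis; the function name, the contiguous (unshuffled) blocks, the choice to drop leftover samples, and the toy data are assumptions.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Median of the means of num_blocks equal-sized contiguous blocks.

    Samples left over after forming equal blocks are ignored. For i.i.d.
    data the way samples are assigned to blocks does not matter.
    """
    block_size = len(samples) // num_blocks
    if block_size == 0:
        raise ValueError("need at least one sample per block")
    block_means = [
        statistics.fmean(samples[b * block_size:(b + 1) * block_size])
        for b in range(num_blocks)
    ]
    return statistics.median(block_means)


# Toy data: nine ordinary observations and one extreme outlier.
data = [1.0] * 9 + [1000.0]

robust_estimate = median_of_means(data, num_blocks=5)
plain_mean = statistics.fmean(data)
```

The outlier corrupts only one of the five block means, so the median ignores it, while the plain sample mean is pulled far from the bulk of the data.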
Submitted 21 November, 2023;
originally announced November 2023.
-
Spectrum of the Laplacian on the Fricke-Macbeath surface
Authors:
Chul-hee Lee
Abstract:
The Fricke-Macbeath surface is the unique Hurwitz surface of genus 7 with 504 conformal automorphisms. In this paper, we prove that the first eigenvalue of the Laplacian on the Fricke-Macbeath surface has multiplicity seven and is contained in the interval $[1.23, 1.26]$. Further, we numerically identify the 7-dimensional representation of its automorphism group corresponding to the eigenspace associated to the first eigenvalue. We also determine the Dirichlet domain centered at 0 for a Fuchsian group that uniformizes the Fricke-Macbeath surface, identifying the algebraic coordinates for its vertices.
Submitted 17 December, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
-
On the image of convolutions along an arithmetic progression
Authors:
Ernie Croot,
Chi-Nuo Lee
Abstract:
We consider the question of determining the structure of the set of all $d$-dimensional vectors of the form $N^{-1}(1_A*1_{-A}(x_1), ..., 1_A*1_{-A}(x_d))$ for $A \subseteq \{1,...,N\}$, and also the set of all $(2N+1)^{-1}(1_B*1_B(x_1), ..., 1_B*1_B(x_d))$, for $B \subseteq \{-N, -N+1, ..., 0, 1, ..., N\}$, where $x_1,...,x_d$ are fixed positive integers (we let $N \to \infty$). Using an elementary method related to the Birkhoff-von Neumann theorem on decompositions of doubly-stochastic matrices we show that both the above two sets of vectors roughly form polytopes; and of particular interest is the question of bounding the number of corner vertices, as well as understanding their structure.
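As a small worked illustration of the quantities in the abstract, $1_A*1_{-A}(x)$ counts the ordered pairs $(a, b) \in A \times A$ with $a - b = x$. The sketch below computes these counts and the normalized vector $N^{-1}(1_A*1_{-A}(x_1), ..., 1_A*1_{-A}(x_d))$ by brute force; the function names are hypothetical and it is not taken from the paper.

```python
from collections import Counter


def difference_counts(A):
    """Return a Counter c with c[x] = 1_A * 1_{-A}(x),
    i.e. the number of ordered pairs (a, b) in A x A with a - b = x."""
    return Counter(a - b for a in A for b in A)


def normalized_vector(A, N, xs):
    """Return [1_A * 1_{-A}(x) / N for x in xs] for a set A inside {1, ..., N}."""
    counts = difference_counts(A)
    # A Counter returns 0 for differences that never occur.
    return [counts[x] / N for x in xs]


if __name__ == "__main__":
    A = {1, 2, 4}
    print(normalized_vector(A, N=4, xs=[1, 2, 3]))
```

The brute-force count takes $|A|^2$ steps, which is enough for small experiments on which vectors are attainable.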
Submitted 2 November, 2023;
originally announced November 2023.
-
Balanced Group Convolution: An Improved Group Convolution Based on Approximability Estimates
Authors:
Youngkyu Lee,
Jongho Park,
Chang-Ock Lee
Abstract:
The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers. However, this increase in performance comes with a higher computational cost, resulting in numerous studies focused on reducing it. One promising approach to address this issue is group convolution, which effectively reduces the computational cost by grouping channels. However, to the best of our knowledge, there has been no theoretical analysis on how well the group convolution approximates the standard convolution. In this paper, we mathematically analyze the approximation of the group convolution to the standard convolution with respect to the number of groups. Furthermore, we propose a novel variant of the group convolution called balanced group convolution, which achieves a better approximation at a small additional computational cost. We provide experimental results that validate our theoretical findings and demonstrate the superior performance of the balanced group convolution over other variants of group convolution.
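To make the cost reduction from grouping channels concrete, the sketch below counts the weights of a standard 2D convolution and of a plain group convolution. It assumes a square kernel and no bias term, and the function name `conv_weight_count` is hypothetical. It does not implement the balanced group convolution proposed in the paper, whose details are not given in the abstract.

```python
def conv_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a 2D convolution with a square kernel and no bias.

    With `groups` groups, each output channel only sees c_in / groups
    input channels, so the count shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("groups must divide both c_in and c_out")
    return (c_in // groups) * c_out * kernel_size * kernel_size


if __name__ == "__main__":
    standard = conv_weight_count(64, 128, 3)
    grouped = conv_weight_count(64, 128, 3, groups=4)
    print(standard, grouped, standard // grouped)
```

The number of multiply-accumulate operations per output position scales the same way, which is why grouping reduces compute as well as storage.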
Submitted 19 October, 2023;
originally announced October 2023.
-
On character table of Clifford groups
Authors:
Chin-Yen Lee,
Wei-Hsuan Yu,
Yung-Ning Peng,
Ching-Jui Lai
Abstract:
Based on a presentation of $\mathcal{C}_n$ and with the help of [GAP], we construct the character table of the Clifford group $\mathcal{C}_n$ for $n=1,2,3$. As an application, we can efficiently decompose (higher powers of) the tensor product of the matrix representation in those cases. Our results recover some known results in [HWW, WF] and reveal some new phenomena. We prove that when $n \geq 3$, (1) the trivial character is the only linear character for $\mathcal{C}_n$ and hence $\mathcal{C}_n$ equals its commutator subgroup, (2) the $n$-qubit Pauli group $\mathcal{P}_n$ is the only proper non-trivial normal subgroup of $\mathcal{C}_n$, (3) the matrix representation $\mathcal{M}_{2^n}$ is a faithful representation for $\mathcal{C}_n$. As a byproduct, we give a presentation of the finite symplectic group $Sp(2n,2)$ in terms of generators and relations.
Submitted 25 October, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Parameter-Varying Koopman Operator for Nonlinear System Modeling and Control
Authors:
Changyu Lee,
Kiyong Park,
Jinwhan Kim
Abstract:
This paper proposes a novel approach for modeling and controlling nonlinear systems with varying parameters. The approach introduces the use of a parameter-varying Koopman operator (PVKO) in a lifted space, which provides an efficient way to understand system behavior and design control algorithms that account for underlying dynamics and changing parameters. The PVKO builds on a conventional Koopman model by incorporating local time-invariant linear systems through interpolation within the lifted space. This paper outlines a procedure for identifying the PVKO and designing a model predictive control using the identified PVKO model. Simulation results demonstrate that the proposed approach improves model accuracy and enables predictions based on future parameter information. The feasibility and stability of the proposed control approach are analyzed, and their effectiveness is demonstrated through simulation.
Submitted 18 September, 2023;
originally announced September 2023.
-
Local times of anisotropic Gaussian random fields and stochastic heat equation
Authors:
Cheuk Yin Lee,
Yimin Xiao
Abstract:
We study the local times of a large class of Gaussian random fields satisfying strong local nondeterminism with respect to an anisotropic metric. We establish moment estimates and Hölder conditions for the local times of the Gaussian random fields. Our key estimates rely on geometric properties of Voronoi partitions with respect to an anisotropic metric and the use of Besicovitch's covering theorem. As a consequence, we deduce sample path properties of the Gaussian random fields that are related to Chung's law of the iterated logarithm and modulus of non-differentiability. Moreover, we apply our results to systems of stochastic heat equations with additive Gaussian noise and determine the exact Hausdorff measure function with respect to the parabolic metric for the level sets of the solutions.
Submitted 30 October, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Multilevel well modeling in aggregation-based nonlinear multigrid for multiphase flow in porous media
Authors:
Chak Shing Lee,
François P. Hamon,
Nicola Castelletto,
Panayot S. Vassilevski,
Joshua A. White
Abstract:
A full approximation scheme (FAS) nonlinear multigrid solver for two-phase flow and transport problems driven by wells with multiple perforations is developed. It is an extension of our previous work on FAS solvers for diffusion and transport problems. The solver is applicable to discrete problems defined on unstructured grids as the coarsening algorithm is aggregation-based and algebraic. To construct a coarse basis that can better capture the radial flow near wells, coarse grids in which perforated well cells are not near the coarse-element interface are desired. This is achieved by an aggregation algorithm proposed in this paper that makes use of the location of well cells in the cell-connectivity graph. Numerical examples in which the FAS solver is compared against Newton's method on benchmark problems are given. In particular, for a refined version of the SAIGUP model, the FAS solver is at least 35% faster than Newton's method for time steps with a CFL number greater than 10.
Submitted 31 July, 2023;
originally announced August 2023.
-
Stable Khovanov homology and Volume
Authors:
Christine Ruey Shan Lee
Abstract:
We show that the $n$ colored Jones polynomials of a highly twisted link approach the Kauffman bracket of an $n$ colored skein element. This is in the sense that the corresponding categorifications of the colored Jones polynomials approach the categorification of the Kauffman bracket of the skein element in a direct limit, as the number of twists in each twist region tends toward infinity, proving a quantum version of Thurston's hyperbolic Dehn surgery theorem implicit in Rozansky's work, and categorifying a result by Champanerkar-Kofman. In view of the volume conjecture, we compute the asymptotic growth rate of the Kauffman bracket of the limiting skein element at a root of unity and relate it to the volume of regular ideal octahedra that arise naturally from the evaluation of the colored Jones polynomials of the link.
Submitted 19 July, 2023;
originally announced July 2023.
-
Finite-Sample Symmetric Mean Estimation with Fisher Information Rate
Authors:
Shivam Gupta,
Jasper C. H. Lee,
Eric Price
Abstract:
The mean of an unknown variance-$σ^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{σ^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $δ$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, δ$ with $n > \log \frac{1}δ$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.
Submitted 28 June, 2023;
originally announced June 2023.
-
Polarity of points for systems of nonlinear stochastic heat equations in the critical dimension
Authors:
Cheuk Yin Lee,
Yimin Xiao
Abstract:
Let $u(t, x) = (u_1(t, x), \dots, u_d(t, x))$ be the solution to the systems of nonlinear stochastic heat equations \[ \begin{split} \frac{\partial}{\partial t} u(t, x) &= \frac{\partial^2}{\partial x^2} u(t, x) + σ(u(t, x)) \dot{W}(t, x),\\ u(0, x) &= u_0(x), \end{split} \] where $t \ge 0$, $x \in \mathbb{R}$, $\dot{W}(t, x) = (\dot{W}_1(t, x), \dots, \dot{W}_d(t, x))$ is a vector of $d$ independent space-time white noises, and $σ: \mathbb{R}^d \to \mathbb{R}^{d\times d}$ is a matrix-valued function. We say that a subset $S$ of $\mathbb{R}^d$ is polar for $\{u(t, x), t \ge 0, x \in \mathbb{R}\}$ if \[ \mathbb{P}\{u(t,x) \in S \text{ for some } t>0 \text{ and } x\in\mathbb{R} \}=0. \] The main result of this paper shows that, in the critical dimension $d=6$, all points in $\mathbb{R}^d$ are polar for $\{u(t, x), t \ge 0, x \in \mathbb{R}\}$. This solves an open problem of Dalang, Khoshnevisan and Nualart (2009, 2013) and Dalang, Mueller and Xiao (2021). We also provide a sufficient condition for a subset $S$ of $\mathbb{R}^d$ to be polar.
Submitted 20 August, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Constraint Programming to Improve Hub Utilization in Autonomous Transfer Hub Networks
Authors:
Chungjae Lee,
Wirattawut Boonbandansook,
Vahid Eghbal Akhlaghi,
Kevin Dalmeijer,
Pascal Van Hentenryck
Abstract:
The Autonomous Transfer Hub Network (ATHN) is one of the most promising ways to adapt self-driving trucks for the freight industry. These networks use autonomous trucks for the middle mile, while human drivers perform the first and last miles. This paper extends previous work on optimizing ATHN operations by including transfer hub capacities, which are crucial for labor planning and policy design. It presents a Constraint Programming (CP) model that shifts an initial schedule produced by a Mixed Integer Program to minimize the hub capacities. The scalability of the CP model is demonstrated on a case study at the scale of the United States, based on data provided by Ryder System, Inc. The CP model efficiently finds optimal solutions and lowers the necessary total hub capacity by 42%, saving $15.2M in annual labor costs. The results also show that the reduced capacity is close to a theoretical (optimistic) lower bound.
Submitted 22 September, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Optimizing Autonomous Transfer Hub Networks: Quantifying the Potential Impact of Self-Driving Trucks
Authors:
Chungjae Lee,
Kevin Dalmeijer,
Pascal Van Hentenryck,
Peibo Zhang
Abstract:
Autonomous trucks are expected to fundamentally transform the freight transportation industry. In particular, Autonomous Transfer Hub Networks (ATHNs), which combine autonomous trucks on middle miles with human-driven trucks on the first and last miles, are seen as the most likely deployment pathway for this technology. This paper presents a framework to optimize ATHN operations and evaluate the benefits of autonomous trucking. By exploiting the problem structure, this paper introduces a flow-based optimization model for this purpose that can be solved by blackbox solvers in a matter of hours. The resulting framework is easy to apply and enables the data-driven analysis of large-scale systems. The power of this approach is demonstrated on a system that spans all of the United States over a four-week horizon. The case study quantifies the potential impact of autonomous trucking and shows that ATHNs can have significant benefits over traditional transportation networks.
Submitted 14 August, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $α<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(μ, Σ)$, the goal is to output a list of $O(1/α)$ hypotheses at least one of which is close to $Σ$ in relative Frobenius norm. Our main result is a $\mathrm{poly}(d,1/α)$ sample and time algorithm for this task that guarantees relative Frobenius norm error of $\mathrm{poly}(1/α)$. Importantly, our algorithm relies purely on spectral techniques. As a corollary, we obtain an efficient spectral algorithm for robust partial clustering of Gaussian mixture models (GMMs) -- a key ingredient in the recent work of [BDJ+22] on robustly learning arbitrary GMMs. Combined with the other components of [BDJ+22], our new method yields the first Sum-of-Squares-free algorithm for robustly learning GMMs. At the technical level, we develop a novel multi-filtering method for list-decodable covariance estimation that may be useful in other settings.
Submitted 1 May, 2023;
originally announced May 2023.
-
Regularization of the inverse Laplace transform by Mollification
Authors:
Pierre Maréchal,
Faouzi Triki,
Walter C. Simo Tao Lee
Abstract:
In this paper we study the inverse Laplace transform. We first derive a new global logarithmic stability estimate that shows that the inversion is severely ill-posed. Then we propose a regularization method to compute the inverse Laplace transform using the concept of mollification. Taking into account the exponential instability we derive a criterion for selection of the regularization parameter. We show that by taking the optimal value of this parameter we improve significantly the convergence of the method. Finally, making use of the holomorphic extension of the Laplace transform, we suggest a new PDEs based numerical method for the computation of the solution. The effectiveness of the proposed regularization method is demonstrated through several numerical examples.
Submitted 17 April, 2023;
originally announced April 2023.
-
Statistical and computational rates in high rank tensor estimation
Authors:
Chanwoo Lee,
Miaoyan Wang
Abstract:
Higher-order tensor datasets arise commonly in recommendation systems, neuroimaging, and social networks. Here we develop provable methods for estimating a possibly high rank signal tensor from noisy observations. We consider a generative latent variable tensor model that incorporates both high rank and low rank models, including but not limited to simple hypergraphon models, single index models, low-rank CP models, and low-rank Tucker models. Comprehensive results are developed on both the statistical and computational limits for the signal tensor estimation. We find that high-dimensional latent variable tensors are of log-rank; this fact explains the pervasiveness of low-rank tensors in applications. Furthermore, we propose a polynomial-time spectral algorithm that achieves the computationally optimal rate. We show that the statistical-computational gap emerges only for latent variable tensors of order 3 or higher. Numerical experiments and two real data applications are presented to demonstrate the practical merits of our methods.
Submitted 8 April, 2023;
originally announced April 2023.
-
Branch & Learn with Post-hoc Correction for Predict+Optimize with Unknown Parameters in Constraints
Authors:
Xinyi Hu,
Jasper C. H. Lee,
Jimmy H. M. Lee
Abstract:
Combining machine learning and constrained optimization, Predict+Optimize tackles optimization problems containing parameters that are unknown at the time of solving. Prior works focus on cases with unknowns only in the objectives. A new framework was recently proposed to cater for unknowns also in constraints by introducing a loss function, called Post-hoc Regret, that takes into account the cost of correcting an unsatisfiable prediction. Since Post-hoc Regret is non-differentiable, the previous work computes only its approximation. While the notion of Post-hoc Regret is general, its specific implementation is applicable to only packing and covering linear programming problems. In this paper, we first show how to compute Post-hoc Regret exactly for any optimization problem solvable by a recursive algorithm satisfying simple conditions. Experimentation demonstrates substantial improvement in the quality of solutions as compared to the earlier approximation approach. Furthermore, we show experimentally the empirical behavior of different combinations of correction and penalty functions used in the Post-hoc Regret of the same benchmarks. Results provide insights for defining the appropriate Post-hoc Regret in different application scenarios.
Submitted 12 March, 2023;
originally announced March 2023.
-
High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
Authors:
Shivam Gupta,
Jasper C. H. Lee,
Eric Price
Abstract:
In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $λ$, and want to estimate $λ$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cramér-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends on $f$, and may be arbitrarily large. We build on the theory using \emph{smoothed} estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. As $n \to \infty$, $r \to 0$ at an explicit rate and this converges to the Cramér-Rao bound. We (1) improve the prior work for 1-dimensional $f$ to converge for constant failure probability in addition to high probability, and (2) extend the theory to high-dimensional distributions. In the process, we prove a new bound on the norm of a high-dimensional random variable whose 1-dimensional projections are subgamma, which may be of independent interest.
Submitted 5 February, 2023;
originally announced February 2023.
-
MetaNO: How to Transfer Your Knowledge on Learning Hidden Physics
Authors:
Lu Zhang,
Huaiqian You,
Tian Gao,
Mo Yu,
Chung-Hao Lee,
Yue Yu
Abstract:
Gradient-based meta-learning methods have primarily been applied to classical machine learning tasks such as image classification. Recently, PDE-solving deep learning methods, such as neural operators, are starting to make an important impact on learning and predicting the response of a complex physical system directly from observational data. Since data acquisition in this context is commonly challenging and costly, the need to utilize and transfer existing knowledge to new and unseen physical systems is even more acute. Herein, we propose a novel meta-learning approach for neural operators, which can be seen as transferring the knowledge of solution operators between governing (unknown) PDEs with varying parameter fields. Our approach is a provably universal solution operator for multiple PDE solving tasks, with a key theoretical observation that underlying parameter fields can be captured in the first layer of neural operator models, in contrast to typical final-layer transfer in existing meta-learning methods. As applications, we demonstrate the efficacy of our proposed approach on PDE-based datasets and a real-world material modeling problem, illustrating that our method can handle complex and nonlinear physical response learning tasks while greatly improving the sampling efficiency in unseen tasks.
Submitted 3 February, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Parabolic stochastic PDEs on bounded domains with rough initial conditions: moment and correlation bounds
Authors:
David Candil,
Le Chen,
Cheuk Yin Lee
Abstract:
We consider nonlinear parabolic stochastic PDEs on a bounded Lipschitz domain driven by a Gaussian noise that is white in time and colored in space, with Dirichlet or Neumann boundary condition. We establish existence, uniqueness and moment bounds of the random field solution under measure-valued initial data $ν$. We also study the two-point correlation function of the solution and obtain explicit upper and lower bounds. For $C^{1, α}$-domains with Dirichlet condition, the initial data $ν$ is not required to be a finite measure and the moment bounds can be improved under the weaker condition that the leading eigenfunction of the differential operator is integrable with respect to $|ν|$. As an application, we show that the solution is fully intermittent for sufficiently high level $λ$ of noise under the Dirichlet condition, and for all $λ> 0$ under the Neumann condition.
Submitted 4 August, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Generalized Lindemann-Weierstrass and Gelfond-Schneider-Baker Theorems
Authors:
Suk-Geun Hwang,
Choon Ho Lee,
Ki-Bong Nam,
Rachel M Chaphalkar
Abstract:
We generalize the Lindemann-Weierstrass theorem and the Gelfond-Schneider-Baker theorem, and we find new transcendental numbers. We give several methods for finding transcendental numbers. Transcendental numbers have recently found applications in cryptography (\cite{G}, \cite{K}, \cite{V}). Since we are able to make many tables of random numbers, the new transcendental numbers in this work will be applicable to encryption and decryption (\cite{V}, \cite{Z}).
Submitted 6 December, 2022;
originally announced December 2022.
-
Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Ankit Pensia
Abstract:
We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $μ$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $μ$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $τ$, having an additive $\log(1/τ)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
Submitted 29 November, 2022;
originally announced November 2022.
-
Self-expanders to the mean curvature flow based on the generalized Lawson-Osserman cone
Authors:
Chen-Kuan Lee
Abstract:
We derive the equation of self-similar solutions to mean curvature flow based on the generalized Lawson-Osserman cone and prove the existence of self-expanders by modifying the theory of equilibria in the autonomous system. In particular, those self-expanders are unique if a local assumption is given.
Submitted 15 February, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
On the two-distance embedding in real Euclidean space of coherent configuration of type (2,2;3)
Authors:
Eiichi Bannai,
Etsuko Bannai,
Chin-Yen Lee,
Ziqing Xiang,
Wei-Hsuan Yu
Abstract:
Finding the maximum cardinality of a $2$-distance set in Euclidean space is a classical problem in geometry. Lisoněk in 1997 constructed a maximum $2$-distance set in $\mathbb R^8$ with $45$ points. That $2$-distance set constructed by Lisoněk has a distinguished structure of a coherent configuration of type $(2,2;3)$ and is embedded in two concentric spheres in $\mathbb R^8$. In this paper we study whether there exists any other similar embedding of a coherent configuration of type $(2,2;3)$ as a $2$-distance set in $\mathbb R^n$, without assuming any restriction on the size of the set. We prove that there exists no such example other than that of Lisoněk. The key ideas of our proof are as follows: (i) study the geometry of the embedding of the coherent configuration in Euclidean space and derive Diophantine equations from this embedding; (ii) solve these Diophantine equations, under certain additional integrality conditions on some parameters of the combinatorial structure, by using the method of auxiliary equations.
Submitted 4 November, 2022;
originally announced November 2022.
-
Accelerated projected gradient algorithms for sparsity constrained optimization problems
Authors:
Jan Harold Alcantara,
Ching-pei Lee
Abstract:
We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an $\ell_0$-norm constraint. Through decomposing the feasible set of the given sparsity constraint as a finite union of linear subspaces, we present two acceleration schemes with global convergence guarantees, one by same-space extrapolation and the other by subspace identification. The former fully utilizes the problem structure to greatly accelerate the optimization speed with only negligible additional cost. The latter leads to a two-stage meta-algorithm that first uses classical projected gradient iterations to identify the correct subspace containing an optimal solution, and then switches to a highly-efficient smooth optimization method in the identified subspace to attain superlinear convergence. Experiments demonstrate that the proposed accelerated algorithms are magnitudes faster than their non-accelerated counterparts as well as the state of the art.
Submitted 4 November, 2022;
originally announced November 2022.
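As a rough illustration of the baseline that the abstract above starts from, the following is a minimal sketch of plain (non-accelerated) projected gradient for a least-squares loss under an $\ell_0$ constraint. It does not reproduce the paper's two acceleration schemes. The function names, the least-squares loss, and the fixed step size are assumptions made for the example only.

```python
def hard_threshold(x, s):
    """Project x onto {z : ||z||_0 <= s} by keeping the s largest-magnitude entries."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def grad_least_squares(A, b, x):
    """Gradient of 0.5 * ||A x - b||^2, which is A^T (A x - b)."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(x))]


def projected_gradient(A, b, s, step, iters=200):
    """Repeat: take a gradient step, then project back onto the sparsity constraint."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = grad_least_squares(A, b, x)
        x = hard_threshold([xi - step * gi for xi, gi in zip(x, g)], s)
    return x


if __name__ == "__main__":
    # Hypothetical toy problem: identity design, keep the 2 largest coefficients.
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [3.0, 0.1, -2.0]
    print(projected_gradient(A, b, s=2, step=1.0, iters=5))  # [3.0, 0.0, -2.0]
```

Each iterate lies in one of the finitely many coordinate subspaces of dimension at most `s`, which is the "finite union of linear subspaces" structure the abstract refers to.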
-
Strain energy density as a Gaussian process and its utilization in stochastic finite element analysis: application to planar soft tissues
Authors:
Ankush Aggarwal,
Bjørn Sand Jensen,
Sanjay Pant,
Chung-Hao Lee
Abstract:
Data-based approaches are promising alternatives to the traditional analytical constitutive models for solid mechanics. Herein, we propose a Gaussian process (GP) based constitutive modeling framework, specifically focusing on planar, hyperelastic and incompressible soft tissues. The strain energy density of soft tissues is modeled as a GP, which can be regressed to experimental stress-strain data obtained from biaxial experiments. Moreover, the GP model can be weakly constrained to be convex. A key advantage of a GP-based model is that, in addition to the mean value, it provides a probability density (i.e. associated uncertainty) for the strain energy density. To simulate the effect of this uncertainty, a non-intrusive stochastic finite element analysis (SFEA) framework is proposed. The proposed framework is verified against an artificial dataset based on the Gasser--Ogden--Holzapfel model and applied to a real experimental dataset of a porcine aortic valve leaflet tissue. Results show that the proposed framework can be trained with limited experimental data and fits the data better than several existing models. The SFEA framework provides a straightforward way of using the experimental data and quantifying the resulting uncertainty in simulation-based predictions.
Submitted 22 November, 2022; v1 submitted 28 September, 2022;
originally announced October 2022.
-
A non-existence result for a nonlinear Neumann problem
Authors:
Chiun-Chang Lee
Abstract:
In this note we consider a semilinear elliptic equation in $B_R$ with the nonlinear boundary condition, where $B_R$ is a ball of radius $R$. Under certain conditions, we establish a sufficient condition on the non-existence of solutions provided that $R$ is sufficiently large. The main argument is based on applying the asymptotic analysis to the equation with respect to $R\gg1$.
Submitted 27 September, 2022;
originally announced September 2022.
-
A characterization of graphs with at most four boundary vertices
Authors:
Nick Chiem,
William Dudarov,
Chris Lee,
Sean Lee,
Kevin Liu
Abstract:
Steinerberger defined a notion of boundary for a graph and established a corresponding isoperimetric inequality. Hence, "large" graphs have more boundary vertices. In this paper, we first characterize graphs with three boundary vertices in terms of two infinite families of graphs. We then completely characterize graphs with four boundary vertices in terms of eight families of graphs, five of which are infinite. This parallels earlier work by Hasegawa and Saito as well as Müller, Pór, and Sereni on another notion of boundary defined by Chartrand, Erwin, Johns, and Zhang.
Submitted 5 June, 2023; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Predict+Optimize for Packing and Covering LPs with Unknown Parameters in Constraints
Authors:
Xinyi Hu,
Jasper C. H. Lee,
Jimmy H. M. Lee
Abstract:
Predict+Optimize is a recently proposed framework which combines machine learning and constrained optimization, tackling optimization problems that contain parameters that are unknown at solving time. The goal is to predict the unknown parameters and use the estimates to solve for an estimated optimal solution to the optimization problem. However, all prior works have focused on the case where unknown parameters appear only in the optimization objective and not the constraints, for the simple reason that if the constraints were not known exactly, the estimated optimal solution might not even be feasible under the true parameters. The contributions of this paper are two-fold. First, we propose a novel and practically relevant framework for the Predict+Optimize setting, but with unknown parameters in both the objective and the constraints. We introduce the notion of a correction function, and an additional penalty term in the loss function, modelling practical scenarios where an estimated optimal solution can be modified into a feasible solution after the true parameters are revealed, but at an additional cost. Second, we propose a corresponding algorithmic approach for our framework, which handles all packing and covering linear programs. Our approach is inspired by the prior work of Mandi and Guns, though with crucial modifications and re-derivations for our very different setting. Experimentation demonstrates the superior empirical performance of our method over classical approaches.
Submitted 8 September, 2022;
originally announced September 2022.
-
New mixed formulation and mesh dependency of finite elements based on the consistent couple stress theory
Authors:
Theodore L. Chang,
Chin-Long Lee
Abstract:
This work presents a general finite element formulation based on a six--field variational principle that incorporates the consistent couple stress theory. A simple, efficient and local iteration free solving procedure that covers both elastic and inelastic materials is derived to minimise computation cost. With proper interpolations, membrane elements of various nodes are proposed as the examples. The implemented finite elements are used to conduct numerical experiments to investigate the performance of the in-plane drilling degrees of freedom introduced by the consistent couple stress theory. The mesh dependency issue is also studied with both elastic and inelastic materials. It is shown that the consistent couple stress theory provides an objective definition of rotation compared with the Cauchy theory but additional regularisation (or other techniques) is required to overcome mesh/size dependency in softening or fracture related problems. In the case of hardening continuum problems and/or large characteristic lengths, the proposed formulation and elements offer a more reliable approach to model structures with both translational and rotational degrees of freedom.
Submitted 6 July, 2022;
originally announced July 2022.
-
On the uniqueness of linear convection--diffusion equations with integral boundary conditions
Authors:
Chiun-Chang Lee,
Masashi Mizuno,
Sang-Hyuck Moon
Abstract:
This work contributes to an understanding of the domain size's effect on the existence and uniqueness of solutions to the linear convection--diffusion equation with integral-type boundary conditions, where boundary conditions depend non-locally on unknown solutions. Generally, the uniqueness result of this type of equation is unclear. In this preliminary study, a uniqueness result is verified when the domain is sufficiently large or small. The main approach has the advantage of transforming the integral boundary conditions into new Dirichlet boundary conditions so that we can obtain refined estimates, and the comparison theorem can be applied to the equations. Furthermore, we exhibit a domain such that, under different boundary data, the equation in this domain can have infinitely many solutions or no solution.
Submitted 11 June, 2022;
originally announced June 2022.
-
Finite-Sample Maximum Likelihood Estimation of Location
Authors:
Shivam Gupta,
Jasper C. H. Lee,
Eric Price,
Paul Valiant
Abstract:
We consider 1-dimensional location estimation, where we estimate a parameter $λ$ from $n$ samples $λ+ η_i$, with each $η_i$ drawn i.i.d. from a known distribution $f$. For fixed $f$ the maximum-likelihood estimate (MLE) is well-known to be optimal in the limit as $n \to \infty$: it is asymptotically normal with variance matching the Cramér-Rao lower bound of $\frac{1}{n\mathcal{I}}$, where $\mathcal{I}$ is the Fisher information of $f$. However, this bound does not hold for finite $n$, or when $f$ varies with $n$. We show for arbitrary $f$ and $n$ that one can recover a similar theory based on the Fisher information of a smoothed version of $f$, where the smoothing radius decays with $n$.
Submitted 18 July, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
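For orientation, the classical asymptotic statement that the abstract above refers to can be written out as follows. This is the standard textbook form, not the paper's finite-sample result; the sample notation $x_i = λ + η_i$ and the Gaussian example with variance $σ^2$ are assumptions added for illustration.

```latex
\[
  \hat{\lambda}_{\mathrm{MLE}} = \arg\max_{\lambda'} \sum_{i=1}^{n} \log f(x_i - \lambda'),
  \qquad
  \mathcal{I} = \int \frac{f'(x)^2}{f(x)} \, dx,
  \qquad
  \sqrt{n}\,\bigl(\hat{\lambda}_{\mathrm{MLE}} - \lambda\bigr) \xrightarrow{d} \mathcal{N}\!\left(0, \tfrac{1}{\mathcal{I}}\right).
\]
% Example (assumed for illustration): if f is the N(0, sigma^2) density, then
% I = 1/sigma^2, the MLE is the sample mean, and its variance sigma^2 / n
% equals the Cramér-Rao bound 1/(n I) exactly for every n.
```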