
Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images

Authors: Dooseop Choi, Jungyu Kang, Taeghyun An, Kyounghwan Ahn, KyoungWook Min

Abstract: Expressing images with Multi-Resolution (MR) features has been widely adopted in many computer vision tasks. In this paper, we introduce the MR concept into Bird's-Eye-View (BEV) semantic segmentation for autonomous driving. This introduction enhances our model's ability to capture both global and local characteristics of driving scenes through our proposed residual learning. Specifically, given a set of MR BEV query maps, the lowest resolution query map is initially updated using a View Transformation (VT) encoder. This updated query map is then upscaled and merged with a higher resolution query map to undergo further updates in a subsequent VT encoder. This process is repeated until the resolution of the updated query map reaches the target. Finally, the lowest resolution map is added to the target resolution to generate the final query map. During training, we enforce both the lowest and final query maps to align with the ground-truth BEV semantic map to help our model effectively capture the global and local characteristics. We also propose a visual feature interaction network that promotes interactions between features across images and across feature levels, thus highly contributing to the performance improvement. We evaluate our model on a large-scale real-world dataset. The experimental results show that our model outperforms the SOTA models in terms of IoU metric. Codes are available at https://github.com/d1024choi/ProgressiveQueryRefineNet

Submitted 24 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2406.08020

Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

Authors: Kyeongjin Ahn, Sungwon Han, Sungwon Park, Jihee Kim, Sangyoon Park, Meeyoung Cha

Abstract: The increasing frequency and intensity of natural disasters demand more sophisticated approaches for rapid and precise damage assessment. To tackle this issue, researchers have developed various methods on disaster benchmark datasets from satellite imagery to aid in detecting disaster damage. However, the diverse nature of geographical landscapes and disasters makes it challenging to apply existing methods to regions unseen during training. We present DAVI (Disaster Assessment with VIsion foundation model), which overcomes domain disparities and detects structural damage (e.g., building) without requiring ground-truth labels of the target region. DAVI integrates task-specific knowledge from a model trained on source regions with an image segmentation foundation model to generate pseudo labels of possible damage in the target region. It then employs a two-stage refinement process, targeting both the pixel and overall image, to more accurately pinpoint changes in disaster-struck areas based on before-and-after images. Comprehensive evaluations demonstrate that DAVI achieves exceptional performance across diverse terrains (e.g., USA and Mexico) and disaster types (e.g., wildfires, hurricanes, and earthquakes). This confirms its robustness in assessing disaster impact without dependence on ground-truth labels.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures, 2 tables

arXiv:2405.18199

Adam with model exponential moving average is effective for nonconvex optimization

Authors: Kwangjun Ahn, Ashok Cutkosky

Abstract: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate that a clipped version of Adam with model EMA achieves the optimal convergence rates in various nonconvex optimization settings, both smooth and nonsmooth. Moreover, when the scale varies significantly across different coordinates, we demonstrate that the coordinate-wise adaptivity of Adam is provably advantageous. Notably, unlike previous analyses of Adam, our analysis crucially relies on its core elements -- momentum and discounting factors -- as well as model EMA, motivating their wide applications in practice.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Comments would be appreciated!
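
To make the combination of Adam and model EMA described in the abstract above concrete, here is a minimal pure-Python sketch. It is not the authors' algorithm: the per-coordinate clipping rule, the hyperparameter values, and the name `clipped_adam_with_ema` are assumptions chosen for illustration, and the exact clipped variant analyzed in the paper may differ.

```python
import math


def clipped_adam_with_ema(grad_fn, w0, steps=200, lr=0.05, beta1=0.9,
                          beta2=0.999, eps=1e-8, clip=1.0, ema_decay=0.9):
    """Run Adam with a clipped update and keep an EMA of the iterates.

    grad_fn maps a list of floats to its gradient (a list of floats).
    Clipping each coordinate of the update to [-clip, clip] is an
    assumption made for this sketch, not the paper's exact rule.
    """
    w = list(w0)
    ema = list(w0)            # model EMA starts at the initial weights
    m = [0.0] * len(w)        # first-moment (momentum) estimate
    v = [0.0] * len(w)        # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(w)
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            update = m_hat / (math.sqrt(v_hat) + eps)
            update = max(-clip, min(clip, update))
            w[i] -= lr * update
            # Exponential moving average of the weights themselves.
            ema[i] = ema_decay * ema[i] + (1 - ema_decay) * w[i]
    return w, ema


if __name__ == "__main__":
    # Badly scaled quadratic: f(w) = 0.5 * (100 * w[0]**2 + w[1]**2).
    grad = lambda w: [100.0 * w[0], w[1]]
    last, averaged = clipped_adam_with_ema(grad, [1.0, 1.0])
    print("last iterate:", last)
    print("EMA iterate: ", averaged)
```

The toy objective has very different curvature in its two coordinates, which is the kind of setting where the abstract says coordinate-wise adaptivity helps; the sketch only illustrates the mechanics and makes no claim about convergence rates.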

arXiv:2405.16002

Does SGD really happen in tiny subspaces?

Authors: Minhak Song, Kwangjun Ahn, Chulhee Yun

Abstract: Understanding the training dynamics of deep neural networks is challenging due to their high-dimensional nature and intricate loss landscapes. Recent studies have revealed that, along the training trajectory, the gradient approximately aligns with a low-rank top eigenspace of the training loss Hessian, referred to as the dominant subspace. Given this alignment, this paper explores whether neural networks can be trained within the dominant subspace, which, if feasible, could lead to more efficient training methods. Our primary observation is that when the SGD update is projected onto the dominant subspace, the training loss does not decrease further. This suggests that the observed alignment between the gradient and the dominant subspace is spurious. Surprisingly, projecting out the dominant subspace proves to be just as effective as the original update, despite removing the majority of the original update component. Similar observations are made for the large learning rate regime (also known as Edge of Stability) and Sharpness-Aware Minimization. We discuss the main causes and implications of this spurious alignment, shedding light on the intricate dynamics of neural network training.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 22 pages

arXiv:2402.15546

HiMAP: Learning Heuristics-Informed Policies for Large-Scale Multi-Agent Pathfinding

Authors: Huijie Tang, Federico Berto, Zihan Ma, Chuanbo Hua, Kyuree Ahn, Jinkyoo Park

Abstract: Large-scale multi-agent pathfinding (MAPF) presents significant challenges in several areas. As systems grow in complexity with a multitude of autonomous agents operating simultaneously, efficient and collision-free coordination becomes paramount. Traditional algorithms often fall short in scalability, especially in intricate scenarios. Reinforcement Learning (RL) has shown potential to address the intricacies of MAPF; however, it has also been shown to struggle with scalability, demanding intricate implementation, lengthy training, and often exhibiting unstable convergence, limiting its practical application. In this paper, we introduce Heuristics-Informed Multi-Agent Pathfinding (HiMAP), a novel scalable approach that employs imitation learning with heuristic guidance in a decentralized manner. We train on small-scale instances using a heuristic policy as a teacher that maps each single agent observation information to an action probability distribution. During pathfinding, we adopt several inference techniques to improve performance. With a simple training scheme and implementation, HiMAP demonstrates competitive results in terms of success rate and scalability in the field of imitation-learning-only MAPF, showing the potential of imitation-learning-only MAPF equipped with inference techniques.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted as Extended Abstract in Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024)

arXiv:2402.01567

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

Authors: Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai

Abstract: Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show the convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.

Submitted 30 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024

arXiv:2310.09802

Exploitation Business: Leveraging Information Asymmetry

Authors: Kwangseob Ahn

Abstract: This paper investigates the "Exploitation Business" model, which capitalizes on information asymmetry to exploit vulnerable populations. It focuses on businesses targeting non-experts or fraudsters who capitalize on information asymmetry to sell their products or services to desperate individuals. This phenomenon, also described as "profit-making activities based on informational exploitation," thrives on individuals' limited access to information, lack of expertise, and Fear of Missing Out (FOMO). The recent advancement of social media and the rising trend of fandom business have accelerated the proliferation of such exploitation business models. Discussions on the empowerment and exploitation of fans in the digital media era present a restructuring of relationships between fans and media creators, highlighting the necessity of not overlooking the exploitation of fans' free labor. This paper analyzes the various facets and impacts of exploitation business models, enriched by real-world examples from sectors like cryptocurrency and GenAI, thereby discussing their social, economic, and ethical implications. Moreover, through theoretical backgrounds and research, it explores similar themes like existing exploitation theories, commercial exploitation, and financial exploitation to gain a deeper understanding of the "Exploitation Business" subject.

Submitted 16 June, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: Exploitation Business, Information Asymmetry, Digital Media, Social Media, Fandom Business, Cognitive Bias, Behavioral Economics, Ethical Implications, Cryptocurrency, Generative AI

arXiv:2310.01082

Linear attention is (maybe) all you need (to understand transformer optimization)

Authors: Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra

Abstract: Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized shallow Transformer model. Specifically, we train linear Transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of Transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized Transformer model could actually be a valuable, realistic abstraction for understanding Transformer optimization.

Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Published at ICLR 2024

arXiv:2309.09390

Augmenting text for spoken language understanding with Large Language Models

Authors: Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer

Abstract: Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for existing application domains represented in training data or extending to new domains requires corresponding triplets of speech-transcript-semantic parse data, which is expensive to obtain. In this paper, we address this challenge by examining methods that can use transcript-semantic parse data (unpaired text) without corresponding speech. First, when unpaired text is drawn from existing textual corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways to generate speech representations for unpaired text. Experiments on the STOP dataset show that unpaired text from existing and new domains improves performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we consider the setting when unpaired text is not available in existing textual corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired text for existing and new domains. Experiments show that examples and words that co-occur with intents can be used to generate unpaired text with Llama 2.0. Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains respectively.

Submitted 17 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024

arXiv:2306.13853

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Authors: Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

Abstract: Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit $\ell_2$-norm regularization in regression and classification problems. However, the implicit regularization of different algorithms is confined to either a specific geometry or a particular class of learning problems, indicating a gap in a general approach for controlling the implicit regularization. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions. Through comprehensive experiments, we demonstrate that MD is a versatile method to produce learned models with different regularizers, which in turn have different generalization performances.

Submitted 11 January, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2205.12808
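
As a small illustration of the mirror descent (MD) update mentioned in the abstract above, the sketch below uses the potential $\psi(w) = \frac{1}{p}\sum_i |w_i|^p$ with $p > 1$, which is one example of a homogeneous potential. The choice of this potential and the function names are assumptions for illustration; the abstract does not specify them. The update maps the weights into the dual space with $\nabla\psi$, takes a gradient step there, and maps back with the inverse of $\nabla\psi$. With $p = 2$ it reduces to plain gradient descent.

```python
import math


def grad_potential(w, p):
    """Gradient of psi(w) = (1/p) * sum(|w_i|**p): sign(w_i) * |w_i|**(p-1)."""
    return [math.copysign(abs(x) ** (p - 1), x) for x in w]


def inverse_grad_potential(z, p):
    """Inverse of grad_potential: sign(z_i) * |z_i|**(1/(p-1)), for p > 1."""
    return [math.copysign(abs(x) ** (1.0 / (p - 1)), x) for x in z]


def mirror_descent_step(w, grad, lr, p):
    """One MD step: grad_psi(w_new) = grad_psi(w) - lr * grad."""
    dual = grad_potential(w, p)
    dual = [d - lr * g for d, g in zip(dual, grad)]
    return inverse_grad_potential(dual, p)


if __name__ == "__main__":
    w = [1.0, -2.0]
    g = [0.5, 0.5]
    print("p=2 (same as GD):", mirror_descent_step(w, g, lr=0.1, p=2.0))
    print("p=3:             ", mirror_descent_step(w, g, lr=0.1, p=3.0))
```

Changing `p` changes the geometry of the step, which is the lever the abstract describes for steering implicit regularization; the sketch shows only the update rule, not the convergence results.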

arXiv:2306.01914

Smooth Model Predictive Control with Applications to Statistical Learning

Authors: Kwangjun Ahn, Daniel Pfrommer, Jack Umenberger, Tobia Marcucci, Zak Mhammedi, Ali Jadbabaie

Abstract: Statistical learning theory and high dimensional statistics have had a tremendous impact on Machine Learning theory and have impacted a variety of domains including systems and control theory. Over the past few years we have witnessed a variety of applications of such theoretical tools to help answer questions such as: how many state-action pairs are needed to learn a static control policy to a given accuracy? Recent results have shown that continuously differentiable and stabilizing control policies can be well-approximated using neural networks with hard guarantees on performance, yet often even the simplest constrained control problems are not smooth. To address this void, in this paper we study smooth approximations of linear Model Predictive Control (MPC) policies, in which hard constraints are replaced by barrier functions, a.k.a. barrier MPC. In particular, we show that barrier MPC inherits the exponential stability properties of the original non-smooth MPC policy. Using a careful analysis of the proposed barrier MPC, we show that its smoothness constant can be carefully controlled, thereby paving the way for new sample complexity results for approximating MPC policies from sampled state-action pairs.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 15 pages, 1 figure

arXiv:2306.00297

Transformers learn to implement preconditioned gradient descent for in-context learning

Authors: Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra

Abstract: Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent. Going beyond the question of expressivity, we ask: Can transformers learn to implement such algorithms by training over random problem instances? To our knowledge, we make the first theoretical progress on this question via an analysis of the loss landscape for linear transformers trained over random instances of linear regression. For a single attention layer, we prove the global minimum of the training objective implements a single iteration of preconditioned gradient descent. Notably, the preconditioning matrix not only adapts to the input distribution but also to the variance induced by data inadequacy. For a transformer with $L$ attention layers, we prove certain critical points of the training objective implement $L$ iterations of preconditioned gradient descent. Our results call for future theoretical studies on learning algorithms by training transformers.

Submitted 9 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: Improved presentation and added new results for the nonlinear activation case; 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2305.15659

How to escape sharp minima with random perturbations

Authors: Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

Abstract: Modern machine learning applications have witnessed the remarkable success of optimization algorithms that are designed to find flat minima. Motivated by this design choice, we undertake a formal study that (i) formulates the notion of flat minima, and (ii) studies the complexity of finding them. Specifically, we adopt the trace of the Hessian of the cost function as a measure of flatness, and use it to formally define the notion of approximate flat minima. Under this notion, we then analyze algorithms that find approximate flat minima efficiently. For general cost functions, we discuss a gradient-based algorithm that finds an approximate flat local minimum efficiently. The main component of the algorithm is to use gradients computed from randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm that is inspired by a recently proposed practical algorithm called sharpness-aware minimization, supporting its success in practice.

Submitted 25 May, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at ICML 2024

arXiv:2305.15287

The Crucial Role of Normalization in Sharpness-Aware Minimization

Authors: Yan Dai, Kwangjun Ahn, Suvrit Sra

Abstract: Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its empirical success. We focus, in particular, on understanding the role played by normalization, a key component of the SAM updates. We theoretically and empirically study the effect of normalization in SAM for both convex and non-convex functions, revealing two key roles played by normalization: i) it helps in stabilizing the algorithm; and ii) it enables the algorithm to drift along a continuum (manifold) of minima -- a property identified by recent theoretical works that is the key to better performance. We further argue that these two properties of normalization make SAM robust against the choice of hyper-parameters, supporting the practicality of SAM. Our conclusions are backed by various experiments.

Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 30 pages, Published in 37th Neural Information Processing Systems (NeurIPS 2023)
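
The normalization discussed in the abstract above refers to how SAM builds its perturbation: the gradient is rescaled to unit length before being multiplied by the radius $\rho$. The sketch below shows one SAM step with and without that normalization. The function name `sam_step`, the `normalize` flag, and the toy objective are assumptions introduced for illustration.

```python
import math


def sam_step(w, grad_fn, lr, rho, normalize=True):
    """One Sharpness-Aware Minimization step on a list of floats.

    With normalize=True the perturbation is rho * g / ||g|| (standard SAM);
    with normalize=False it is rho * g (the un-normalized variant).
    """
    g = grad_fn(w)
    if normalize:
        norm = math.sqrt(sum(x * x for x in g))
        scale = rho / norm if norm > 0 else 0.0   # no perturbation at g = 0
    else:
        scale = rho
    w_perturbed = [x + scale * gx for x, gx in zip(w, g)]
    g_perturbed = grad_fn(w_perturbed)            # gradient at perturbed point
    return [x - lr * gx for x, gx in zip(w, g_perturbed)]


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    grad = lambda w: list(w)
    print(sam_step([2.0, 0.0], grad, lr=0.1, rho=0.1, normalize=True))
    print(sam_step([2.0, 0.0], grad, lr=0.1, rho=0.1, normalize=False))
```

In the un-normalized variant the perturbation size grows with the gradient norm, which gives some intuition for why the abstract associates normalization with stability; the sketch does not reproduce the paper's analysis.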

arXiv:2212.07469

Learning threshold neurons via the "edge of stability"

Authors: Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

Abstract: Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.

Submitted 19 October, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: 31 pages, 13 figures, Published at NeurIPS 2023

arXiv:2211.12137

Implicit Inverse Force Identification Method of Acoustic Liquid-structure Interaction Finite Element Model

Authors: Seungin Oh, Chang-uk Ahn, Kwanghyun Ahn, Jin-Gyun Kim

Abstract: The two-field vibroacoustic finite-element (FE) model requires a relatively large number of degrees of freedom compared to the monophysics model, and the conventional force identification method for structural vibration can be adjusted for multiphysics problems. In this study, an effective inverse force identification method for an FE vibroacoustic interaction model of an interior fluid-structure system was proposed. The method consists of: (1) implicit inverse force identification based on the Newmark-$\beta$ time integration algorithm for stability and efficiency, (2) second-order ordinary differential formulation by avoiding the state-space form causing large degrees of freedom, (3) projection-based multiphysics reduced-order modeling for further reduction of degrees of freedom, and (4) Tikhonov regularization to alleviate the measurement noise. The proposed method can accurately identify the unmeasured applied forces on the in situ application and concurrently reconstruct the response fields. The accuracy, stability, and computational efficiency of the proposed method were evaluated using numerical models and an experimental testbed. A comparative study with the augmented Kalman filter method was performed to evaluate its relative performance.

Submitted 22 November, 2022; originally announced November 2022.

Comments: 31 Pages, 20 Figures, 5 Tables
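
Component (4) of the method in the abstract above, Tikhonov regularization, can be illustrated in isolation. The sketch below solves the generic regularized least-squares problem $\min_x \|Ax - b\|^2 + \lambda \|x\|^2$ through its normal equations $(A^\top A + \lambda I)x = A^\top b$. It is a standalone pure-Python illustration, not the paper's force identification procedure; the function names and the small Gaussian-elimination helper are assumptions made for this example.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):              # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def tikhonov_solve(A, b, lam):
    """Return argmin_x ||A x - b||^2 + lam * ||x||^2 via normal equations."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    for i in range(n):
        AtA[i][i] += lam                        # add lam * I
    return solve_linear(AtA, Atb)


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [1.0, 2.0]
    print(tikhonov_solve(A, b, lam=0.0))        # unregularized solution
    print(tikhonov_solve(A, b, lam=1.0))        # shrunk toward zero
```

A larger `lam` damps the solution more strongly, trading fidelity to noisy measurements for stability; how the regularization parameter is chosen in the paper is not stated in the abstract.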

arXiv:2210.09206

Model Predictive Control via On-Policy Imitation Learning

Authors: Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie

Abstract: In this paper, we leverage the rapid advances in imitation learning, a topic of intense recent focus in the Reinforcement Learning (RL) literature, to develop new sample complexity results and performance guarantees for data-driven Model Predictive Control (MPC) for constrained linear systems. In its simplest form, imitation learning is an approach that tries to learn an expert policy by querying samples from an expert. Recent approaches to data-driven MPC have used the simplest form of imitation learning known as behavior cloning to learn controllers that mimic the performance of MPC by online sampling of the trajectories of the closed-loop MPC system. Behavior cloning, however, is a method that is known to be data inefficient and suffer from distribution shifts. As an alternative, we develop a variant of the forward training algorithm which is an on-policy imitation learning method proposed by Ross et al. (2010). Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance. We validate our results through simulations and show that the forward training algorithm is indeed superior to behavior cloning when applied to MPC.

Submitted 17 October, 2022; originally announced October 2022.

Comments: 26 pages

arXiv:2210.04122 [pdf, other]

doi 10.3847/1538-4357/ac927e

Inferring Line-of-Sight Velocities and Doppler Widths from Stokes Profiles of GST/NIRIS Using Stacked Deep Neural Networks

Authors: Haodi Jiang, Qin Li, Yan Xu, Wynne Hsu, Kwangsu Ahn, Wenda Cao, Jason T. L. Wang, Haimin Wang

Abstract: Obtaining high-quality magnetic and velocity fields through Stokes inversion is crucial in solar physics. In this paper, we present a new deep learning method, named Stacked Deep Neural Networks (SDNN), for inferring line-of-sight (LOS) velocities and Doppler widths from Stokes profiles collected by the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at the Big Bear Solar Observatory (BBSO). The training data of SDNN is prepared by a Milne-Eddington (ME) inversion code used by BBSO. We quantitatively assess SDNN, comparing its inversion results with those obtained by the ME inversion code and related machine learning (ML) algorithms such as multiple support vector regression, multilayer perceptrons and a pixel-level convolutional neural network. Major findings from our experimental study are summarized as follows. First, the SDNN-inferred LOS velocities are highly correlated to the ME-calculated ones with the Pearson product-moment correlation coefficient being close to 0.9 on average. Second, SDNN is faster, while producing smoother and cleaner LOS velocity and Doppler width maps, than the ME inversion code. Third, the maps produced by SDNN are closer to ME's maps than those from the related ML algorithms, demonstrating the better learning capability of SDNN than the ML algorithms. Finally, comparison between the inversion results of ME and SDNN based on GST/NIRIS and those from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory in flare-prolific active region NOAA 12673 is presented.
We also discuss extensions of SDNN for inferring vector magnetic fields with empirical evaluation.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 16 pages, 8 figures

Journal ref: The Astrophysical Journal, 2022

arXiv:2207.13853 [pdf, other]

One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

Authors: Youngjae Min, Kwangjun Ahn, Navid Azizan

Abstract: While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
Our experiments show the effectiveness of the proposed method compared to the baselines.

Submitted 27 July, 2022; originally announced July 2022.

Comments: IEEE Conference on Decision and Control, 2022

arXiv:2205.12808 [pdf, other]

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

Authors: Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

Abstract: Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question. To this end, there has been substantial effort to characterize the implicit bias of the optimization algorithms used, such as gradient descent (GD), and the structural properties of their preferred solutions. This paper answers an open question in this literature: For the classification setting, what solution does mirror descent (MD) converge to? Specifically, motivated by its efficient implementation, we consider the family of mirror descent algorithms with potential function chosen as the $p$-th power of the $\ell_p$-norm, which is an important generalization of GD. We call this algorithm $p$-$\textsf{GD}$. For this family, we characterize the solutions it obtains and show that it converges in direction to a generalized maximum-margin solution with respect to the $\ell_p$-norm for linearly separable classification. While the MD update rule is in general expensive to compute and perhaps not suitable for deep learning, $p$-$\textsf{GD}$ is fully parallelizable in the same manner as SGD and can be used to train deep neural networks with virtually no additional computational overhead. Using comprehensive experiments with both linear and deep neural network models, we demonstrate that $p$-$\textsf{GD}$ can noticeably affect the structure and the generalization performance of the learned models.

Submitted 29 September, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2204.01050 [pdf, ps, other]

Understanding the unstable convergence of gradient descent

Authors: Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra

Abstract: Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$. However, many works have observed that in machine learning applications step sizes often do not fulfill this condition, yet (stochastic) gradient descent still converges, albeit in an unstable manner. We investigate this unstable convergence phenomenon from first principles, and discuss key causes behind it. We also identify its main characteristics, and how they interrelate based on both theory and experiments, offering a principled view toward understanding the phenomenon.

Submitted 9 June, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: Accepted to the 39th International Conference on Machine Learning (ICML 2022), Baltimore, Maryland, USA. Version 2 improves writing and presentation, adds discussion regarding concurrent works

arXiv:2202.04598 [pdf, ps, other]

Reproducibility in Optimization: Theoretical Framework and Limits

Authors: Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir

Abstract: We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computations or inexact initialization. We then analyze several convex optimization settings of interest such as smooth, non-smooth, and strongly-convex objective functions and establish tight bounds on the limits of reproducibility in each setting. Our analysis reveals a fundamental trade-off between computation and reproducibility: more computation is necessary (and sufficient) for better reproducibility.

Submitted 4 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: 45 Pages; Accepted to NeurIPS 2022

arXiv:2202.01675 [pdf]

Environmental and Safety Impacts of Vehicle-to-Everything Enabled Applications: A Review of State-of-the-Art Studies

Authors: Jianhe Du, Kyoungho Ahn, Mohamed Farag, Hesham Rakha

Abstract: With the rapid development of communication technology, connected vehicles (CV) have the potential, through the sharing of data, to enhance vehicle safety and reduce vehicle energy consumption and emissions. Numerous research efforts are quantifying the impacts of CV applications, assuming instant and accurate communication among vehicles, devices, pedestrians, infrastructure, the network, the cloud, and the grid, collectively known as V2X (vehicle-to-everything). The use of cellular vehicle-to-everything (C-V2X) to share data is emerging as an efficient means to achieve this objective. C-V2X releases 14 and 15 utilize the 4G LTE technology and release 16 utilizes the new 5G new radio (NR) technology. C-V2X can function without network infrastructure coverage and has a better communication range, improved latency, and greater data rates compared to older technologies. Such highly efficient interchange of information among all participating parts in a CV environment will not only provide timely data to enhance the capacity of the transportation system but can also be used to develop applications that enhance vehicle safety and minimize negative environmental impacts.
However, before the full benefits of CV can be achieved, there is a need to thoroughly investigate the effectiveness, strengths, and weaknesses of different CV applications, the communication protocols, the varied results with different CV market penetration rates (MPRs), the interaction of CVs and human driven vehicles, the integration of multiple applications, and the errors and latencies associated with data communication. This paper reviews existing literature on the environmental, mobility and safety impacts of CV applications, identifies the gaps in our current research of CVs and recommends future research directions.

Submitted 7 December, 2021; originally announced February 2022.

Comments: This paper is a literature review of V2X-enabled applications

arXiv:2201.13419 [pdf, ps, other]

Agnostic Learnability of Halfspaces via Logistic Loss

Authors: Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

Abstract: We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where $\textrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behaved-ness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\textrm{OPT})$ misclassification risk. Our techniques also show for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the $\Omega(\sqrt{\textrm{OPT}})$ lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss and attain $\tilde{O}(\textrm{OPT})$ misclassification risk. This two-step convex optimization algorithm is simpler than previous methods obtaining this guarantee, all of which require solving $O(\log(1/\textrm{OPT}))$ minimization problems.

Submitted 31 January, 2022; originally announced January 2022.

arXiv:2104.09336 [pdf]

Multi-objective Eco-Routing Model Development and Evaluation for Battery Electric Vehicles

Authors: Kyoungho Ahn, Youssef Bichiou, Mohamed Farag, Hesham A. Rakha

Abstract: This paper develops and investigates the impacts of multi-objective Nash optimum (user equilibrium) traffic assignment on a large-scale network for battery electric vehicles (BEVs) and internal combustion engine vehicles (ICEVs) in a microscopic traffic simulation environment. Eco-routing is a technique that finds the most energy efficient route. ICEV and BEV energy consumption patterns are significantly different with regard to their sensitivity to driving cycles. Unlike ICEVs, BEVs are more energy efficient on low-speed arterial trips compared to highway trips. Different energy consumption patterns require different eco-routing strategies for ICEVs and BEVs. This study found that eco-routing could reduce energy consumption for BEVs but also significantly increases their average travel time. The simulation study found that multi-objective routing could reduce the energy consumption of BEVs by 13.5, 14.2, 12.9, and 10.7 percent, as well as the fuel consumption of ICEVs by 0.1, 4.3, 3.4, and 10.6 percent for "not congested", "slightly congested", "moderately congested", and "highly congested" conditions, respectively. The study also found that multi-objective user equilibrium routing reduced the average vehicle travel time by up to 10.1% compared to the standard user equilibrium traffic assignment for the highly congested conditions, producing a solution closer to the system optimum traffic assignment.
The results indicate that the multi-objective eco-routing can effectively reduce fuel/energy consumption with minimum impacts on travel times for both BEVs and ICEVs.

Submitted 10 August, 2020; originally announced April 2021.

Comments: Paper submitted to Transportation Research Board Annual Meeting

arXiv:2102.00937 [pdf, other]

Riemannian Perspective on Matrix Factorization

Authors: Kwangjun Ahn, Felipe Suarez

Abstract: We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry. Based on an optimization formulation over a Grassmannian manifold, we characterize the landscape based on the notion of principal angles between subspaces. For the fully observed case, our results show that there is a region in which the cost is geodesically convex, and outside of which all critical points are strictly saddle. We empirically study the partially observed case based on our findings.

Submitted 1 February, 2021; originally announced February 2021.

Comments: 23 pages, 6 figures. Comments would be appreciated!

arXiv:2012.12810 [pdf, ps, other]

Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm

Authors: Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, Philippe Rigollet

Abstract: Conventional wisdom in the sampling literature, backed by a popular diffusion scaling limit, suggests that the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) scales as $O(d^{1/3})$, where $d$ is the dimension. However, the diffusion scaling limit requires stringent assumptions on the target distribution and is asymptotic in nature. In contrast, the best known non-asymptotic mixing time bound for MALA on the class of log-smooth and strongly log-concave distributions is $O(d)$. In this work, we establish that the mixing time of MALA on this class of target distributions is $\widetilde{\Theta}(d^{1/2})$ under a warm start. Our upper bound proof introduces a new technique based on a projection characterization of the Metropolis adjustment which reduces the study of MALA to the well-studied discretization analysis of the Langevin SDE and bypasses direct computation of the acceptance probability.

Submitted 23 December, 2020; originally announced December 2020.

Comments: 41 pages

arXiv:2010.16212 [pdf, other]

Efficient constrained sampling via the mirror-Langevin algorithm

Authors: Kwangjun Ahn, Sinho Chewi

Abstract: We propose a new discretization of the mirror-Langevin diffusion and give a crisp proof of its convergence. Our analysis uses relative convexity/smoothness and self-concordance, ideas which originated in convex optimization, together with a new result in optimal transport that generalizes the displacement convexity of the entropy. Unlike prior works, our result both (1) requires much weaker assumptions on the mirror map and the target distribution, and (2) has vanishing bias as the step size tends to zero. In particular, for the task of sampling from a log-concave distribution supported on a compact set, our theoretical results are significantly better than the existing guarantees.

Submitted 25 October, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: 26 pages, 4 figures

arXiv:2009.12072 [pdf, other]

AIM 2020 Challenge on Real Image Super-Resolution: Methods and Results

Authors: Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, Wangmeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, et al. (51 additional authors not shown)

Abstract: This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic image degradation for the SR task, which is much more complicated and challenging, and contributes to real-world image super-resolution applications. 452 participants were registered for three tracks in total, and 24 teams submitted their results. They gauge the state-of-the-art approaches for real image SR in terms of PSNR and SSIM.

Submitted 25 September, 2020; originally announced September 2020.

Journal ref: European Conference on Computer Vision Workshops, 2020

arXiv:2008.03556 [pdf, ps, other]

A simpler strong refutation of random $k$-XOR

Authors: Kwangjun Ahn

Abstract: Strong refutation of random CSPs is a fundamental question in theoretical computer science that has received particular attention due to the long-standing gap between the information-theoretic limit and the computational limit. This gap was recently bridged by Raghavendra, Rao and Schramm, who study sub-exponential algorithms for the regime between the two limits. In this work, we take a simpler approach to their algorithm and analysis.

Submitted 8 August, 2020; originally announced August 2020.

Comments: 16 pages; presented at International Conference on Randomization and Computation (RANDOM) 2020

arXiv:2006.14352 [pdf, other]

doi 10.1109/ACCESS.2020.3009748

HARMer: Cyber-attacks Automation and Evaluation

Authors: Simon Yusuf Enoch, Zhibin Huang, Chun Yong Moon, Donghwan Lee, Myung Kil Ahn, Dong Seong Kim

Abstract: With the increasing growth of cyber-attack incidences, it is important to develop innovative and effective techniques to assess and defend networked systems against cyber attacks. One of the well-known techniques for this is performing penetration testing, which is carried out by a group of security professionals (i.e., red team). Penetration testing is also known to be effective to find existing and new vulnerabilities; however, the quality of security assessment can depend on the quality of the red team members and their time and devotion to the penetration testing. In this paper, we propose a novel automation framework for cyber-attacks generation named `HARMer' to address the challenges with respect to manual attack execution by the red team. Our novel proposed framework, design, and implementation is based on a scalable graphical security model called Hierarchical Attack Representation Model (HARM). (1) We propose the requirements and the key phases for the automation framework. (2) We propose security metrics-based attack planning strategies along with their algorithms. (3) We conduct experiments in a real enterprise network and Amazon Web Services. The results show how the different phases of the framework interact to model the attackers' operations. This framework will allow security administrators to automatically assess the impact of various threats and attacks.

Submitted 17 July, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: 19 pages, journal

Journal ref: IEEE Access, 8, 129397-129414 (2020)

arXiv:2005.08304 [pdf, ps, other]

Understanding Nesterov's Acceleration via Proximal Point Method

Authors: Kwangjun Ahn, Suvrit Sra

Abstract: The proximal point method (PPM) is a fundamental method in optimization that is often used as a building block for designing optimization algorithms. In this work, we use the PPM method to provide conceptually simple derivations along with convergence analyses of different versions of Nesterov's accelerated gradient method (AGM). The key observation is that AGM is a simple approximation of PPM, which results in an elementary derivation of the update equations and stepsizes of AGM. This view also leads to a transparent and conceptually simple analysis of AGM's convergence by using the analysis of PPM. The derivations also naturally extend to the strongly convex case. Ultimately, the results presented in this paper are of both didactic and conceptual value; they unify and explain existing variants of AGM while motivating other accelerated methods for practically relevant settings.

Submitted 2 June, 2022; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: 14 pages; Presented at SIAM Symposium on Simplicity in Algorithms (SOSA22), January 10 - 11, 2022

arXiv:2004.08657 [pdf, ps, other]

On Tight Convergence Rates of Without-replacement SGD

Authors: Kwangjun Ahn, Suvrit Sra

Abstract: For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD. Denoting by $n$ the number of components in the cost and $K$ the number of epochs of the algorithm, several recent works have shown convergence rates of without-replacement SGD that have better dependency on $n$ and $K$ than the baseline rate of $O(1/(nK))$ for SGD. However, there are two main limitations shared among those works: the rates have extra poly-logarithmic factors on $nK$, and denoting by $\kappa$ the condition number of the problem, the rates hold after $\kappa^c\log(nK)$ epochs for some $c>0$. In this work, we overcome these limitations by analyzing step sizes that vary across epochs.

Submitted 18 April, 2020; originally announced April 2020.

Comments: 12 pages

arXiv:2004.04459 [pdf, ps, other]

Fast frequency discrimination and phoneme recognition using a biomimetic membrane coupled to a neural network

Authors: Woo Seok Lee, Hyunjae Kim, Andrew N. Cleland, Kang-Hun Ahn

Abstract: In the human ear, the basilar membrane plays a central role in sound recognition. When excited by sound, this membrane responds with a frequency-dependent displacement pattern that is detected and identified by the auditory hair cells combined with the human neural system. Inspired by this structure, we designed and fabricated an artificial membrane that produces a spatial displacement pattern in response to an audible signal, which we used to train a convolutional neural network (CNN). When trained with single frequency tones, this system can unambiguously distinguish tones closely spaced in frequency. When instead trained to recognize spoken vowels, this system outperforms existing methods for phoneme recognition, including the discrete Fourier transform (DFT), zoom FFT and chirp z-transform, especially when tested in short time windows. This sound recognition scheme therefore promises significant benefits in fast and accurate sound identification compared to existing methods.

Submitted 9 April, 2020; originally announced April 2020.

Comments: 7 pages, 4 figures

arXiv:1812.02023 [pdf, ps, other]

Correlation Clustering in Data Streams

Authors: Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth

Abstract: Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms need to be both time and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on $n$ nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, $O(n\cdot \mbox{polylog}~n)$-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the "quality" of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in $O(n\cdot \mbox{polylog}~n)$-space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.

Submitted 5 December, 2018; originally announced December 2018.

arXiv:1809.01822 [pdf, other]

Driving Experience Transfer Method for End-to-End Control of Self-Driving Cars

Authors: Dooseop Choi, Taeg-Hyun An, Kyounghwan Ahn, Jeongdan Choi

Abstract: In this paper, we present a transfer learning method for the end-to-end control of self-driving cars, which enables a convolutional neural network (CNN) trained on a source domain to be utilized for the same task in a different target domain. A conventional CNN for the end-to-end control is designed to map a single front-facing camera image to a steering command. To enable the transfer learning, we let the CNN produce not only a steering command but also a lane departure level (LDL) by adding a new task module, which takes the output of the last convolutional layer as input. The CNN trained on the source domain, called source network, is then utilized to train another task module called target network, which also takes the output of the last convolutional layer of the source network and is trained to produce a steering command for the target domain. The steering commands from the source and target network are finally merged according to the LDL and the merged command is utilized for controlling a car in the target domain. To demonstrate the effectiveness of the proposed method, we utilized two simulators, TORCS and GTAV, for the source and the target domains, respectively. Experimental results show that the proposed method outperforms other baseline methods in terms of stable and safe control of cars.

Submitted 7 September, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

arXiv:1808.10086 [pdf, other]

Artifacts Detection and Error Block Analysis from Broadcasted Videos

Authors: Md Mehedi Hasan, Tasneem Rahman, Kiok Ahn, Oksam Chae

Abstract: With the advancement of IPTV and HDTV technology, previously subtle errors in videos are now becoming more prominent because of structure-oriented and compression-based artifacts. In this paper, we focus on the development of a real-time video quality check system. Lightweight edge gradient magnitude information is incorporated to acquire the statistical information, and the distorted frames are then estimated based on the characteristics of their surrounding frames. We then apply the prominent texture patterns to classify them into different block errors and analyze them not only for video error detection but also for error concealment, restoration and retrieval. Finally, experiments on prominent datasets and broadcast videos show that the proposed algorithm is efficient at detecting errors for video broadcast and surveillance applications in terms of computation time and analysis of distorted frames.

Submitted 29 August, 2018; originally announced August 2018.

arXiv:1805.08956 [pdf, ps, other]

doi 10.1109/JSTSP.2018.2837638

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

Authors: Kwangjun Ahn, Kangwook Lee, Changho Suh

Abstract: Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied to a variety of domains, it comes with limitations. The reason is that there are many other applications in which only \emph{multi}-way similarity measures are available. This motivates us to explore the multi-way measurement setting. In this work, we develop two algorithms intended for such a setting: Hypergraph Spectral Clustering (HSC) and Hypergraph Spectral Clustering with Local Refinement (HSCLR). Our main contribution lies in performance analysis of the poly-time algorithms under a random hypergraph model, which we name the weighted stochastic block model, in which objects and multi-way measures are modeled as nodes and weights of hyperedges, respectively. Denoting by $n$ the number of nodes, our analysis reveals the following: (1) HSC outputs a partition which is better than a random guess if the sum of edge weights (to be explained later) is $Ω(n)$; (2) HSC outputs a partition which coincides with the hidden partition except for a vanishing fraction of nodes if the sum of edge weights is $ω(n)$; and (3) HSCLR exactly recovers the hidden partition if the sum of edge weights is on the order of $n \log n$. Our results improve upon the state of the art recently established under the model, and they are the first to settle the order-wise optimal results for the binary edge weight case. Moreover, we show that our results lead to efficient sketching algorithms for subspace clustering, a computer vision application. Lastly, we show that HSCLR achieves the information-theoretic limits for a special yet practically relevant model, thereby showing no computational barrier for the case.

Submitted 23 May, 2018; originally announced May 2018.

Comments: 16 pages; 3 figures

Journal ref: October 2018 special issue on "Information-Theoretic Methods in Data Acquisition, Analysis, and Processing" of the IEEE Journal of Selected Topics in Signal Processing

arXiv:1712.06340 [pdf, other]

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

Authors: Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn

Abstract: Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data. We investigate the minimum requirements to obtain a stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance to unseen noise as a function of the amount of different types of noise available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves a comparable performance to having two orders of magnitude more data. They also demonstrate the relative stability in test performance with respect to the number of training noise types.

Submitted 18 December, 2017; originally announced December 2017.

arXiv:1710.05117 [pdf, other]

Computing the maximum matching width is NP-hard

Authors: Kwangjun Ahn, Jisu Jeong

Abstract: The maximum matching width is a graph width parameter that is defined on a branch-decomposition over the vertex set of a graph. In this short paper, we prove that the problem of computing the maximum matching width is NP-hard.

Submitted 13 October, 2017; originally announced October 2017.

Comments: 5 pages; 1 figure

arXiv:1709.03670 [pdf, other]

Community Recovery in Hypergraphs

Authors: Kwangjun Ahn, Kangwook Lee, Changho Suh

Abstract: Community recovery is a central problem that arises in a wide variety of applications such as network clustering, motion segmentation, face clustering and protein complex detection. The objective of the problem is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points. While most of the prior works focus on a setting in which the number of data points involved in a measurement is two, this work explores a generalized setting in which the number can be more than two. Motivated by applications particularly in machine learning and channel coding, we consider two types of measurements: (1) homogeneity measurement which indicates whether or not the associated data points belong to the same community; (2) parity measurement which denotes the modulo-2 sum of the values of the data points. Such measurements are possibly corrupted by Bernoulli noise. We characterize the fundamental limits on the number of measurements required to reconstruct the communities for the considered models.

Submitted 11 September, 2017; originally announced September 2017.

Comments: 25 pages, 7 figures. Submitted to IEEE Transactions on Information Theory

arXiv:1707.07872 [pdf, ps, other]

An Executable Specification of Typing Rules for Extensible Records based on Row Polymorphism

Authors: Ki Yung Ahn

Abstract: Type inference is an application domain that is a natural fit for logic programming (LP). LP systems natively support unification, which serves as a basic building block of typical type inference algorithms. In particular, polymorphic type inference in the Hindley--Milner type system (HM) can be succinctly specified and executed in Prolog. In our previous work, we have demonstrated that more advanced features of parametric polymorphism beyond HM, such as type-constructor polymorphism and kind polymorphism, can be similarly specified in Prolog. Here, we demonstrate a specification for records, which is one of the most widely supported compound data structures in real-world programming languages, and discuss the advantages and limitations of Prolog as a specification language for type systems. Record types are specified as order-irrelevant collections of named fields mapped to their corresponding types. In addition, an open-ended collection is used to support row polymorphism for record types to be extensible.

Submitted 11 September, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

ACM Class: D.3.3; D.1.6

arXiv:1705.10908 [pdf, other]

Generating Witness of Non-Bisimilarity for the pi-Calculus

Authors: Ki Yung Ahn, Ross Horne, Alwen Tiu

Abstract: In the logic programming paradigm, it is difficult to develop an elegant solution for generating distinguishing formulae that witness the failure of open-bisimilarity between two pi-calculus processes; this was unexpected because the semantics of the pi-calculus and open bisimulation have already been elegantly specified in higher-order logic programming systems. Our solution using Haskell defines the formulae generation as a tree transformation from the forest of all nondeterministic bisimulation steps to a pair of distinguishing formulae. Thanks to laziness in Haskell, only the necessary paths demanded by the tree transformation function are generated. Our work demonstrates that Haskell and its libraries provide an attractive platform for symbolically analyzing equivalence properties of labeled transition systems in an environment sensitive setting.

Submitted 30 May, 2017; originally announced May 2017.

arXiv:1701.05324 [pdf, other]

doi 10.46298/lmcs-17(3:2)2021

A Characterisation of Open Bisimilarity using an Intuitionistic Modal Logic

Authors: Ki Yung Ahn, Ross Horne, Alwen Tiu

Abstract: Open bisimilarity is defined for open process terms in which free variables may appear. The insight is, in order to characterise open bisimilarity, we move to the setting of intuitionistic modal logics. The intuitionistic modal logic introduced, called $\mathcal{OM}$, is such that modalities are closed under substitutions, which induces a property known as intuitionistic hereditary. Intuitionistic hereditary reflects in logic the lazy instantiation of free variables performed when checking open bisimilarity. The soundness proof for open bisimilarity with respect to our intuitionistic modal logic is mechanised in Abella. The constructive content of the completeness proof provides an algorithm for generating distinguishing formulae, which we have implemented. We draw attention to the fact that there is a spectrum of bisimilarity congruences that can be characterised by intuitionistic modal logics.

Submitted 9 August, 2021; v1 submitted 19 January, 2017; originally announced January 2017.

ACM Class: F.4.1

Journal ref: Logical Methods in Computer Science, Volume 17, Issue 3 (August 10, 2021) lmcs:4666

arXiv:1412.5227 [pdf, ps, other]

BER-Based Physical Layer Security with Finite Codelength: Combining Strong Converse and Error Amplification

Authors: Il-Min Kim, Byoung-Hoon Kim, Joon Kui Ahn

Abstract: A bit error rate (BER)-based physical layer security approach is proposed for finite blocklength. For secure communication in the sense of high BER, the information-theoretic strong converse is combined with cryptographic error amplification achieved by substitution permutation networks (SPNs) based on confusion and diffusion. For discrete memoryless channels (DMCs), an analytical framework is provided showing the tradeoffs among finite blocklength, maximum/minimum possible transmission rates, and BER requirements for the legitimate receiver and the eavesdropper. Also, the security gap is analytically studied for Gaussian channels and the concept is extended to other DMCs including binary symmetric channels (BSCs) and binary erasure channels (BECs). For fading channels, the transmit power is optimized to minimize the outage probability of the legitimate receiver subject to a BER threshold for the eavesdropper.

Submitted 4 January, 2015; v1 submitted 16 December, 2014; originally announced December 2014.

arXiv:1307.4359 [pdf, other]

Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints

Authors: Kook Jin Ahn, Sudipto Guha

Abstract: In this paper we consider graph algorithms in models of computation where the space usage (random accessible storage, in addition to the read only input) is sublinear in the number of edges $m$ and the access to input data is constrained. These questions arise in many natural settings, and in particular in the analysis of MapReduce or similar algorithms that model constrained parallelism with sublinear central processing. In SPAA 2011, Lattanzi et al. provided an $O(1)$ approximation of maximum matching using $O(p)$ rounds of iterative filtering via MapReduce and $O(n^{1+1/p})$ space of central processing for a graph with $n$ nodes and $m$ edges. We focus on weighted nonbipartite maximum matching in this paper. For any constant $p>1$, we provide an iterative sampling based algorithm for computing a $(1-ε)$-approximation of the weighted nonbipartite maximum matching that uses $O(p/ε)$ rounds of sampling, and $O(n^{1+1/p})$ space. The results extend to $b$-Matching with small changes. This paper combines adaptive sketching literature and fast primal-dual algorithms based on relaxed Dantzig-Wolfe decision procedures. Each round of sampling is implemented through linear sketches and executed in a single round of MapReduce. The paper also proves that nonstandard linear relaxations of a problem, in particular penalty based formulations, are helpful in MapReduce and similar settings in reducing the adaptive dependence of the iterations.

Submitted 20 April, 2015; v1 submitted 16 July, 2013; originally announced July 2013.

arXiv:1307.4355 [pdf, other]

Near Linear Time Approximation Schemes for Uncapacitated and Capacitated b--Matching Problems in Nonbipartite Graphs

Authors: Kook Jin Ahn, Sudipto Guha

Abstract: We present the first near optimal approximation schemes for the maximum weighted (uncapacitated or capacitated) $b$--matching problems for non-bipartite graphs that run in time (near) linear in the number of edges. For any $δ>3/\sqrt{n}$ the algorithm produces a $(1-δ)$ approximation in $O(m \operatorname{poly}(δ^{-1},\log n))$ time. We provide fractional solutions for the standard linear programming formulations for these problems and subsequently also provide (near) linear time approximation schemes for rounding the fractional solutions. Through these problems as a vehicle, we also present several ideas in the context of solving linear programs approximately using fast primal-dual algorithms. First, even though the duals of these problems have exponentially many variables and an efficient exact computation of dual weights is infeasible, we show that we can efficiently compute and use a sparse approximation of the dual weights using a combination of (i) adding perturbation to the constraints of the polytope and (ii) amplification followed by thresholding of the dual weights. Second, we show that approximation algorithms can be used to reduce the width of the formulation and achieve faster convergence.

Submitted 18 June, 2018; v1 submitted 16 July, 2013; originally announced July 2013.

arXiv:1202.2923 [pdf, other]

doi 10.4204/EPTCS.76.9

Irrelevance, Heterogeneous Equality, and Call-by-value Dependent Type Systems

Authors: Vilhelm Sjöberg, Chris Casinghino, Ki Yung Ahn, Nathan Collins, Harley D. Eades III, Peng Fu, Garrin Kimmell, Tim Sheard, Aaron Stump, Stephanie Weirich

Abstract: We present a full-spectrum dependently typed core language which includes both nontermination and computational irrelevance (a.k.a. erasure), a combination which has not been studied before. The two features interact: to protect type safety we must be careful to only erase terminating expressions. Our language design is strongly influenced by the choice of CBV evaluation, and by our novel treatment of propositional equality which has a heterogeneous, completely erased elimination form.

Submitted 13 February, 2012; originally announced February 2012.

Comments: In Proceedings MSFP 2012, arXiv:1202.2407

ACM Class: D.3.1

Journal ref: EPTCS 76, 2012, pp. 112-162

arXiv:1105.0515 [pdf]

Core-Periphery Segregation in Evolving Prisoner's Dilemma Networks

Authors: Yunkyu Sohn, Jung-Kyoo Choi, T. K. Ahn

Abstract: Dense cooperative networks are an essential element of social capital for a prosperous society. These networks enable individuals to overcome collective action dilemmas by enhancing trust. In many biological and social settings, network structures evolve endogenously as agents exit relationships and build new ones. However, the process by which evolutionary dynamics lead to self-organization of dense cooperative networks has not been explored. Our large group prisoner's dilemma experiments with exit and partner choice options show that core-periphery segregation of cooperators and defectors drives the emergence of cooperation. Cooperators' Quit-for-Tat and defectors' Roving strategy lead to a highly asymmetric core and periphery structure. Densely connected to each other, cooperators successfully isolate defectors and earn larger payoffs than defectors. Our analysis of the topological characteristics of evolving networks illuminates how social capital is generated.

Submitted 9 December, 2012; v1 submitted 3 May, 2011; originally announced May 2011.

arXiv:1104.4058 [pdf, ps, other]

Laminar Families and Metric Embeddings: Non-bipartite Maximum Matching Problem in the Semi-Streaming Model

Authors: Kook Jin Ahn, Sudipto Guha

Abstract: In this paper, we study the non-bipartite maximum matching problem in the semi-streaming model. The maximum matching problem in the semi-streaming model has received a significant amount of attention lately. While the problem has been somewhat well solved for bipartite graphs, the known algorithms for non-bipartite graphs use $2^{\frac1ε}$ passes or $n^{\frac1ε}$ time to compute a $(1-ε)$ approximation. In this paper we provide the first FPTAS (polynomial in $n,\frac1ε$) for the problem which is efficient in both the running time and the number of passes. We also show that we can estimate the size of the matching in $O(\frac1ε)$ passes using slightly superlinear space. To achieve both results, we use the structural properties of the matching polytope such as the laminarity of the tight sets and total dual integrality. The algorithms are iterative, and are based on the fractional packing and covering framework. However the formulations herein require exponentially many variables or constraints. We use laminarity, metric embeddings and graph sparsification to reduce the space required by the algorithms in between and across the iterations. This is the first use of these ideas in the semi-streaming model to solve a combinatorial optimization problem.

Submitted 20 April, 2011; originally announced April 2011.

Showing 1–50 of 53 results for author: Ahn, K