
Showing 1–50 of 82 results for author: Kyrillidis, A

  1. arXiv:2406.13879 [pdf, other]

    quant-ph cs.DS cs.LG math.OC

    A Catalyst Framework for the Quantum Linear System Problem via the Proximal Point Algorithm

    Authors: Junhyung Lyle Kim, Nai-Hui Chia, Anastasios Kyrillidis

    Abstract: Solving systems of linear equations is a fundamental problem, but it can be computationally intensive for classical algorithms in high dimensions. Existing quantum algorithms can achieve exponential speedups for the quantum linear system problem (QLSP) in terms of the problem dimension, but even such a theoretical advantage is bottlenecked by the condition number of the coefficient matrix. In this…

    Submitted 19 June, 2024; originally announced June 2024.

  2. Better Schedules for Low Precision Training of Deep Neural Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: Low precision training can significantly reduce the computational overhead of training deep neural networks (DNNs). Though many such techniques exist, cyclic precision training (CPT), which dynamically adjusts precision throughout training according to a cyclic schedule, achieves particularly impressive improvements in training efficiency, while actually improving DNN performance. Existing CPT imp…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures, 1 table, ACML 2023

    ACM Class: I.2.6; I.2.10; I.4.0

    Journal ref: Machine Learning (2024): 1-19

  3. arXiv:2310.04283 [pdf, other]

    cs.LG math.OC stat.ML

    On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis

    Authors: Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis

    Abstract: Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the…

    Submitted 29 May, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: ICML2024

  4. arXiv:2310.03899 [pdf, other]

    cs.LG

    CrysFormer: Protein Structure Prediction via 3d Patterson Maps and Partial Structure Attention

    Authors: Chen Dun, Qiutai Pan, Shikai Jin, Ria Stevens, Mitchell D. Miller, George N. Phillips, Jr., Anastasios Kyrillidis

    Abstract: Determining the structure of a protein has been a decades-long open question. A protein's three-dimensional structure often poses nontrivial computation costs when classical simulation algorithms are utilized. Advances in the transformer neural network architecture -- such as AlphaFold2 -- achieve significant improvements for this problem by learning from a large dataset of sequence information…

    Submitted 5 October, 2023; originally announced October 2023.

  5. arXiv:2310.02842 [pdf, other]

    cs.CL cs.AI

    Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

    Authors: Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim

    Abstract: Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how…

    Submitted 5 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  6. arXiv:2309.16862 [pdf, other]

    cs.RO

    Stochastic Implicit Neural Signed Distance Functions for Safe Motion Planning under Sensing Uncertainty

    Authors: Carlos Quintero-Peña, Wil Thomason, Zachary Kingston, Anastasios Kyrillidis, Lydia E. Kavraki

    Abstract: Motion planning under sensing uncertainty is critical for robots in unstructured environments to guarantee safety for both the robot and any nearby humans. Most work on planning under uncertainty does not scale to high-dimensional robots such as manipulators, assumes simplified geometry of the robot or environment, or requires per-object knowledge of noise. Instead, we propose a method that direct…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 8 pages, 4 figures, 1 table. Submitted to the 2024 IEEE International Conference on Robotics and Automation

    ACM Class: I.2.9; I.2.8

  7. arXiv:2309.03469 [pdf, other]

    cs.LG cs.AI cs.CV

    Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size

    Authors: John Chen, Chen Dun, Anastasios Kyrillidis

    Abstract: Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come at the cost of significantly increased training computation. To address this, we propose Curriculum Batch Size (CBS), an unlabeled batch size curriculum which exploits the…

    Submitted 6 September, 2023; originally announced September 2023.

  8. arXiv:2309.03237 [pdf, other]

    cs.LG cs.IT math.OC

    Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

    Authors: Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine

    Abstract: We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate lear…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 16 pages, 7 figures, Accepted at ICCV2023

  9. arXiv:2306.11201 [pdf, other]

    cs.LG cs.DC math.OC

    Adaptive Federated Learning with Auto-Tuned Clients

    Authors: Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

    Abstract: Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challen…

    Submitted 2 May, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  10. arXiv:2306.08586 [pdf, other]

    cs.LG cs.AI math.OC

    FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts

    Authors: Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis

    Abstract: One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, often, personalization requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have priv…

    Submitted 4 October, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 19 Pages

  11. arXiv:2306.08109 [pdf, other]

    cs.LG math.OC

    Provable Accelerated Convergence of Nesterov's Momentum for Deep ReLU Neural Networks

    Authors: Fangshuo Liao, Anastasios Kyrillidis

    Abstract: Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Łojasiewicz (PL) condition and the restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under…

    Submitted 4 January, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted by ALT 2024

  12. arXiv:2305.17118 [pdf, other]

    cs.LG cs.CL

    Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time

    Authors: Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava

    Abstract: Large language models (LLMs) have sparked a new wave of exciting AI applications. Hosting these models at scale requires significant memory resources. One crucial memory bottleneck for the deployment stems from the context window. It is commonly recognized that model weights are memory hungry; however, the size of key-value embedding stored during the generation process (KV cache) can easily surpas…

    Submitted 28 August, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  13. arXiv:2211.04659 [pdf, other]

    cs.LG math.OC stat.ML

    When is Momentum Extragradient Optimal? A Polynomial-Based Analysis

    Authors: Junhyung Lyle Kim, Gauthier Gidel, Anastasios Kyrillidis, Fabian Pedregosa

    Abstract: The extragradient method has gained popularity due to its robust convergence properties for differentiable games. Unlike single-objective optimization, game dynamics involve complex interactions reflected by the eigenvalues of the game vector field's Jacobian scattered across the complex plane. This complexity can cause the simple gradient method to diverge, even for bilinear games, while the extr…

    Submitted 10 February, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

  14. arXiv:2211.04624 [pdf, other]

    cs.LG cs.CV math.OC

    Cold Start Streaming Learning for Deep Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: The ability to dynamically adapt neural networks to newly-available data without performance deterioration would revolutionize deep learning applications. Streaming learning (i.e., learning from one data example at a time) has the potential to enable such real-time adaptation, but current approaches i) freeze a majority of network parameters during streaming and ii) are dependent upon offline, bas…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 52 pages, 7 figures, pre-print

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  15. arXiv:2210.16589 [pdf, other]

    cs.LG cs.AI cs.IT math.OC

    Strong Lottery Ticket Hypothesis with $\varepsilon$-perturbation

    Authors: Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis

    Abstract: The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need of training. We extend the theoretical guarantee of the strong LTH literature to a scenario more similar to the original LTH, by generalizing the weight change in the pre-training step to some pert…

    Submitted 29 October, 2022; originally announced October 2022.

  16. arXiv:2210.16169 [pdf, other]

    cs.LG cs.AI cs.IT math.OC

    LOFT: Finding Lottery Tickets through Filter-wise Training

    Authors: Qihan Wang, Chen Dun, Fangshuo Liao, Chris Jermaine, Anastasios Kyrillidis

    Abstract: Recent work on the Lottery Ticket Hypothesis (LTH) shows that there exist "winning tickets" in large neural networks. These tickets represent "sparse" versions of the full model that can be trained independently to achieve comparable accuracy with respect to the full model. However, finding the winning tickets requires one to pretrain the large model for at least a number of ep…

    Submitted 28 October, 2022; originally announced October 2022.

  17. arXiv:2210.16105 [pdf, other]

    cs.LG cs.AI cs.IT math.OC

    Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout

    Authors: Chen Dun, Mirian Hipolito, Chris Jermaine, Dimitrios Dimitriadis, Anastasios Kyrillidis

    Abstract: Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup, where slower clients can severely impede the learning process. Herein, we propose AsyncDrop, a novel asynchronous FL framework that utilizes dropout regularization to handle device heterogeneity in distributed settings. Overall, AsyncDrop achieves better performance co…

    Submitted 28 October, 2022; originally announced October 2022.

  18. arXiv:2205.03747 [pdf, other]

    cs.AI cs.LO math.OC

    DPMS: An ADD-Based Symbolic Approach for Generalized MaxSAT Solving

    Authors: Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

    Abstract: Boolean MaxSAT, as well as generalized formulations such as Min-MaxSAT and Max-hybrid-SAT, are fundamental optimization problems in Boolean reasoning. Existing methods for MaxSAT have been successful in solving benchmarks in CNF format. They lack, however, the ability to handle 1) (non-CNF) hybrid constraints, such as XORs and 2) generalized MaxSAT problems natively. To address this issue, we prop…

    Submitted 6 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

  19. arXiv:2203.11579 [pdf, other]

    quant-ph cs.LG math.OC

    Local Stochastic Factored Gradient Descent for Distributed Quantum State Tomography

    Authors: Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

    Abstract: We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines. QST is the canonical procedure to characterize the state of a quantum system, which we formulate as a stochastic nonconvex smooth optimization problem. Physically, the estimation of a low-rank…

    Submitted 1 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  20. arXiv:2203.10428 [pdf, other]

    cs.LG cs.AI

    PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication

    Authors: Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning graph-structured data, and training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions for every…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  21. arXiv:2203.02502 [pdf, other]

    cs.LG cs.AI

    No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

    Abstract: Centroid based clustering methods such as k-means, k-medoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest im…

    Submitted 15 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for ICASSP 2022, 8 figures, 1 table

  22. arXiv:2112.04905 [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, t…

    Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: 29 pages, 4 figures, 4th Annual Conference on Learning for Dynamics and Control

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  23. arXiv:2112.02668 [pdf, other]

    cs.LG

    On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons

    Authors: Fangshuo Liao, Anastasios Kyrillidis

    Abstract: With the motive of training all the parameters of a neural network, we study why and when one can achieve this by iteratively creating, training, and combining randomly selected subnetworks. Such scenarios have either implicitly or explicitly emerged in the recent literature: see e.g., the Dropout family of regularization techniques, or some distributed ML training protocols that reduce communicat…

    Submitted 11 August, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

  24. arXiv:2111.06171 [pdf, other]

    math.OC cs.LG stat.ML

    Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

    Authors: Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis

    Abstract: Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often leading to specific step size and momentum choices in order to guarantee convergence, let alone acceleration. Proximal point methods,…

    Submitted 26 June, 2023; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: 24 pages, 2 figures, 4th Annual Conference on Learning for Dynamics and Control

  25. arXiv:2110.12292 [pdf, other]

    cs.LG

    Federated Multiple Label Hashing (FedMLH): Communication Efficient Federated Learning on Extreme Classification Tasks

    Authors: Zhenwei Dai, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Anshumali Shrivastava

    Abstract: Federated learning enables many local devices to train a deep learning model jointly without sharing the local data. Currently, most federated training schemes learn a global model by averaging the parameters of local models. However, most of these training schemes suffer from high communication cost resulting from transmitting full local model parameters. Moreover, directly averaging model par…

    Submitted 23 October, 2021; originally announced October 2021.

    Comments: 10 pages, 5 figures

  26. arXiv:2108.00259 [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    How much pre-training is enough to discover a good subnetwork?

    Authors: Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

    Abstract: Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that is computationally expensive, as the dense model must be fully pre-trained. While previous work has revealed through experiments the relationship between the a…

    Submitted 22 August, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: 29 pages

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  27. arXiv:2107.04197 [pdf, other]

    cs.LG

    REX: Revisiting Budgeted Training with an Improved Schedule

    Authors: John Chen, Cameron Wolfe, Anastasios Kyrillidis

    Abstract: Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the 30-60-90 step s…

    Submitted 9 July, 2021; originally announced July 2021.

  28. arXiv:2107.00961 [pdf, other]

    cs.LG cs.CV cs.DC math.OC

    ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

    Authors: Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

    Abstract: We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the proc…

    Submitted 14 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: 26 pages, 8 figures, pre-print under review

  29. arXiv:2107.00797 [pdf, other]

    cs.LG

    Mitigating deep double descent by concatenating inputs

    Authors: John Chen, Qihan Wang, Anastasios Kyrillidis

    Abstract: The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting. In parti…

    Submitted 1 July, 2021; originally announced July 2021.

  30. arXiv:2106.08775 [pdf, other]

    math.OC cs.IT cs.LG cs.MS stat.ML

    Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs

    Authors: Junhyung Lyle Kim, JA Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis

    Abstract: We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming. Our algorithm non-trivially combines acceleration motions from convex optimization with coordinate power iteration and matrix factorization techniques. The algorithm is extremely simple to implement, and adds only a sing…

    Submitted 2 July, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 10 pages, 8 figures, preprint under review

    MSC Class: 49-02

    ACM Class: F.2.1; G.4

  31. arXiv:2104.07006 [pdf, other]

    quant-ph cs.IT cs.LG math.OC stat.ML

    Fast quantum state reconstruction via accelerated non-convex programming

    Authors: Junhyung Lyle Kim, George Kollias, Amir Kalev, Ken X. Wei, Anastasios Kyrillidis

    Abstract: We propose a new quantum state reconstruction method that combines ideas from compressed sensing, non-convex optimization, and acceleration methods. The algorithm, called Momentum-Inspired Factored Gradient Descent (MiFGD), extends the applicability of quantum tomography for larger systems. Despite being a non-convex method, MiFGD converges provably close to the true densi…

    Submitted 23 March, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: 45 pages

  32. arXiv:2102.10424 [pdf, other]

    cs.LG cs.AI cs.DC math.OC

    GIST: Distributed Training for Large-Scale Graph Convolutional Networks

    Authors: Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

    Abstract: The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on large-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized mo…

    Submitted 14 March, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: 28 pages, 5 figures, pre-print under review

    ACM Class: I.2.4

  33. arXiv:2012.09768 [pdf, other]

    stat.ML cs.LG math.OC

    Rank-One Measurements of Low-Rank PSD Matrices Have Small Feasible Sets

    Authors: T. Mitchell Roddenberry, Santiago Segarra, Anastasios Kyrillidis

    Abstract: We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems. The setting we consider involves rank-one sensing matrices: In particular, given a set of rank-one projections of an approximately low-rank PSD matrix, we characterize the radius of the set of PSD matrices that satisfy the measurements. This result yields a sampling…

    Submitted 6 April, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: 22 pages, 3 figures

  34. arXiv:2012.07983 [pdf, other]

    cs.AI cs.IT cs.LG cs.LO math.OC

    On Continuous Local BDD-Based Search for Hybrid SAT Solving

    Authors: Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

    Abstract: We explore the potential of continuous local search (CLS) in SAT solving by proposing a novel approach for finding a solution of a hybrid system of Boolean constraints. The algorithm is based on CLS combined with belief propagation on binary decision diagrams (BDDs). Our framework accepts all Boolean constraints that admit compact BDDs, including symmetric Boolean constraints and small-coefficient…

    Submitted 12 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: AAAI 21

  35. arXiv:2011.14066 [pdf, other]

    stat.ML cs.LG

    On Generalization of Adaptive Methods for Over-parameterized Linear Regression

    Authors: Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on t…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.07055

  36. arXiv:2011.12618 [pdf, other]

    cs.CV cs.LG

    StackMix: A complementary Mix algorithm

    Authors: John Chen, Samarth Sinha, Anastasios Kyrillidis

    Abstract: Techniques combining multiple images as input/output have proven to be effective data augmentations for training convolutional neural networks. In this paper, we present StackMix: Each input is presented as a concatenation of two images, and the label is the mean of the two one-hot labels. On its own, StackMix rivals other widely used methods in the "Mix" line of work. More importantly, unlike pre…

    Submitted 17 March, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

  37. arXiv:2007.00715 [pdf, other]

    stat.ML cs.LG stat.CO

    Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

    Authors: Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

    Abstract: Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples, such that the posterior inference using the selected subset closely approximates the posterior inference using the full dataset. This manuscript revisits Bayesian coresets through the lens of sparsity constrain…

    Submitted 25 February, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: AISTATS 2021 (Oral)

  38. arXiv:1912.01032 [pdf, other]

    cs.LO cs.IT cs.LG math.OC

    FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints

    Authors: Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang

    Abstract: The Boolean SATisfiability problem (SAT) is of central importance in computer science. Although SAT is known to be NP-complete, progress on the engineering side, especially that of Conflict-Driven Clause Learning (CDCL) and Local Search SAT solvers, has been remarkable. Yet, while SAT solvers aimed at solving industrial-scale benchmarks in Conjunctive Normal Form (CNF) have become quite mature, SA…

    Submitted 24 February, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: The paper was accepted by Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). V2 (Feb 24): Typos corrected

  39. arXiv:1911.06459 [pdf, other]

    cs.LG cs.DC stat.ML

    Optimal Mini-Batch Size Selection for Fast Gradient Descent

    Authors: Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura

    Abstract: This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems. By decoupling algorithmic analysis issues from hardware and software implementation details, we reveal a robust empirical inverse law between mini-batch size and the average number of SGD updates required to converge to a speci…

    Submitted 14 November, 2019; originally announced November 2019.

  40. arXiv:1911.05166 [pdf, other]

    cs.LG stat.ML

    Negative sampling in semi-supervised learning

    Authors: John Chen, Vatsal Shah, Anastasios Kyrillidis

    Abstract: We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL). NS3L is motivated by the success of negative sampling/contrastive estimation. We demonstrate that adding the NS3L loss to state-of-the-art SSL algorithms, such as Virtual Adversarial Training (VAT), significantly improves upon vanilla VAT and its variant…

    Submitted 28 June, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

  41. arXiv:1910.13389 [pdf, ps, other]

    stat.ML cs.LG math.OC

    Learning Sparse Distributions using Iterative Hard Thresholding

    Authors: Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

    Abstract: Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state-of-the-art performance for a wide range of structured estimation problems, such as sparse inference. In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions. We study the hardness of using IHT on the space of measures. As a practical alternative, we propo…

    Submitted 30 January, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019

  42. arXiv:1910.04952 [pdf, other]

    cs.LG eess.IV math.OC stat.ML

    Demon: Improved Neural Network Training with Momentum Decay

    Authors: John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis

    Abstract: Momentum is a widely used technique for gradient-based optimizers in deep learning. In this paper, we propose a decaying momentum (\textsc{Demon}) rule. We conduct the first large-scale empirical analysis of momentum decay methods for modern neural network optimization, in addition to the most popular learning rate decay schedules. Across 28 relevant combinations of models, epochs, datasets, and o…

    Submitted 1 July, 2021; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: 12 pages

  43. arXiv:1910.02120  [pdf, other]

    cs.LG stat.ML

    Distributed Learning of Deep Neural Networks using Independent Subnet Training

    Authors: Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

    Abstract: Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning partitions models and data over many machines, allowing model and dataset sizes beyond the available compute power and memory of a single machine. In practice though, distributed ML is challenging when distribution is mandato…

    Submitted 18 April, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

  44. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  45. arXiv:1902.00179  [pdf, other]

    cs.LG cs.DS stat.ML

    Compressing Gradient Optimizers via Count-Sketches

    Authors: Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

    Abstract: Many popular first-order optimization methods (e.g., Momentum, AdaGrad, Adam) accelerate the convergence rate of deep learning models. However, these algorithms require auxiliary parameters, which cost additional memory proportional to the number of parameters in the model. The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-sca…

    Submitted 26 February, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

    Comments: Initially submitted to WWW 2019 (November 2018)

  46. arXiv:1811.07055   

    stat.ML cs.LG

    Minimum weight norm models do not always generalize well for over-parameterized problems

    Authors: Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: This work is substituted by the paper in arXiv:2011.14066. Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine tuning in order to achieve its best performance. This has led to the development of adaptive methods, that claim automatic hyper-parameter optimization. Recently, researchers have studied both algo…

    Submitted 1 December, 2020; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: This work is substituted by the paper in arXiv:2011.14066

  47. arXiv:1806.02046  [pdf, other]

    stat.ML cs.LG math.OC

    Implicit regularization and solution uniqueness in over-parameterized matrix sensing

    Authors: Kelly Geyer, Anastasios Kyrillidis, Amir Kalev

    Abstract: We consider whether algorithmic choices in over-parameterized linear matrix factorization introduce implicit regularization. We focus on noiseless matrix sensing over rank-$r$ positive semi-definite (PSD) matrices in $\mathbb{R}^{n \times n}$, with a sensing mechanism that satisfies restricted isometry properties (RIP). The algorithm we study is \emph{factored gradient descent}, where we model the…

    Submitted 13 September, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: 12 pages

  48. arXiv:1806.00534  [pdf, other]

    cs.LG cs.DS cs.IT math.OC stat.ML

    Provably convergent acceleration in factored gradient descent with applications in matrix sensing

    Authors: Tayo Ajayi, David Mildebrath, Anastasios Kyrillidis, Shashanka Ubaru, Georgios Kollias, Kristofer Bouchard

    Abstract: We present theoretical results on the convergence of \emph{non-convex} accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss. The purpose of this work is to study the effects of acceleration in non-convex settings, where provable convergence with acceleration should not be considered a \emph{de facto} property. The technique is applied to matrix sensing problems, for…

    Submitted 21 September, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 23 pages

  49. arXiv:1805.09464  [pdf, other]

    cs.LG cs.IT math.NA math.OC stat.ML

    Simple and practical algorithms for $\ell_p$-norm low-rank approximation

    Authors: Anastasios Kyrillidis

    Abstract: We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$. The proposed framework, which is non-convex and gradient-based, is easy to implement and typically attains better approximations, faster, than state of the art. From a theoretical standpoint, we show that the proposed scheme can attain $(1 + \varepsilon)$-OPT approximations. Our algor…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: 16 pages, 11 figures, to appear in UAI 2018

  50. arXiv:1805.08920  [pdf, other]

    cs.LG cs.CV math.OC math.ST stat.ML

    Approximate Newton-based statistical inference using only stochastic gradients

    Authors: Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis

    Abstract: We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in $M$-estimation, both for unregul…

    Submitted 5 February, 2019; v1 submitted 22 May, 2018; originally announced May 2018.