Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Authors: Zhe Li, Bicheng Ying, Zidong Liu, Haibo Yang

Abstract: Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL pose a significant challenge to its efficiency. Specifically, in each communication round, the communication costs scale linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. In this paper, we introduce a novel dimension-free communication strategy for FL, leveraging zero-order optimization techniques. We propose a new algorithm, FedDisco, which facilitates the transmission of only a constant number of scalar values between clients and the server in each communication round, thereby reducing the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$, where $d$ is the dimension of the model parameters. Theoretically, in non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which show a linear speedup of the number of clients and local steps under standard assumptions and dimension-free rate for low effective rank scenarios. Empirical evaluations through classic deep learning training and large language model fine-tuning substantiate significant reductions in communication overhead compared to traditional FL approaches. Our code is available at https://github.com/ZidongLiu/FedDisco.
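The idea of exchanging only scalars can be illustrated with a generic zeroth-order sketch. This is not the FedDisco algorithm from the paper or its repository; it is a minimal illustration that assumes a shared random seed, a forward-difference estimator, and toy quadratic losses. The names `direction`, `client_scalar`, and `apply_update` are hypothetical.

```python
import random

def direction(seed, dim):
    """Regenerate the same Gaussian perturbation from a shared seed (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def client_scalar(loss, x, seed, mu=1e-3):
    """Forward-difference directional derivative: one scalar per client."""
    z = direction(seed, len(x))
    x_plus = [xi + mu * zi for xi, zi in zip(x, z)]
    return (loss(x_plus) - loss(x)) / mu

def apply_update(x, seed, scalar, lr):
    """Rebuild the d-dimensional step from (seed, scalar) alone."""
    z = direction(seed, len(x))
    return [xi - lr * scalar * zi for xi, zi in zip(x, z)]

# Toy local losses for two clients (hypothetical data).
def loss_a(x):
    return sum((xi - 1.0) ** 2 for xi in x)

def loss_b(x):
    return sum((xi + 1.0) ** 2 for xi in x)

x = [0.5] * 10
seed = 42
# Each client uploads one float; the server averages them.
scalars = [client_scalar(loss, x, seed) for loss in (loss_a, loss_b)]
avg = sum(scalars) / len(scalars)
# Server and clients reconstruct the identical model update locally.
x_server = apply_update(x, seed, avg, lr=0.01)
x_client = apply_update(x, seed, avg, lr=0.01)
```

In this sketch the per-round traffic is one seed and one scalar per client, regardless of the length of `x`, because every party can regenerate the perturbation vector on its own.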

Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2306.00256

DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Authors: Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

Abstract: Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD's convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2111.04287

BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning

Authors: Bicheng Ying, Kun Yuan, Hanbin Hu, Yiming Chen, Wotao Yin

Abstract: Decentralized algorithm is a form of computation that achieves a global goal through local dynamics that relies on low-cost communication between directly-connected agents. On large-scale optimization tasks involving distributed datasets, decentralized algorithms have shown strong, sometimes superior, performance over distributed algorithms with a central node. Recently, developing decentralized algorithms for deep learning has attracted great attention. They are considered as low-communication-overhead alternatives to those using a parameter server or the Ring-Allreduce protocol. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. To fill the gap, we introduce BlueFog, a python library for straightforward, high-performance implementations of diverse decentralized algorithms. Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms, from those using a static, undirected graph for synchronous operations to those using dynamic and directed graphs for asynchronous operations. BlueFog also adopts several system-level acceleration techniques to further optimize the performance on the deep learning tasks. On mainstream DNN training tasks, BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package based on Ring-Allreduce. BlueFog is open source at https://github.com/Bluefog-Lib/bluefog.

Submitted 8 November, 2021; originally announced November 2021.

arXiv:2110.13363

Exponential Graph is Provably Efficient for Decentralized Deep Training

Authors: Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

Abstract: Decentralized SGD is an emerging training method for deep learning known for its much less (thus faster) communication per iteration, which relaxes the averaging step in parallel SGD to inexact averaging. The less exact the averaging is, however, the more the total iterations the training needs to take. Therefore, the key to making decentralized SGD efficient is to realize nearly-exact averaging using little communication. This requires a skillful choice of communication topology, which is an under-studied topic in decentralized optimization. In this paper, we study so-called exponential graphs where every node is connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes. This work proves such graphs can lead to both fast communication and effective averaging simultaneously. We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates to one single neighbor per iteration, can together achieve exact averaging. This favorable property enables one-peer exponential graph to average as effective as its static counterpart but communicates more efficiently. We apply these exponential graphs in decentralized (momentum) SGD to obtain the state-of-the-art balance between per-iteration communication and iteration complexity among all commonly-used topologies. Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs promises both fast and high-quality training.
Our code is implemented through BlueFog and available at https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph.
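The exact-averaging property of one-peer exponential graphs can be checked numerically. The sketch below is not taken from the paper's repository; it assumes that $n$ is a power of 2, that node $i$ pairs with node $(i + 2^t) \bmod n$ at round $t$, and that both values get weight 1/2. The function name `one_peer_exponential_round` is hypothetical.

```python
import math

def one_peer_exponential_round(x, t):
    """One gossip round: node i averages with the node 2^(t mod log2 n) hops ahead."""
    n = len(x)
    tau = int(math.log2(n))          # assumes n is a power of 2
    hop = 2 ** (t % tau)
    return [0.5 * (x[i] + x[(i + hop) % n]) for i in range(n)]

n = 8
x = [float(i) for i in range(n)]     # each node starts with its own value
target = sum(x) / n                  # the exact global average

# log2(n) rounds, one neighbor per node per round.
for t in range(int(math.log2(n))):
    x = one_peer_exponential_round(x, t)
```

Under these assumptions, the product of the $\log_2(n)$ mixing matrices $\tfrac{1}{2}(I + P^{2^t})$, with $P$ the cyclic shift, equals $\tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$, so every entry of `x` ends at `target`.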

Submitted 25 October, 2021; originally announced October 2021.

arXiv:1903.10956

doi 10.1109/TSP.2020.3008605

On the Influence of Bias-Correction on Distributed Stochastic Optimization

Authors: Kun Yuan, Sulaiman A. Alghunaim, Bicheng Ying, Ali H. Sayed

Abstract: Various bias-correction methods such as EXTRA, gradient tracking methods, and exact diffusion have been proposed recently to solve distributed deterministic optimization problems. These methods employ constant step-sizes and converge linearly to the exact solution under proper conditions. However, their performance under stochastic and adaptive settings is less explored. It is still unknown whether, when and why these bias-correction methods can outperform their traditional counterparts (such as consensus and diffusion) with noisy gradient and constant step-sizes. This work studies the performance of exact diffusion under the stochastic and adaptive setting, and provides conditions under which exact diffusion has superior steady-state mean-square deviation (MSD) performance than traditional algorithms without bias-correction. In particular, it is proven that this superiority is more evident over sparsely-connected network topologies such as lines, cycles, or grids. Conditions are also provided under which the exact diffusion method matches or may even degrade the performance of traditional methods. Simulations are provided to validate the theoretical findings.

Submitted 11 July, 2019; v1 submitted 26 March, 2019; originally announced March 2019.

Comments: 17 pages, 9 figures, submitted for publication

arXiv:1810.08901

Dynamic Average Diffusion with randomized Coordinate Updates

Authors: Bicheng Ying, Kun Yuan, Ali H. Sayed

Abstract: This work derives and analyzes an online learning strategy for tracking the average of time-varying distributed signals by relying on randomized coordinate-descent updates. During each iteration, each agent selects or observes a random entry of the observation vector, and different agents may select different entries of their observations before engaging in a consultation step. Careful coordination of the interactions among agents is necessary to avoid bias and ensure convergence. We provide a convergence analysis for the proposed methods, and illustrate the results by means of simulations.

Submitted 30 July, 2019; v1 submitted 21 October, 2018; originally announced October 2018.

arXiv:1805.11384

doi 10.1109/TSP.2018.2881661

Supervised Learning Under Distributed Features

Authors: Bicheng Ying, Kun Yuan, Ali H. Sayed

Abstract: This work studies the problem of learning under both large datasets and large-dimensional feature space scenarios. The feature information is assumed to be spread across agents in a network, where each agent observes some of the features. Through local cooperation, the agents are supposed to interact with each other to solve an inference problem and converge towards the global minimizer of an empirical risk. We study this problem exclusively in the primal domain, and propose new and effective distributed solutions with guaranteed convergence to the minimizer with linear rate under strong convexity. This is achieved by combining a dynamic diffusion construction, a pipeline strategy, and variance-reduced techniques. Simulation results illustrate the conclusions.

Submitted 22 May, 2020; v1 submitted 29 May, 2018; originally announced May 2018.

arXiv:1803.07964

doi 10.1109/TSP.2018.2878551

Stochastic Learning under Random Reshuffling with Constant Step-sizes

Authors: Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

Abstract: In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case and strongly convex loss function. In this case, convergence is guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that iterates approach a smaller neighborhood of size $O(μ^2)$ around the minimizer rather than $O(μ)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the algorithm, which helps clarify in greater detail the differences between sampling with and without replacement. We also explain the periodic behavior that is observed in random reshuffling implementations.
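The two sampling schemes compared in this abstract differ only in how indices are drawn within an epoch. The sketch below is a minimal illustration, not the paper's experiments; the toy quadratic risk, the data, the step-size `mu`, and the function names are all assumptions made for the example.

```python
import random

def uniform_epoch(n, rng):
    """Draw n indices independently, with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def reshuffled_epoch(n, rng):
    """Visit every index exactly once, in a fresh random order."""
    order = list(range(n))
    rng.shuffle(order)
    return order

def sgd(data, epoch_fn, mu=0.05, epochs=50, seed=0):
    """Constant step-size SGD on the toy risk (1/n) * sum_i 0.5 * (w - d_i)^2."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for i in epoch_fn(len(data), rng):
            w -= mu * (w - data[i])   # stochastic gradient of sample i
    return w

data = [1.0, 2.0, 3.0, 4.0, 5.0]      # the minimizer is the mean, 3.0
w_uniform = sgd(data, uniform_epoch)
w_reshuffled = sgd(data, reshuffled_epoch)
```

Both runs hover around the minimizer because the step-size is constant. The abstract's claim concerns the size of that neighborhood, which a single run on a toy problem like this one cannot establish.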

Submitted 9 October, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

arXiv:1801.05479

Belief Control Strategies for Interactions over Weakly-Connected Graphs

Authors: Hawraa Salami, Bicheng Ying, Ali H. Sayed

Abstract: In diffusion social learning over weakly-connected graphs, it has been shown recently that influential agents shape the beliefs of non-influential agents. This paper analyzes this mechanism more closely and addresses two main questions. First, the article examines how much freedom influential agents have in controlling the beliefs of the receiving agents, namely, whether receiving agents can be driven to arbitrary beliefs and whether the network structure limits the scope of control by the influential agents. Second, even if there is a limit to what influential agents can accomplish, this article develops mechanisms by which they can lead receiving agents to adopt certain beliefs. These questions raise interesting possibilities about belief control over networked agents. Once addressed, one ends up with design procedures that allow influential agents to drive other agents to endorse particular beliefs regardless of their local observations or convictions. The theoretical findings are illustrated by means of examples.

Submitted 5 November, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: Submitted for publication

arXiv:1708.01384

Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling

Authors: Kun Yuan, Bicheng Ying, Jiageng Liu, Ali H. Sayed

Abstract: A new amortized variance-reduced gradient (AVRG) algorithm was developed in \cite{ying2017convergence}, which has constant storage requirement in comparison to SAGA and balanced gradient computations in comparison to SVRG. One key advantage of the AVRG strategy is its amenability to decentralized implementations. In this work, we show how AVRG can be extended to the network case where multiple learning agents are assumed to be connected by a graph topology. In this scenario, each agent observes data that is spatially distributed and all agents are only allowed to communicate with direct neighbors. Moreover, the amount of data observed by the individual agents may differ drastically. For such situations, the balanced gradient computation property of AVRG becomes a real advantage in reducing idle time caused by unbalanced local data storage requirements, which is characteristic of other reduced-variance gradient algorithms. The resulting diffusion-AVRG algorithm is shown to have linear convergence to the exact solution, and is much more memory efficient than other alternative algorithms. In addition, we propose a mini-batch strategy to balance the communication and computation efficiency for diffusion-AVRG. When a proper batch size is employed, it is observed in simulations that diffusion-AVRG is more computationally efficient than exact diffusion or EXTRA while maintaining almost the same communication efficiency.

Submitted 29 May, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

Comments: 23 pages, 12 figures, submitted for publication

arXiv:1708.01383

Variance-Reduced Stochastic Learning under Random Reshuffling

Authors: Bicheng Ying, Kun Yuan, Ali H. Sayed

Abstract: Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results assume uniform data sampling with replacement. However, it has been observed in related works that random reshuffling can deliver superior performance over uniform sampling and, yet, no formal proofs or guarantees of exact convergence exist for variance-reduced algorithms under random reshuffling. This paper makes two contributions. First, it resolves this open issue and provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA; the argument is also adaptable to other variance-reduced algorithms. Second, under random reshuffling, the paper proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements compared to SAGA and with balanced gradient computations compared to SVRG. AVRG is also shown analytically to converge linearly.

Submitted 16 February, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

arXiv:1704.06025

Performance Limits of Stochastic Sub-Gradient Learning, Part II: Multi-Agent Case

Authors: Bicheng Ying, Ali H. Sayed

Abstract: The analysis in Part I revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization when gradient noise is present. These algorithms are used when the risk functions are non-smooth and involve non-differentiable components. They have been long recognized as being slow converging methods. However, it was revealed in Part I that the rate of convergence becomes linear for stochastic optimization problems, with the error iterate converging at an exponential rate $α^i$ to within an $O(μ)$-neighborhood of the optimizer, for some $α\in (0,1)$ and small step-size $μ$. The conclusion was established under weaker assumptions than the prior literature and, moreover, several important problems (such as LASSO, SVM, and Total Variation) were shown to satisfy these weaker assumptions automatically (but not the previously used conditions from the literature). These results revealed that sub-gradient learning methods have more favorable behavior than originally thought when used to enable continuous adaptation and learning. The results of Part I were exclusive to single-agent adaptation. The purpose of the current Part II is to examine the implications of these discoveries when a collection of networked agents employs subgradient learning as their cooperative mechanism. The analysis will show that, despite the coupled dynamics that arises in a networked scenario, the agents are still able to attain linear convergence in the stochastic case; they are also able to reach agreement within $O(μ)$ of the optimizer.

Submitted 20 April, 2017; originally announced April 2017.

arXiv:1609.03703

Social Learning over Weakly-Connected Graphs

Authors: Hawraa Salami, Bicheng Ying, Ali H. Sayed

Abstract: In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents ends up shaping the beliefs of non-influential agents. We derive useful closed-form expressions that characterize this influence, and which can be used to motivate design problems to control it. We provide simulation examples to illustrate the results.

Submitted 6 January, 2017; v1 submitted 13 September, 2016; originally announced September 2016.

Comments: To appear in 2017 in the IEEE Transactions on Signal and Information Processing over Networks

arXiv:1607.01838

doi 10.1109/TSP.2017.2757903

Coordinate-Descent Diffusion Learning by Networked Agents

Authors: Chengcheng Wang, Yonggang Zhang, Bicheng Ying, Ali H. Sayed

Abstract: This work examines the mean-square error performance of diffusion stochastic algorithms under a generalized coordinate-descent scheme. In this setting, the adaptation step by each agent is limited to a random subset of the coordinates of its stochastic gradient vector. The selection of coordinates varies randomly from iteration to iteration and from agent to agent across the network. Such schemes are useful in reducing computational complexity at each iteration in power-intensive large data applications. They are also useful in modeling situations where some partial gradient information may be missing at random. Interestingly, the results show that the steady-state performance of the learning strategy is not always degraded, while the convergence rate suffers some degradation. The results provide yet another indication of the resilience and robustness of adaptive distributed strategies.

Submitted 10 October, 2017; v1 submitted 6 July, 2016; originally announced July 2016.

Comments: Accepted for publication

arXiv:1603.04136

On the Influence of Momentum Acceleration on Online Learning

Authors: Kun Yuan, Bicheng Ying, Ali H. Sayed

Abstract: The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.

Submitted 12 October, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

Comments: 66 pages, 9 figures, to appear in Journal of Machine Learning Research, 2016

arXiv:1602.07630

Online Dual Coordinate Ascent Learning

Authors: Bicheng Ying, Kun Yuan, Ali H. Sayed

Abstract: The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. In this work, we develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data. This feature embeds the resulting construction with continuous adaptation, learning, and tracking abilities, which are particularly attractive for online learning scenarios.

Submitted 24 February, 2016; originally announced February 2016.

arXiv:1511.07902

Performance Limits of Stochastic Sub-Gradient Learning, Part I: Single Agent Case

Authors: Bicheng Ying, Ali H. Sayed

Abstract: In this work and the supporting Part II, we examine the performance of stochastic sub-gradient learning strategies under weaker conditions than usually considered in the literature. The new conditions are shown to be automatically satisfied by several important cases of interest including SVM, LASSO, and Total-Variation denoising formulations. In comparison, these problems do not satisfy the traditional assumptions used in prior analyses and, therefore, conclusions derived from these earlier treatments are not directly applicable to these problems. The results in this article establish that stochastic sub-gradient strategies can attain linear convergence rates, as opposed to sub-linear rates, to the steady-state regime. A realizable exponential-weighting procedure is employed to smooth the intermediate iterates and guarantee useful performance bounds in terms of convergence rate and excessive risk performance. Part I of this work focuses on single-agent scenarios, which are common in stand-alone learning applications, while Part II extends the analysis to networked learners. The theoretical conclusions are illustrated by several examples and simulations, including comparisons with the FISTA procedure.

Submitted 21 April, 2017; v1 submitted 24 November, 2015; originally announced November 2015.

Comments: Part II is available on http://arxiv.org/abs/1704.06025

arXiv:1412.1523

Information Exchange and Learning Dynamics over Weakly-Connected Adaptive Networks

Authors: Bicheng Ying, Ali H. Sayed

Abstract: The paper examines the learning mechanism of adaptive agents over weakly-connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower relationship develops with the performance of some agents being fully determined by the performance of other agents that are outside their domain of influence. This scenario can arise, for example, due to intruder attacks by malicious agents or as the result of failures by some critical links. The findings in this work help explain why strong-connectivity of the network topology, adaptation of the combination weights, and clustering of agents are important ingredients to equalize the learning abilities of all agents against such disturbances. The results also clarify how weak-connectivity can be helpful in reducing the effect of outlier data on learning performance.

Submitted 6 December, 2015; v1 submitted 3 December, 2014; originally announced December 2014.

Showing 1–18 of 18 results for author: Ying, B