arXiv:2405.19272

Mitigating Disparate Impact of Differential Privacy in Federated Learning through Robust Clustering

Authors: Saber Malekmohammadi, Afaf Taik, Golnoosh Farnadi

Abstract: Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Privacy (DP) to enhance privacy guarantees. Similar to previous work on DP in ML, we observed that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, in this paper, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy with DP guarantees. To this end, we propose to cluster clients based on both their model updates and training loss values. Our proposed approach also addresses the server's uncertainties in clustering clients' model updates by employing larger batch sizes along with a Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide theoretical analysis of the effectiveness of our proposed approach. We also extensively evaluate our approach across diverse data distributions and privacy budgets and show its effectiveness in mitigating the disparate impact of DP in FL settings with a small computational cost.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2404.14620

Fairness Incentives in Response to Unfair Dynamic Pricing

Authors: Jesse Thibodeau, Hadi Nekoei, Afaf Taïk, Janarthanan Rajendran, Golnoosh Farnadi

Abstract: The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, policy makers might leverage tools such as taxation and subsidy to adapt policy mechanisms dependent upon their social objective. In this paper, we explore the potential for AI methods to assist such intervention strategies. To this end, we design a basic simulated economy, wherein we introduce a dynamic social planner (SP) to generate corporate taxation schedules geared to incentivizing firms towards adopting fair pricing behaviours, and to use the collected tax budget to subsidize consumption among underrepresented groups. To cover a range of possible policy scenarios, we formulate our social planner's learning problem as a multi-armed bandit, a contextual bandit and finally as a full reinforcement learning (RL) problem, evaluating welfare outcomes from each case. To alleviate the difficulty in retaining meaningful tax rates that apply to less frequently occurring brackets, we introduce FairReplayBuffer, which ensures that our RL agent samples experiences uniformly across a discretized fairness space. We find that, upon deploying a learned tax and redistribution policy, social welfare improves on that of the fairness-agnostic baseline, approaches that of the analytically optimal fairness-aware baseline for the multi-armed and contextual bandit settings, and surpasses it by 13.19% in the full RL setting.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.13213

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

Authors: Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, Golnoosh Farnadi

Abstract: Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases in these models remain. Furthermore, previous work has demonstrated that models optimized for safety often display exaggerated safety behaviors, such as a tendency to refrain from responding to certain requests as a precautionary measure. As such, a clear trade-off between the helpfulness and safety of these models has been documented in the literature. In this paper, we further investigate the effectiveness of safety measures by evaluating models on already mitigated biases. Using the case of Llama 2 as an example, we illustrate how LLMs' safety responses can still encode harmful assumptions. To do so, we create a set of non-toxic prompts, which we then use to evaluate Llama models. Through our new taxonomy of LLM responses to users, we observe that the safety/helpfulness trade-offs are more pronounced for certain demographic groups, which can lead to quality-of-service harms for marginalized populations.

Submitted 5 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures. Accepted to Findings of the Association for Computational Linguistics: ACL 2024

arXiv:2403.05564

Promoting Fair Vaccination Strategies Through Influence Maximization: A Case Study on COVID-19 Spread

Authors: Nicola Neophytou, Afaf Taïk, Golnoosh Farnadi

Abstract: The aftermath of the Covid-19 pandemic saw more severe outcomes for racial minority groups and economically-deprived communities. Such disparities can be explained by several factors, including unequal access to healthcare, as well as the inability of low income groups to reduce their mobility due to work or social obligations. Moreover, senior citizens were found to be more susceptible to severe symptoms, largely due to age-related health reasons. Adapting vaccine distribution strategies to consider a range of demographics is therefore essential to address these disparities. In this study, we propose a novel approach that utilizes influence maximization (IM) on mobility networks to develop vaccination strategies which incorporate demographic fairness. By considering factors such as race, social status, age, and associated risk factors, we aim to optimize vaccine distribution to achieve various fairness definitions for one or more protected attributes at a time. Through extensive experiments conducted on Covid-19 spread in three major metropolitan areas across the United States, we demonstrate the effectiveness of our proposed approach in reducing disease transmission and promoting fairness in vaccination distribution.

Submitted 20 February, 2024; originally announced March 2024.

Comments: 10 pages, 4 figures

ACM Class: I.2.1; I.6.3; J.3

arXiv:2306.10043

Unraveling the Interconnected Axes of Heterogeneity in Machine Learning for Democratic and Inclusive Advancements

Authors: Maryam Molamohammadi, Afaf Taik, Nicolas Le Roux, Golnoosh Farnadi

Abstract: The growing utilization of machine learning (ML) in decision-making processes raises questions about its benefits to society. In this study, we identify and analyze three axes of heterogeneity that significantly influence the trajectory of ML products. These axes are i) values, culture and regulations, ii) data composition, and iii) resource and infrastructure capacity. We demonstrate how these axes are interdependent and mutually influence one another, emphasizing the need to consider and address them jointly. Unfortunately, the current research landscape falls short in this regard, often failing to adopt a holistic approach. We examine the prevalent practices and methodologies that skew these axes in favor of a selected few, resulting in power concentration, homogenized control, and increased dependency. We discuss how this fragmented study of the three axes poses a significant challenge, leading to an impractical solution space that lacks reflection of real-world scenarios. Addressing these issues is crucial to ensure a more comprehensive understanding of the interconnected nature of society and to foster the democratic and inclusive development of ML systems that are more aligned with real-world complexities and their diverse requirements.

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2206.11328

Federated Deep Reinforcement Learning for Open RAN Slicing in 6G Networks

Authors: Amine Abouaomar, Afaf Taik, Abderrahime Filali, Soumaya Cherkaoui

Abstract: Radio access network (RAN) slicing is a key element in enabling current 5G networks and next-generation networks to meet the requirements of different services in various verticals. However, the heterogeneous nature of these services' requirements, along with the limited RAN resources, makes RAN slicing very complex. Indeed, the challenge that mobile virtual network operators (MVNOs) face is to rapidly adapt their RAN slicing strategies to the frequent changes of the environment constraints and service requirements. Machine learning techniques, such as deep reinforcement learning (DRL), are increasingly considered a key enabler for automating the management and orchestration of RAN slicing operations. Nevertheless, the ability to generalize DRL models to multiple RAN slicing environments may be limited, due to their strong dependence on the environment data on which they are trained. Federated learning enables MVNOs to leverage more diverse training inputs for DRL without the high cost of collecting this data from different RANs. In this paper, we propose a federated deep reinforcement learning approach for RAN slicing. In this approach, MVNOs collaborate to improve the performance of their DRL-based RAN slicing models. Each MVNO trains a DRL model and sends it for aggregation. The aggregated model is then sent back to each MVNO for immediate use and further training. The simulation results show the effectiveness of the proposed DRL approach.

Submitted 22 December, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

arXiv:2201.11271

Clustered Vehicular Federated Learning: Process and Optimization

Authors: Afaf Taik, Zoubeir Mlika, Soumaya Cherkaoui

Abstract: Federated Learning (FL) is expected to play a prominent role for privacy-preserving machine learning (ML) in autonomous vehicles. FL involves the collaborative training of a single ML model among edge devices on their distributed datasets while keeping data local. While FL requires less communication compared to classical distributed learning, it remains hard to scale for large models. In vehicular networks, FL must be adapted to the limited communication resources, the mobility of the edge nodes, and the statistical heterogeneity of data distributions. Indeed, judicious utilization of the communication resources and new perceptive learning-oriented methods are vital. To this end, we propose a new architecture for vehicular FL and corresponding learning and scheduling processes. The architecture utilizes vehicular-to-vehicular (V2V) resources to bypass the communication bottleneck: clusters of vehicles train models simultaneously and only the aggregate of each cluster is sent to the multi-access edge computing (MEC) server. The cluster formation is adapted for single and multi-task learning, and takes into account both communication and learning aspects. We show through simulations that the proposed process is capable of improving the learning accuracy in several non-independent-and-identically-distributed (non-i.i.d.) and unbalanced dataset distributions, under mobility constraints, in comparison to standard FL.

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2201.11248

doi 10.1109/ICC40277.2020.9148937

Electrical Load Forecasting Using Edge Computing and Federated Learning

Authors: Afaf Taik, Soumaya Cherkaoui

Abstract: In the smart grid, huge amounts of consumption data are used to train deep learning models for applications such as load monitoring and demand response. However, these applications raise concerns regarding security and have high accuracy requirements. On the one hand, the data used is privacy-sensitive. For instance, the fine-grained data collected by a smart meter at a consumer's home may reveal information on the appliances and thus the consumer's behaviour at home. On the other hand, the deep learning models require large data volumes with enough variety to be trained adequately. In this paper, we evaluate the use of edge computing and federated learning, a decentralized machine learning scheme that makes it possible to increase the volume and diversity of data used to train the deep learning models without compromising privacy. This paper reports, to the best of our knowledge, the first use of federated learning for household load forecasting and achieves promising results. The simulations were done using TensorFlow Federated on the data from 200 houses from Texas, USA.

Submitted 26 January, 2022; originally announced January 2022.

Comments: ICC 2020-2020 IEEE International Conference on Communications (ICC)

arXiv:2201.11247

doi 10.1109/LCN52139.2021.9524974

Data-Quality Based Scheduling for Federated Edge Learning

Authors: Afaf Taik, Hajar Moudoud, Soumaya Cherkaoui

Abstract: FEderated Edge Learning (FEEL) has emerged as a leading technique for privacy-preserving distributed training in wireless edge networks, where edge devices collaboratively train machine learning (ML) models with the orchestration of a server. However, due to frequent communication, FEEL needs to be adapted to the limited communication bandwidth. Furthermore, the statistical heterogeneity of local datasets' distributions and the uncertainty about the data quality pose important challenges to the training's convergence. Therefore, a meticulous selection of the participating devices and an analogous bandwidth allocation are necessary. In this paper, we propose a data-quality based scheduling (DQS) algorithm for FEEL. DQS prioritizes reliable devices with rich and diverse datasets. We first define the different components of the learning algorithm and the data-quality evaluation. Then, we formulate the device selection and the bandwidth allocation problem. Finally, we present our DQS algorithm for FEEL, and we evaluate it in different data poisoning scenarios.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 2021 IEEE 46th Conference on Local Computer Networks (LCN)

arXiv:2104.03169

doi 10.1109/MWC.017.2100187

Empowering Prosumer Communities in Smart Grid with Wireless Communications and Federated Edge Learning

Authors: Afaf Taik, Boubakr Nour, Soumaya Cherkaoui

Abstract: The exponential growth of distributed energy resources is enabling the transformation of traditional consumers in the smart grid into prosumers. Such a transition presents a promising opportunity for sustainable energy trading. Yet, the integration of prosumers in the energy market imposes new considerations in designing unified and sustainable frameworks for efficient use of the power and communication infrastructure. Furthermore, several issues need to be tackled to adequately promote the adoption of decentralized renewable-oriented systems, such as communication overhead, data privacy, scalability, and sustainability. In this article, we present the different aspects and challenges to be addressed for building efficient energy trading markets in relation to communication and smart decision-making. Accordingly, we propose a multi-level pro-decision framework for prosumer communities to achieve collective goals. Since the individual decisions of prosumers are mainly driven by individual self-sufficiency goals, the framework prioritizes the individual prosumers' decisions and relies on the 5G wireless network for fast coordination among community members. In fact, each prosumer predicts energy production and consumption to make proactive trading decisions as a response to collective-level requests. Moreover, the collaboration of the community is further extended by including the collaborative training of prediction models using Federated Learning, assisted by edge servers and prosumer home-area equipment. In addition to preserving prosumers' privacy, we show through evaluations that training prediction models using Federated Learning yields high accuracy for different energy resources while reducing the communication overhead.

Submitted 28 January, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2102.09491

Data-Aware Device Scheduling for Federated Edge Learning

Authors: Afaf Taik, Zoubeir Mlika, Soumaya Cherkaoui

Abstract: Federated Edge Learning (FEEL) involves the collaborative training of machine learning models among edge devices, with the orchestration of a server in a wireless edge network. Due to frequent model updates, FEEL needs to be adapted to the limited communication bandwidth, scarce energy of edge devices, and the statistical heterogeneity of edge devices' data distributions. Therefore, a careful scheduling of a subset of devices for training and uploading models is necessary. In contrast to previous work in FEEL where the data aspects are under-explored, we consider data properties at the heart of the proposed scheduling algorithm. To this end, we propose a new scheduling scheme for non-independent-and-identically-distributed (non-IID) and unbalanced datasets in FEEL. As the data is the key component of the learning, we propose a new set of considerations for data characteristics in wireless scheduling algorithms in FEEL. In fact, the data collected by the devices depends on the local environment and usage pattern. Thus, the datasets vary in size and distributions among the devices. In the proposed algorithm, we consider both data and resource perspectives. In addition to minimizing the completion time of FEEL as well as the transmission energy of the participating devices, the algorithm prioritizes devices with rich and diverse datasets. We first define a general framework for the data-aware scheduling and the main axes and requirements for diversity evaluation. Then, we discuss diversity aspects and some exploitable techniques and metrics. Next, we formulate the problem and present our FEEL scheduling algorithm. Evaluations in different scenarios show that our proposed FEEL scheduling algorithm can help achieve high accuracy in few rounds with a reduced cost.

Submitted 26 January, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

arXiv:2009.00081

Federated Edge Learning: Design Issues and Challenges

Authors: Afaf Taïk, Soumaya Cherkaoui

Abstract: Federated Learning (FL) is a distributed machine learning technique, where each device contributes to the learning model by independently computing the gradient based on its local training data. It has recently become a hot research topic, as it promises several benefits related to data privacy and scalability. However, implementing FL at the network edge is challenging due to system and data heterogeneity and resource constraints. In this article, we examine the existing challenges and trade-offs in Federated Edge Learning (FEEL). The design of FEEL algorithms for resource-efficient learning raises several challenges. These challenges are essentially related to the multidisciplinary nature of the problem. As the data is the key component of the learning, this article advocates a new set of considerations for data characteristics in wireless scheduling algorithms in FEEL. Hence, we propose a general framework for the data-aware scheduling as a guideline for future research directions. We also discuss the main axes and requirements for data evaluation and some exploitable techniques and metrics.

Submitted 26 January, 2022; v1 submitted 31 August, 2020; originally announced September 2020.

Comments: Submitted to IEEE Network Magazine

Showing 1–12 of 12 results for author: Taik, A