-
Synthetic Networks That Preserve Edge Connectivity
Authors:
Lahari Anne,
The-Anh Vu-Le,
Minhyuk Park,
Tandy Warnow,
George Chacko
Abstract:
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. H…
▽ More
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining the good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality fit for other network and clustering parameters as unmodified SBMs.
△ Less
Submitted 24 August, 2024;
originally announced August 2024.
-
Improved Community Detection using Stochastic Block Models
Authors:
Minhyuk Park,
Daniel Wang Feng,
Siya Digra,
The-Anh Vu-Le,
George Chacko,
Tandy Warnow
Abstract:
Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces poorly-connected to disco…
▽ More
Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces poorly-connected to disconnected clusters. We present simple modifications to improve the connectivity of SBM clusters, and show that the modifications improve accuracy using simulated networks.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment
Authors:
Cong-Duy Nguyen,
The-Anh Vu-Le,
Thong Nguyen,
Tho Quan,
Luu Anh Tuan
Abstract:
Language models have been supervised with both language-only objective and visual grounding in existing studies of visual-grounded language learning. However, due to differences in the distribution and scale of visual-grounded datasets and language corpora, the language model tends to mix up the context of the tokens that occurred in the grounded data with those that do not. As a result, during re…
▽ More
Language models have been supervised with both language-only objective and visual grounding in existing studies of visual-grounded language learning. However, due to differences in the distribution and scale of visual-grounded datasets and language corpora, the language model tends to mix up the context of the tokens that occurred in the grounded data with those that do not. As a result, during representation learning, there is a mismatch between the visual information and the contextual meaning of the sentence. To overcome this limitation, we propose GroundedBERT - a grounded language learning method that enhances the BERT representation with visually grounded information. GroundedBERT comprises two components: (i) the original BERT which captures the contextual representation of words learned from the language corpora, and (ii) a visual grounding module which captures visual information learned from visual-grounded datasets. Moreover, we employ Optimal Transport (OT), specifically its partial variant, to solve the fractional alignment problem between the two modalities. Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
△ Less
Submitted 9 January, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
UniFed: All-In-One Federated Learning Platform to Unify Open-Source Frameworks
Authors:
Xiaoyuan Liu,
Tianneng Shi,
Chulin Xie,
Qinbin Li,
Kangping Hu,
Haoyu Kim,
Xiaojun Xu,
The-Anh Vu-Le,
Zhen Huang,
Arash Nourian,
Bo Li,
Dawn Song
Abstract:
Federated Learning (FL) has become a practical and widely adopted distributed learning paradigm. However, the lack of a comprehensive and standardized solution covering diverse use cases makes it challenging to use in practice. In addition, selecting an appropriate FL framework for a specific use case can be a daunting task. In this work, we present UniFed, the first unified platform for standardi…
▽ More
Federated Learning (FL) has become a practical and widely adopted distributed learning paradigm. However, the lack of a comprehensive and standardized solution covering diverse use cases makes it challenging to use in practice. In addition, selecting an appropriate FL framework for a specific use case can be a daunting task. In this work, we present UniFed, the first unified platform for standardizing existing open-source FL frameworks. The platform streamlines the end-to-end workflow for distributed experimentation and deployment, encompassing 11 popular open-source FL frameworks. In particular, to address the substantial variations in workflows and data formats, UniFed introduces a configuration-based schema-enforced task specification, offering 20 editable fields. UniFed also provides functionalities such as distributed execution management, logging, and data analysis.
With UniFed, we evaluate and compare 11 popular FL frameworks from the perspectives of functionality, privacy protection, and performance, through conducting developer surveys and code-level investigation. We collect 15 diverse FL scenario setups (e.g., horizontal and vertical settings) for FL framework evaluation. This comprehensive evaluation allows us to analyze both model and system performance, providing detailed comparisons and offering recommendations for framework selection. UniFed simplifies the process of selecting and utilizing the appropriate FL framework for specific use cases, while enabling standardized distributed experimentation and deployment. Our results and analysis based on experiments with up to 178 distributed nodes provide valuable system design and deployment insights, aiming to empower practitioners in their pursuit of effective FL solutions.
△ Less
Submitted 30 December, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Improving Mini-batch Optimal Transport via Partial Transportation
Authors:
Khai Nguyen,
Dang Nguyen,
The-Anh Vu-Le,
Tung Pham,
Nhat Ho
Abstract:
Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite their practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in the comparison with the optimal transportation plan between the original measures. Motivated by the misspecified ma…
▽ More
Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite their practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in the comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging the insight from the partial transportation, we explain the source of misspecified mappings from the m-OT and motivate why limiting the amount of transported masses among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative model, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.
△ Less
Submitted 6 June, 2022; v1 submitted 22 August, 2021;
originally announced August 2021.