CLIP-Guided Networks for Transferable Targeted Attacks

Authors: Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced single-target generative attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. Multi-target attacks address this by training only one class-conditional generator for multiple classes. However, the generator simply uses class labels as conditions, failing to leverage the rich semantic information of the target class. To this end, we design a CLIP-guided Generative Network with Cross-attention modules (CGNC) to enhance multi-target attacks by incorporating textual knowledge of CLIP into the generator. Extensive experiments demonstrate that CGNC yields significant improvements over previous multi-target generative attacks, e.g., a 21.46% improvement in success rate from ResNet-152 to DenseNet-121. Moreover, we propose a masked fine-tuning mechanism to further strengthen our method in attacking a single class, which surpasses existing single-target methods.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2406.05491 [pdf, other]

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus on instance-specific attacks that generate perturbations for each input sample. In this paper, we show that VLP models can be vulnerable to a new class of universal adversarial perturbation (UAP) for all input samples. Although initially transplanting existing UAP algorithms to perform attacks showed effectiveness in attacking discriminative models, the results were unsatisfactory when applied to VLP models. To this end, we revisit the multimodal alignments in VLP model training and propose the Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC). Specifically, we first design a generator that incorporates cross-modal information as conditioning input to guide the training. To further exploit cross-modal interactions, we propose to formulate the training objective as a multimodal contrastive learning paradigm based on our constructed positive and negative image-text pairs. By training the conditional generator with the designed loss, we successfully force the adversarial samples to move away from their original area in the VLP model's feature space, and thus essentially enhance the attacks. Extensive experiments show that our method achieves remarkable attack performance across various VLP models and Vision-and-Language (V+L) tasks. Moreover, C-PGC exhibits outstanding black-box transferability and achieves impressive results in fooling prevalent large VLP models including LLaVA and Qwen-VL.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2404.06709 [pdf, other]

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

Authors: Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng

Abstract: The fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks. However, the effectiveness of large language models is reliant on an exponentially increasing number of parameters. The overwhelming computation complexity incurs a high inference latency that negatively affects user experience. Existing methods to improve inference efficiency, such as tensor parallelism and quantization, aim to reduce per-layer computing latency, yet overlook the cumulative latency due to the number of layers. Recent works on reducing the cumulative latency through layer removal, however, lead to a significant performance drop. Motivated by the similarity of inputs among adjacent layers, we propose to identify quasi-independent layers, which can be concurrently computed to significantly decrease inference latency. We also introduce a bypassing technique to mitigate the effect of information loss. Empirical experiments of the proposed approach on the LLaMA models confirm that Concurrent Computation of Quasi-Independent Layers (CQIL) can reduce latency by up to 48.3% on LLaMA-33B, while maintaining a close level of performance.

Submitted 4 July, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: ACL 2024

arXiv:2403.18134 [pdf, other]

Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification

Authors: Zhan Shi, Jingwei Zhang, Jun Kong, Fusheng Wang

Abstract: In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task where giga-pixel WSIs are only labeled at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited power to recognize the long-range dependencies. In this paper, we introduce the integrative graph-transformer framework that simultaneously captures the context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer modeling neighboring relations at the local instance level and an efficient global attention model capturing comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC and BRIGHT, demonstrate the superiority of our approach over current state-of-the-art MIL methods, achieving an improvement of 1.0% to 2.6% in accuracy and 0.7%-1.6% in AUROC.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.01971 [pdf, other]

ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs

Authors: Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo

Abstract: Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLM), such as ChatGPT, have yielded encouraging outcomes in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose ContrastRepair, a novel conversation-based APR approach that augments conversation-driven APR by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, which offer contrastive feedback to the LLM. Our key insight is to minimize the difference between the generated passing test and the given failing test, which can better isolate the root causes of bugs. By providing informative and specific feedback, ContrastRepair enables the LLM to produce effective bug fixes. The implementation of ContrastRepair is based on the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java. The results demonstrate that ContrastRepair significantly outperforms existing methods, achieving a new state-of-the-art in program repair. For instance, among Defects4j 1.2 and 2.0, ContrastRepair correctly repairs 143 out of all 337 bug cases, while the best-performing baseline fixes 124 bugs.

Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00669 [pdf, other]

Advancing Additive Manufacturing through Deep Learning: A Comprehensive Review of Current Progress and Future Challenges

Authors: Amirul Islam Saimon, Emmanuel Yangue, Xiaowei Yue, Zhenyu James Kong, Chenang Liu

Abstract: Additive manufacturing (AM) has already proved itself to be a potential alternative to widely used subtractive manufacturing due to its extraordinary capacity for manufacturing highly customized products with minimum material wastage. Nevertheless, it is still not considered the primary choice for industry due to some of its major inherent challenges, including complex and dynamic process interactions, which are sometimes difficult to fully understand even with traditional machine learning because of the involvement of high-dimensional data such as images, point clouds, and voxels. However, the recent emergence of deep learning (DL) is showing great promise in overcoming many of these challenges, as DL can automatically capture complex relationships from high-dimensional data without hand-crafted feature extraction. Therefore, the volume of research at the intersection of AM and DL is growing exponentially each year, which makes it difficult for researchers to keep track of the trend and future potential directions. Furthermore, to the best of our knowledge, there is no comprehensive review paper in this research track summarizing the recent studies. Therefore, this paper reviews the recent studies that apply DL to improve the AM process, with a high-level summary of their contributions and limitations. Finally, it summarizes the current challenges and recommends some of the promising opportunities in this domain for further investigation, with a special focus on generalizing DL models for a wide range of geometry types, managing uncertainties both in AM data and DL models, overcoming limited and noisy AM data issues by incorporating generative models, and unveiling the potential of interpretable DL for AM.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.10087 [pdf, ps, other]

Decentralized Covert Routing in Heterogeneous Networks Using Reinforcement Learning

Authors: Justin Kong, Terrence J. Moore, Fikadu T. Dagefu

Abstract: This letter investigates covert routing communications in a heterogeneous network where a source transmits confidential data to a destination with the aid of relaying nodes where each transmitter judiciously chooses one modality among multiple communication modalities. We develop a novel reinforcement learning-based covert routing algorithm that finds a route from the source to the destination where each node identifies its next hop and modality only based on the local feedback information received from its neighboring nodes. We show based on numerical simulations that the proposed covert routing strategy has only negligible performance loss compared to the optimal centralized routing scheme.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2402.04013 [pdf, other]

Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses

Authors: Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia

Abstract: Model Inversion (MI) attacks aim to disclose private information about the training data by abusing access to the pre-trained models. These attacks enable adversaries to reconstruct high-fidelity data that closely aligns with the private training data, which has raised significant privacy concerns. Despite the rapid advances in the field, we lack a comprehensive overview of existing MI attacks and defenses. To fill this gap, this paper thoroughly investigates this field and presents a holistic survey. Firstly, our work briefly reviews the traditional MI on machine learning scenarios. We then elaborately analyze and compare numerous recent attacks and defenses on Deep Neural Networks (DNNs) across multiple modalities and learning tasks.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.04106 [pdf, other]

Identity-Obscured Neural Radiance Fields: Privacy-Preserving 3D Facial Reconstruction

Authors: Jiayi Kong, Baixin Xu, Xurui Song, Chen Qian, Jun Luo, Ying He

Abstract: Neural radiance fields (NeRF) typically require a complete set of images taken from multiple camera perspectives to accurately reconstruct geometric details. However, this approach raises significant privacy concerns in the context of facial reconstruction. The critical need for privacy protection often leads individuals to be reluctant to share their facial images, due to fears of potential misuse or security risks. Addressing these concerns, we propose a method that leverages privacy-preserving images for reconstructing 3D head geometry within the NeRF framework. Our method stands apart from traditional facial reconstruction techniques as it does not depend on RGB information from images containing sensitive facial data. Instead, it effectively generates plausible facial geometry using a series of identity-obscured inputs, thereby protecting facial privacy.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.11745 [pdf, other]

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

Authors: Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim

Abstract: In this work, we propose a novel method for modeling numerous speakers, which enables expressing the overall characteristics of speakers in detail like a trained multi-speaker model without additional training on the target speaker's dataset. Although various works with similar purposes have been actively studied, their performance has not yet reached that of trained multi-speaker models due to their fundamental limitations. To overcome previous limitations, we propose effective methods for feature learning and representing target speakers' speech characteristics by discretizing the features and conditioning them to a speech synthesis model. Our method obtained a significantly higher similarity mean opinion score (SMOS) in subjective similarity evaluation than seen speakers of a high-performance multi-speaker model, even with unseen speakers. The proposed method also outperforms a zero-shot method by significant margins. Furthermore, our method shows remarkable performance in generating new artificial speakers. In addition, we demonstrate that the encoded latent features are sufficiently informative to reconstruct an original speaker's speech completely. It implies that our method can be used as a general methodology to encode and reconstruct speakers' characteristics in various tasks.

Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: ICML 2024

arXiv:2310.17448 [pdf, other]

Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge

Authors: Tanel Alumäe, Jiaming Kong, Daniil Robnikov

Abstract: This paper describes Tallinn University of Technology (TalTech) systems developed for the ASRU MADASR 2023 Challenge. The challenge focuses on automatic speech recognition of dialect-rich Indian languages with limited training audio and text data. TalTech participated in two tracks of the challenge: Track 1 that allowed using only the provided training data and Track 3 which allowed using additional audio data. In both tracks, we relied on wav2vec2.0 models. Our methodology diverges from the traditional procedure of finetuning pretrained wav2vec2.0 models in two key points: firstly, through the implementation of the aligned data augmentation technique to enhance the linguistic diversity of the training data, and secondly, via the application of deep prefix tuning for dialect adaptation of wav2vec2.0 models. In both tracks, our approach yielded significant improvements over the provided baselines, achieving the lowest word error rates across all participating teams.

Submitted 26 October, 2023; originally announced October 2023.

Journal ref: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

arXiv:2310.04453 [pdf, other]

COVID-19 South African Vaccine Hesitancy Models Show Boost in Performance Upon Fine-Tuning on M-pox Tweets

Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado

Abstract: Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries, leading many to fear that the M-pox outbreak would rapidly transition into another pandemic while the COVID-19 pandemic ravages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African Twitter data on a hand-labelled M-pox dataset before and after fine-tuning. More than 20k M-pox-related tweets from South Africa were hand-labelled as being either positive, negative or neutral. After fine-tuning these COVID-19 models on the M-pox dataset, the F1-scores increased by more than 8%, falling just short of 70%, but still outperforming state-of-the-art models and well-known classification algorithms. An LDA-based topic modelling procedure was used to compare the misclassified M-pox tweets of the original COVID-19 RoBERTa model with its fine-tuned version, and from this analysis, we were able to draw conclusions on how to build more sophisticated models.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.07449 [pdf]

Rate-Induced Transitions in Networked Complex Adaptive Systems: Exploring Dynamics and Management Implications Across Ecological, Social, and Socioecological Systems

Authors: Vítor V. Vasconcelos, Flávia M. D. Marquitti, Theresa Ong, Lisa C. McManus, Marcus Aguiar, Amanda B. Campos, Partha S. Dutta, Kristen Jovanelly, Victoria Junquera, Jude Kong, Elisabeth H. Krueger, Simon A. Levin, Wenying Liao, Mingzhen Lu, Dhruv Mittal, Mercedes Pascual, Flávio L. Pinheiro, Juan Rocha, Fernando P. Santos, Peter Sloot, Chenyang Su, Benton Taylor, Eden Tekwa, Sjoerd Terpstra, et al. (5 additional authors not shown)

Abstract: Complex adaptive systems (CASs), from ecosystems to economies, are open systems and inherently dependent on external conditions. While a system can transition from one state to another based on the magnitude of change in external conditions, the rate of change -- irrespective of magnitude -- may also lead to system state changes due to a phenomenon known as a rate-induced transition (RIT). This study presents a novel framework that captures RITs in CASs through a local model and a network extension where each node contributes to the structural adaptability of others. Our findings reveal how RITs occur at a critical environmental change rate, with lower-degree nodes tipping first due to fewer connections and reduced adaptive capacity. High-degree nodes tip later as their adaptability sources (lower-degree nodes) collapse. This pattern persists across various network structures. Our study calls for an extended perspective when managing CASs, emphasizing the need to focus not only on thresholds of external conditions but also the rate at which those conditions change, particularly in the context of the collapse of surrounding systems that contribute to the focal system's resilience. Our analytical method opens a path to designing management policies that mitigate RIT impacts and enhance resilience in ecological, social, and socioecological systems. These policies could include controlling environmental change rates, fostering system adaptability, implementing adaptive management strategies, and building capacity and knowledge exchange. Our study contributes to the understanding of RIT dynamics and informs effective management strategies for complex adaptive systems in the face of rapid environmental change.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 25 pages, 4 figures, 1 box, supplementary information

MSC Class: 37G; 37N; 91B; 91C; 91D; 91E; 92D; 92D25; 92D40; 92F; 93A; 93A14; 93A16 ACM Class: I.6.3; I.6.m; J.3; J.4; J.m; K.4.2

arXiv:2309.05253 [pdf, other]

A quantum tug of war between randomness and symmetries on homogeneous spaces

Authors: Rahul Arvind, Kishor Bharti, Jun Yong Khoo, Dax Enshan Koh, Jian Feng Kong

Abstract: We explore the interplay between symmetry and randomness in quantum information. Adopting a geometric approach, we consider states as $H$-equivalent if related by a symmetry transformation characterized by the group $H$. We then introduce the Haar measure on the homogeneous space $\mathbb{U}/H$, characterizing true randomness for $H$-equivalent systems. While this mathematical machinery is well-studied by mathematicians, it has seen limited application in quantum information: we believe our work to be the first instance of utilizing homogeneous spaces to characterize symmetry in quantum information. This is followed by a discussion of approximations of true randomness, commencing with $t$-wise independent approximations and defining $t$-designs on $\mathbb{U}/H$ and $H$-equivalent states. Transitioning further, we explore pseudorandomness, defining pseudorandom unitaries and states within homogeneous spaces. Finally, as a practical demonstration of our findings, we study the expressibility of quantum machine learning ansatze in homogeneous spaces. Our work provides a fresh perspective on the relationship between randomness and symmetry in the quantum world.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 9 + 1 pages, 3 figures

arXiv:2309.05088 [pdf]

Towards Trustworthy Artificial Intelligence for Equitable Global Health

Authors: Hong Qin, Jude Kong, Wandi Ding, Ramneek Ahluwalia, Christo El Morr, Zeynep Engin, Jake Okechukwu Effoduh, Rebecca Hwa, Serena Jingchuan Guo, Laleh Seyyed-Kalantari, Sylvia Kiwuwa Muyingo, Candace Makeda Moore, Ravi Parikh, Reva Schwartz, Dongxiao Zhu, Xiaoqian Wang, Yiye Zhang

Abstract: Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a global mix of experts from various disciplines, community health practitioners, policymakers, and more. Topics covered included managing AI bias in socio-technical systems, AI's potential impacts on global health, and balancing data privacy with transparency. Panel discussions examined the cultural, political, and ethical dimensions of AI in global health. FairMI4GH aimed to stimulate dialogue, facilitate knowledge transfer, and spark innovative solutions. Drawing from NIST's AI Risk Management Framework, it provided suggestions for handling AI risks and biases. The need to mitigate data biases from the research design stage, adopt a human-centered approach, and advocate for AI transparency was recognized. Challenges such as updating legal frameworks, managing cross-border data sharing, and motivating developers to reduce bias were acknowledged. The event emphasized the necessity of diverse viewpoints and multi-dimensional dialogue for creating a fair and ethical AI framework for equitable global health.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 7 pages

arXiv:2307.16430 [pdf, other]

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Authors: Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim

Abstract: Single-stage text-to-speech models have been actively studied recently, and their results have outperformed two-stage pipeline systems. Although the previous single-stage model has made great progress, there is room for improvement in terms of its intermittent unnaturalness, computational efficiency, and strong dependence on phoneme conversion. In this work, we introduce VITS2, a single-stage text-to-speech model that efficiently synthesizes more natural speech by improving several aspects of the previous work. We propose improved structures and training mechanisms and show that the proposed methods are effective in improving naturalness, similarity of speech characteristics in a multi-speaker model, and efficiency of training and inference. Furthermore, we demonstrate that the strong dependence on phoneme conversion in previous works can be significantly reduced with our method, which allows a fully end-to-end single-stage approach.

Submitted 31 July, 2023; originally announced July 2023.

Comments: Interspeech 2023

arXiv:2307.15072 [pdf, other]

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado

Abstract: Very few social media studies have been done on South African user-generated content (UGC) during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa was extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and RoBERTa-base, whose hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45%-55%, except for BERT and RoBERTa, which both achieved significantly better measures with overall F1-scores of 60% and 61%, respectively. Topic modelling using an LDA was performed on the misclassified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.06862 [pdf, other]

Saltation Matrices: The Essential Tool for Linearizing Hybrid Dynamical Systems

Authors: Nathan J. Kong, J. Joe Payne, James Zhu, Aaron M. Johnson

Abstract: Hybrid dynamical systems, i.e. systems that have both continuous and discrete states, are ubiquitous in engineering, but are difficult to work with due to their discontinuous transitions. For example, a robot leg is able to exert very little control effort while it is in the air compared to when it is on the ground. When the leg hits the ground, the penetrating velocity instantaneously collapses to zero. These instantaneous changes in dynamics and discontinuities (or jumps) in state make standard smooth tools for planning, estimation, control, and learning difficult for hybrid systems. One of the key tools for accounting for these jumps is called the saltation matrix. The saltation matrix is the sensitivity update when a hybrid jump occurs and has been used in a variety of fields including robotics, power circuits, and computational neuroscience. This paper presents an intuitive derivation of the saltation matrix and discusses what it captures, where it has been used in the past, how it is used for linear and quadratic forms, how it is computed for rigid body systems with unilateral constraints, and some of the structural properties of the saltation matrix in these cases.

Submitted 20 June, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.16143 [pdf, other]

Condensed Prototype Replay for Class Incremental Learning

Authors: Jiangtao Kong, Zhenyu Zong, Tianyi Zhou, Huajie Shao

Abstract: Incremental learning (IL) suffers from catastrophic forgetting of old tasks when learning new tasks. This can be addressed by replaying previous tasks' data stored in a memory, which however is usually prone to size limits and privacy leakage. Recent studies store only class centroids as prototypes and augment them with Gaussian noise to create synthetic data for replay. However, they cannot effectively avoid class interference near their margins that leads to forgetting. Moreover, the injected noise distorts the rich structure between real data and prototypes, and can hence even be detrimental to IL. In this paper, we propose YONO, in which You Only Need to replay One condensed prototype per class, and which for the first time can even outperform memory-costly exemplar-replay methods. To this end, we develop a novel prototype learning method that (1) searches for more representative prototypes in high-density regions by an attentional mean-shift algorithm and (2) moves samples in each class to their prototype to form a compact cluster distant from other classes. Thereby, the class margins are maximized, which effectively reduces interference causing future forgetting. In addition, we extend YONO to YONO+, which creates synthetic replay data by random sampling in the neighborhood of each prototype in the representation space. We show that the synthetic data can further improve YONO. Extensive experiments on IL benchmarks demonstrate the advantages of YONO/YONO+ over existing IL methods in terms of both accuracy and forgetting.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.13048 [pdf, other]

RWKV: Reinventing RNNs for the Transformer Era

Authors: Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang , et al. (9 additional authors not shown)

Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.

Submitted 10 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2212.07194 [pdf]

Traffic Flow Prediction via Variational Bayesian Inference-based Encoder-Decoder Framework

Authors: Jianlei Kong, Xiaomeng Fan, Xue-Bo Jin, Min Zuo

Abstract: Accurate traffic flow prediction, a hotspot for intelligent transportation research, is the prerequisite for mastering traffic and making travel plans. The speed of traffic flow can be affected by road conditions, weather, holidays, etc. Furthermore, the sensors that capture traffic flow information are subject to interference from environmental factors such as illumination, collection time, occlusion, etc. Therefore, the traffic flow in the practical transportation system is complicated, uncertain, and challenging to predict accurately. This paper proposes a deep encoder-decoder prediction framework based on variational Bayesian inference. A Bayesian neural network is constructed by combining variational inference with gated recurrent units (GRU) and used as the deep neural network unit of the encoder-decoder framework to mine the intrinsic dynamics of traffic flow. Then, the variational inference is introduced into the multi-head attention mechanism to avoid noise-induced deterioration of prediction accuracy. The proposed model achieves superior prediction performance on the Guangzhou urban traffic flow dataset over the benchmarks, particularly for long-term prediction.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2210.08723 [pdf, other]

Private Data Valuation and Fair Payment in Data Marketplaces

Authors: Zhihua Tian, Jian Liu, Jingyu Li, Xinle Cao, Ruoxi Jia, Jun Kong, Mengdi Liu, Kui Ren

Abstract: Data valuation is an essential task in a data marketplace. It aims at fairly compensating data owners for their contribution. There is increasing recognition in the machine learning community that the Shapley value -- a foundational profit-sharing scheme in cooperative game theory -- has major potential to value data, because it uniquely satisfies basic properties for fair credit allocation and has been shown to be able to identify data sources that are useful or harmful to model performance. However, calculating the Shapley value requires accessing original data sources. It still remains an open question how to design a real-world data marketplace that takes advantage of the Shapley value-based data pricing while protecting privacy and allowing fair payments. In this paper, we propose the first prototype of a data marketplace that values data sources based on the Shapley value in a privacy-preserving manner and at the same time ensures fair payments. Our approach is enabled by a suite of innovations on both algorithm and system design. We first propose a Shapley value calculation algorithm that can be efficiently implemented via multiparty computation (MPC) circuits. The key idea is to learn a performance predictor that can directly predict model performance corresponding to an input dataset without performing actual training. We further optimize the MPC circuit design based on the structure of the performance predictor. We further incorporate fair payment into the MPC circuit to guarantee that the data that the buyer pays for is exactly the same as the one that has been valuated. Our experimental results show that the proposed new data valuation algorithm is as effective as the original expensive one. Furthermore, the customized MPC protocol is efficient and scalable.

Submitted 17 February, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: 14 pages

arXiv:2209.06421 [pdf, other]

A Transfer Function Design Using A Knowledge Database based on Deep Image and Primitive Intensity Profile Features Retrieval

Authors: Younhyun Jung, Jim Kong, Jinman Kim

Abstract: Transfer function (TF) plays a key role for the generation of direct volume rendering (DVR), by enabling accurate identification of structures of interest (SOIs) interactively as well as ensuring appropriate visibility of them. Attempts at mitigating the repetitive manual process of TF design have led to approaches that make use of a knowledge database consisting of pre-designed TFs by domain experts. In these approaches, a user navigates the knowledge database to find the most suitable pre-designed TF for their input volume to visualize the SOIs. Although these approaches potentially reduce the workload to generate the TFs, they still require manual TF navigation of the knowledge database, as well as the likely fine tuning of the selected TF to suit the input. In this work, we propose a TF design approach where we introduce a new content-based retrieval (CBR) to automatically navigate the knowledge database. Instead of pre-designed TFs, our knowledge database contains image volumes with SOI labels. Given an input image volume, our CBR approach retrieves relevant image volumes (with SOI labels) from the knowledge database; the retrieved labels are then used to generate and optimize TFs of the input. This approach does not need any manual TF navigation and fine tuning. For improving SOI retrieval performance, we propose a two-stage CBR scheme to enable the use of local intensity and regional deep image feature representations in a complementary manner. We demonstrate the capabilities of our approach with comparison to a conventional CBR approach in visualization, where an intensity profile matching algorithm is used, and also with potential use-cases in medical image volume visualization where DVR plays an indispensable role for different clinical usages.

Submitted 14 September, 2022; originally announced September 2022.

Comments: submitted to Computer Graphics Forum for review

arXiv:2207.04591 [pdf, other]

doi 10.1109/TRO.2023.3308773

Hybrid iLQR Model Predictive Control for Contact Implicit Stabilization on Legged Robots

Authors: Nathan J. Kong, Chuanzheng Li, Aaron M. Johnson

Abstract: Model Predictive Control (MPC) is a popular strategy for controlling robots but is difficult for systems with contact due to the complex nature of hybrid dynamics. To implement MPC for systems with contact, dynamic models are often simplified or contact sequences fixed in time in order to plan trajectories efficiently. In this work, we extend Hybrid iterative Linear Quadratic Regulator to work in an MPC fashion (HiLQR MPC) by 1) modifying how the cost function is computed when contact modes do not align, 2) utilizing parallelizations when simulating rigid body dynamics, and 3) using efficient analytical derivative computations of the rigid body dynamics. The result is a system that can modify the contact sequence of the reference behavior and plan whole body motions cohesively -- which is crucial when dealing with large perturbations. HiLQR MPC is tested on two systems: first, the hybrid cost modification is validated on a simple actuated bouncing ball hybrid system. Then HiLQR MPC is compared against methods that utilize centroidal dynamic assumptions on a quadruped robot (Unitree A1). HiLQR MPC outperforms the centroidal methods in both simulation and hardware tests.

Submitted 6 November, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

Comments: in IEEE Transactions on Robotics, 2023. arXiv admin note: substantial text overlap with arXiv:2103.14584

arXiv:2206.00798 [pdf, other]

Multi-scale frequency separation network for image deblurring

Authors: Yanni Zhang, Qiang Li, Miao Qi, Di Liu, Jun Kong, Jianzhong Wang

Abstract: Image deblurring aims to restore the detailed texture information or structures from blurry images, which has become an indispensable step in many computer vision tasks. Although various methods have been proposed to deal with the image deblurring problem, most of them treat the blurry image as a whole and neglect the characteristics of different image frequencies. In this paper, we present a new method called multi-scale frequency separation network (MSFS-Net) for image deblurring. MSFS-Net introduces the frequency separation module (FSM) into an encoder-decoder network architecture to capture the low- and high-frequency information of an image at multiple scales. Then, a cycle-consistency strategy and a contrastive learning module (CLM) are respectively designed to retain the low-frequency information and recover the high-frequency information during deblurring. Finally, the features of different scales are fused by a cross-scale feature fusion module (CSFFM). Extensive experiments on benchmark datasets show that the proposed network achieves state-of-the-art performance.

Submitted 8 December, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2205.14811 [pdf, other]

doi 10.1016/j.neucom.2023.01.032

Last-iterate convergence analysis of stochastic momentum methods for neural networks

Authors: Dongpo Xu, Jinlan Liu, Yinghua Lu, Jun Kong, Danilo Mandic

Abstract: The stochastic momentum method is a commonly used acceleration technique for solving large-scale stochastic optimization problems in artificial neural networks. Current convergence results of stochastic momentum methods under non-convex stochastic settings mostly discuss convergence in terms of the random output and minimum output. To this end, we address the convergence of the last iterate output (called last-iterate convergence) of the stochastic momentum methods for non-convex stochastic optimization problems, in a way conformal with traditional optimization theory. We prove the last-iterate convergence of the stochastic momentum methods under a unified framework, covering both stochastic heavy ball momentum and stochastic Nesterov accelerated gradient momentum. The momentum factors can be fixed to be constant, rather than time-varying coefficients in existing analyses. Finally, the last-iterate convergence of the stochastic momentum methods is verified on the benchmark MNIST and CIFAR-10 datasets.

Submitted 29 May, 2022; originally announced May 2022.

Comments: 21 pages, 4 figures

MSC Class: 90C26; ACM Class: G.1.6

Journal ref: Neurocomputing 527 (2023) 27-35

arXiv:2205.06801 [pdf]

Twitter-Based Gender Recognition Using Transformers

Authors: Zahra Movahedi Nia, Ali Ahmadi, Bruce Mellado, Jianhong Wu, James Orbinski, Ali Agary, Jude Dzevela Kong

Abstract: Social media contains useful information about people and society that could help advance research in many different areas (e.g. by applying opinion mining, emotion/sentiment analysis, and statistical analysis) such as business and finance, health, socio-economic inequality and gender vulnerability. User demographics provide rich information that could help study the subject further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a model based on transformers to predict the user's gender from their images and tweets. We fine-tune a model based on Vision Transformers (ViT) to stratify female and male images. Next, we fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize the user's gender by their tweets. This is highly beneficial, because not all users provide an image that indicates their gender. The gender of such users could be detected from their tweets. The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively. This shows that the image and text classification models are capable of complementing each other by providing additional information to one another. We apply our method to the PAN-2018 dataset, and obtain an accuracy of 85.52%.

Submitted 24 April, 2022; originally announced May 2022.

arXiv:2202.12729 [pdf, other]

The Uncertainty Aware Salted Kalman Filter: State Estimation for Hybrid Systems with Uncertain Guards

Authors: J. Joe Payne, Nathan J. Kong, Aaron M. Johnson

Abstract: In this paper we present a method for updating robotic state belief through contact with uncertain surfaces and apply this update to a Kalman filter for more accurate state estimation. Examining how guard surface uncertainty affects the time spent in each mode, we derive a guard saltation matrix - which maps perturbations prior to hybrid events to perturbations after - accounting for additional variation in the resulting state. Additionally, we propose the use of parameterized reset functions - capturing how unknown parameters change how states are mapped from one mode to the next - the Jacobian of which accounts for the additional uncertainty in the resulting state. The accuracy of these mappings is shown by simulating sampled distributions through uncertain transition events and comparing the resulting covariances. Finally, we integrate these additional terms into the "uncertainty aware Salted Kalman Filter", uaSKF, and show a peak reduction in average estimation error by 24-60% on a variety of test conditions and systems.

Submitted 29 July, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: To appear in IROS 2022

arXiv:2112.01442 [pdf, other]

Learning Large-scale Network Embedding from Representative Subgraph

Authors: Junsheng Kong, Weizhao Li, Ben Liao, Jiezhong Qiu, Chang-Yu Hsieh, Yi Cai, Jinhui Zhu, Shengyu Zhang

Abstract: We study the problem of large-scale network embedding, which aims to learn low-dimensional latent representations for network mining applications. Recent research in the field of network embedding has led to significant progress such as DeepWalk, LINE, NetMF, and NetSMF. However, the huge size of many real-world networks makes it computationally expensive to learn network embedding from the entire network. In this work, we present a novel network embedding method called "NES", which learns network embedding from a small representative subgraph. NES leverages theories from graph sampling to efficiently construct a smaller representative subgraph that can be used to make inferences about the full network, enabling significantly improved efficiency in embedding learning. Then, NES efficiently computes the network embedding from this representative subgraph. Compared with well-known methods, extensive experiments on networks of various scales and types demonstrate that NES achieves comparable performance and significant efficiency superiority.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 10 pages, 5 figures

arXiv:2110.01123 [pdf, ps, other]

Hybrid Event Shaping to Stabilize Periodic Hybrid Orbits

Authors: James Zhu, Nathan J. Kong, George Council, Aaron M. Johnson

Abstract: Many controllers for legged robotic systems leverage open- or closed-loop control at discrete hybrid events to enhance stability. These controllers appear in several well studied phenomena such as the Raibert stepping controller, paddle juggling and swing leg retraction. This work introduces hybrid event shaping (HES): a generalized method for analyzing and producing stable hybrid event controllers. HES utilizes the saltation matrix, which gives a closed-form equation for the effect that hybrid events have on stability. We also introduce shape parameters, which are higher order terms that can be tuned completely independently from the system dynamics to promote stability. Optimization methods are used to produce values of these parameters that optimize a stability measure. Hybrid event shaping captures previously developed control methods while also producing new optimally stable trajectories without the need for continuous-domain feedback.

Submitted 3 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: Presented at IEEE ICRA 2022

arXiv:2109.07084 [pdf, other]

doi 10.1145/3459637.3482343

Fast Extraction of Word Embedding from Q-contexts

Authors: Junsheng Kong, Weizhao Li, Zeyi Liu, Ben Liao, Jiezhong Qiu, Chang-Yu Hsieh, Yi Cai, Shengyu Zhang

Abstract: The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embedding for very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) which are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embedding with negligible errors. Mutual information between contexts and words can be encoded canonically as a sampling state; thus, Q-contexts can be constructed quickly. Furthermore, we present an efficient and effective WEQ method, which is capable of extracting word embedding directly from these typical contexts. In practical scenarios, our algorithm runs 11 to 13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe and fastText, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks, while maintaining run-time and resource advantages over all these baselines.

Submitted 15 September, 2021; originally announced September 2021.

Comments: Accepted by CIKM 2021

arXiv:2109.01182 [pdf, other]

COVID-19 Vaccine Hesitancy and Information Diffusion: An Agent-based Modeling Approach

Authors: Pooria Taghizadeh Naderi, Ali Asgary, Jude Kong, Jianhong Wu, Fattaneh Taghiyareh

Abstract: Despite the unprecedented success in the rapid development of several effective vaccines against SARS-CoV-2, global vaccination rollout efforts suffer from vaccine distribution inequality and vaccine acceptance, leading to insufficient public immunity provided by the vaccine products. While a major current focus in vaccine acceptance research is how to model and inform vaccine acceptance based on social-demographic parameters, characteristics of vaccine acceptance are not well understood and in particular, it is not known whether and how information diffusion influences vaccine acceptance. This study examines how information diffusion can change vaccine acceptance by developing a comprehensive computational model with an agent-based simulation technique to overcome the modeling and quantification complexity associated with socio-demographics, vaccine types, population statistics, and information diffusion. Our analyses, calibrated by the vaccine acceptance survey data from the provinces and territories of Canada, provide clear evidence that the propagation of information can greatly influence vaccine acceptance rates. The results illustrate that spread of negative messages about the COVID-19 vaccines can cause significant vaccine hesitancy that challenges the goal of a high public immunity provided by the vaccines. Our findings might help solve the vaccine hesitancy problem by focusing more on individuals' opinions and behavior.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2106.06103 [pdf, other]

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Authors: Jaehyeon Kim, Jungil Kong, Juhee Son

Abstract: Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. Our method adopts variational inference augmented with normalizing flows and an adversarial training process, which improves the expressive power of generative modeling. We also propose a stochastic duration predictor to synthesize speech with diverse rhythms from input text. With the uncertainty modeling over latent variables and the stochastic duration predictor, our method expresses the natural one-to-many relationship in which a text input can be spoken in multiple ways with different pitches and rhythms. A subjective human evaluation (mean opinion score, or MOS) on LJ Speech, a single-speaker dataset, shows that our method outperforms the best publicly available TTS systems and achieves a MOS comparable to ground truth.

Submitted 10 June, 2021; originally announced June 2021.

Comments: ICML 2021

arXiv:2103.14584 [pdf, other]

iLQR for Piecewise-Smooth Hybrid Dynamical Systems

Authors: Nathan J. Kong, George Council, Aaron M. Johnson

Abstract: Trajectory optimization is a popular strategy for planning trajectories for robotic systems. However, many robotic tasks require changing contact conditions, which is difficult due to the hybrid nature of the dynamics. The optimal sequence and timing of these modes are typically not known ahead of time. In this work, we extend the Iterative Linear Quadratic Regulator (iLQR) method to a class of piecewise smooth hybrid dynamical systems by allowing for changing hybrid modes in the forward pass, using the saltation matrix to update the gradient information in the backwards pass, and using a reference extension to account for mode mismatch. We demonstrate these changes on a variety of hybrid systems and compare the different strategies for computing the gradients.

Submitted 6 September, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: To Appear in IEEE CDC 2021

arXiv:2101.05403 [pdf]

Image deblurring based on lightweight multi-information fusion network

Authors: Yanni Zhang, Yiming Liu, Qiang Li, Miao Qi, Dahong Xu, Jun Kong, Jianzhong Wang

Abstract: Recently, deep learning based image deblurring has been well developed. However, exploiting the detailed image features in a deep learning framework always requires a mass of parameters, which inevitably makes the network suffer from high computational burden. To solve this problem, we propose a lightweight multi-information fusion network (LMFN) for image deblurring. The proposed LMFN is designed as an encoder-decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without a large amount of information loss. Then, a distillation network is used in the decoding stage, which allows the network to benefit the most from residual learning while remaining sufficiently lightweight. Meanwhile, an information fusion strategy between distillation modules and feature channels is also carried out by attention mechanism. Through fusing different information in the proposed approach, our network can achieve state-of-the-art image deblurring results with a smaller number of parameters and outperforms existing methods in model complexity.

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2011.06228 [pdf, other]

DSAM: A Distance Shrinking with Angular Marginalizing Loss for High Performance Vehicle Re-identification

Authors: Jiangtao Kong, Yu Cheng, Benjia Zhou, Kai Li, Junliang Xing

Abstract: Vehicle Re-identification (ReID) is an important yet challenging problem in computer vision. Compared to other visual objects like faces and persons, vehicles simultaneously exhibit much larger intraclass viewpoint variations and interclass visual similarities, making most existing loss functions designed for face recognition and person ReID unsuitable for vehicle ReID. To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information. Specifically, it shrinks the distance between samples of the same class locally in the Original Feature Space while keeping samples of different classes far away in the Feature Angular Space. The shrinking and marginalizing operations are performed during each iteration of the training process and are suitable for different SoftMax based loss functions. We evaluate the DSAM loss function on three large vehicle ReID datasets with detailed analyses and extensive comparisons with many competing vehicle ReID methods. Experimental results show that our DSAM loss enhances the SoftMax loss by a large margin on the PKU-VD1-Large dataset: 10.41% for mAP, 5.29% for cmc1, and 4.60% for cmc5. Moreover, the mAP is increased by 9.34% on the PKU-VehicleID dataset and 6.13% on the VeRi-776 dataset. Source code will be released to facilitate further studies in this research direction.

Submitted 8 September, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2010.05646 [pdf, other]

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Authors: Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

Abstract: Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) of a single speaker dataset indicates that our proposed method demonstrates similarity to human quality while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion of unseen speakers and end-to-end speech synthesis. Finally, a small footprint version of HiFi-GAN generates samples 13.4 times faster than real-time on CPU with comparable quality to an autoregressive counterpart.

Submitted 23 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020. Code available at https://github.com/jik876/hifi-gan

arXiv:2010.04935 [pdf, other]

HPCC-YNU at SemEval-2020 Task 9: A Bilingual Vector Gating Mechanism for Sentiment Analysis of Code-Mixed Text

Authors: Jun Kong, Jin Wang, Xuejie Zhang

Abstract: It is fairly common to use code-mixing on a social media platform to express opinions and emotions in multilingual societies. The purpose of this task is to detect the sentiment of code-mixed social media text. Code-mixed text poses a great challenge for the traditional NLP system, which currently uses monolingual resources to deal with the problem of multilingual mixing. This task has been solved in the past using lexicon lookup in respective sentiment dictionaries and using a long short-term memory (LSTM) neural network for monolingual resources. In this paper, we (my codalab username is kongjun) present a system that uses a bilingual vector gating mechanism for bilingual resources to complete the task. The model consists of two main parts: the vector gating mechanism, which combines the character and word levels, and the attention mechanism, which extracts the important emotional parts of the text. The results show that the proposed system outperforms the baseline algorithm. We achieved fifth place in Spanglish and 19th place in Hinglish. The code of this paper is available at: https://github.com/JunKong5/Semveal2020-task9

Submitted 10 October, 2020; originally announced October 2020.

Comments: 6 pages, 3 figures

arXiv:2007.12233 [pdf, other]

The Salted Kalman Filter: Kalman Filtering on Hybrid Dynamical Systems

Authors: Nathan J. Kong, J. Joe Payne, George Council, Aaron M. Johnson

Abstract: Many state estimation and control algorithms require knowledge of how probability distributions propagate through dynamical systems. However, despite hybrid dynamical systems becoming increasingly important in many fields, there has been little work on utilizing the knowledge of how probability distributions map through hybrid transitions. Here, we make use of a propagation law that employs the saltation matrix (a first-order update to the sensitivity equation) to create the Salted Kalman Filter (SKF), a natural extension of the Kalman Filter and Extended Kalman Filter to hybrid dynamical systems. Away from hybrid events, the SKF is a standard Kalman filter. When a hybrid event occurs, the saltation matrix plays an analogous role as that of the system dynamics, subsequently inducing a discrete modification to both the prediction and update steps. The SKF outperforms a naive variational update - the Jacobian of the reset map - by having a reduced mean squared error in state estimation, especially immediately after a hybrid transition event. Compared to a hybrid particle filter, the particle filter outperforms the SKF in mean squared error only when a large number of particles are used, likely due to a more accurate accounting of the split distribution near a hybrid transition.

Submitted 8 February, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: Submitted to Automatica

arXiv:2005.11129 [pdf, other]

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Authors: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

Abstract: Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external aligner. By combining the properties of flows and dynamic programming, the proposed model searches for the most probable monotonic alignment between text and the latent representation of speech on its own. We demonstrate that enforcing hard monotonic alignments enables robust TTS, which generalizes to long utterances, and employing generative flows enables fast, diverse, and controllable speech synthesis. Glow-TTS obtains an order-of-magnitude speed-up over the autoregressive model, Tacotron 2, at synthesis with comparable speech quality. We further show that our model can be easily extended to a multi-speaker setting.

Submitted 22 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: Accepted by NeurIPS2020

arXiv:1911.07088 [pdf, other]

Liver Steatosis Segmentation with Deep Learning Methods

Authors: Xiaoyuan Guo, Fusheng Wang, George Teodorou, Alton B. Farris, Jun Kong

Abstract: Liver steatosis is known as the abnormal accumulation of lipids within cells. An accurate quantification of steatosis area within the liver histopathological microscopy images plays an important role in liver disease diagnosis and transplantation assessment. Such a quantification analysis often requires a precise steatosis segmentation that is challenging due to abundant presence of highly overlapped steatosis droplets. In this paper, a deep learning model Mask-RCNN is used to segment the steatosis droplets in clumps. Extended from Faster R-CNN, Mask-RCNN can predict object masks in addition to bounding box detection. With transfer learning, the resulting model is able to segment overlapped steatosis regions at 75.87% by Average Precision, 60.66% by Recall, 65.88% by F1-score, and 76.97% by Jaccard index, promising to support liver disease diagnosis and allograft rejection prediction in future clinical practice.

Submitted 16 November, 2019; originally announced November 2019.

Comments: 4 pages

Journal ref: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) Venice, Italy, April 8-11, 2019

arXiv:1910.14548 [pdf, other]

Run-time Parameter Sensitivity Analysis Optimizations

Authors: Eduardo Scartezini, Willian Barreiros Jr., Tahsin Kurc, Jun Kong, Alba C. M. A. Melo, Joel Saltz, George Teodoro

Abstract: Efficient execution of parameter sensitivity analysis (SA) is critical to allow for its routine use. The pathology image processing application investigated in this work processes high-resolution whole-slide cancer tissue images from large datasets to characterize and classify the disease. However, the application is parameterized and changes in parameter values may significantly affect its results. Thus, understanding the impact of parameters on the output using SA is important to draw reliable scientific conclusions. The execution of the application is rather compute intensive, and a SA requires it to process the input data multiple times as parameter values are systematically varied. Optimizing this process is then important to allow for SA to be executed with large datasets. In this work, we employ a distributed computing system with novel computation reuse optimizations to accelerate SA. The new computation reuse strategy can maximize reuse even with limited memory availability where previous approaches would not be able to fully take advantage of reuse. The proposed solution was evaluated on an environment with 256 nodes (7168 CPU-cores) attaining a parallel efficiency of over 92%, and improving on the previous reuse strategies by up to 2.8x.

Submitted 31 October, 2019; originally announced October 2019.

Comments: 8 pages, 8 figures

arXiv:1908.07144 [pdf, other]

doi 10.1145/3332165.3347873

StateLens: A Reverse Engineering Solution for Making Existing Dynamic Touchscreens Accessible

Authors: Anhong Guo, Junhan Kong, Michael Rivera, Frank F. Xu, Jeffrey P. Bigham

Abstract: Blind people frequently encounter inaccessible dynamic touchscreens in their everyday lives that are difficult, frustrating, and often impossible to use independently. Touchscreens are often the only way to control everything from coffee machines and payment terminals, to subway ticket machines and in-flight entertainment systems. Interacting with dynamic touchscreens is difficult non-visually because the visual user interfaces change, interactions often occur over multiple different screens, and it is easy to accidentally trigger interface actions while exploring the screen. To solve these problems, we introduce StateLens - a three-part reverse engineering solution that makes existing dynamic touchscreens accessible. First, StateLens reverse engineers the underlying state diagrams of existing interfaces using point-of-view videos found online or taken by users using a hybrid crowd-computer vision pipeline. Second, using the state diagrams, StateLens automatically generates conversational agents to guide blind users through specifying the tasks that the interface can perform, allowing the StateLens iOS application to provide interactive guidance and feedback so that blind users can access the interface. Finally, a set of 3D-printed accessories enable blind people to explore capacitive touchscreens without the risk of triggering accidental touches on the interface.
Our technical evaluation shows that StateLens can accurately reconstruct interfaces from stationary, hand-held, and web videos; and, a user study of the complete system demonstrates that StateLens successfully enables blind users to access otherwise inaccessible dynamic touchscreens.

Submitted 19 August, 2019; originally announced August 2019.

Comments: ACM UIST 2019

arXiv:1906.03385 [pdf, other]

Applications of Gaussian Binomials to Coding Theory for Deletion Error Correction

Authors: Manabu Hagiwara, Justin Kong

Abstract: We present new applications on $q$-binomials, also known as Gaussian binomial coefficients. Our main theorems determine cardinalities of certain error-correcting codes based on Varshamov-Tenengolts codes and prove a curious phenomenon relating to deletion sphere for specific cases.

Submitted 12 June, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: 17 pages, 2 figures

MSC Class: 05A10; 05A19; 94B25

arXiv:1810.02911 [pdf]

Tuning for Tissue Image Segmentation Workflows for Accuracy and Performance

Authors: Luis F. R. Taveira, Tahsin Kurc, Alba C. M. A. Melo, Jun Kong, Erich Bremer, Joel H. Saltz, George Teodoro

Abstract: We propose a software platform that integrates methods and tools for multi-objective parameter auto-tuning in tissue image segmentation workflows. The goal of our work is to provide an approach for improving the accuracy of nucleus/cell segmentation pipelines by tuning their input parameters. The shape, size and texture features of nuclei in tissue are important biomarkers for disease prognosis, and accurate computation of these features depends on accurate delineation of boundaries of nuclei. Input parameters in many nucleus segmentation workflows affect segmentation accuracy and have to be tuned for optimal performance. This is a time-consuming and computationally expensive process; automating this step facilitates more robust image segmentation workflows and enables more efficient application of image analysis in large image datasets. Our software platform adjusts the parameters of a nuclear segmentation algorithm to maximize the quality of image segmentation results while minimizing the execution time. It implements several optimization methods to search the parameter space efficiently. In addition, the methodology is developed to execute on high performance computing systems to reduce the execution time of the parameter tuning phase.
Our results using three real-world image segmentation workflows demonstrate that the proposed solution is able to (1) search a small fraction (about 100 points) of the parameter space, which contains billions to trillions of points, and improve the quality of segmentation output by 1.20x, 1.29x, and 1.29x, on average; (2) decrease the execution time of a segmentation workflow by up to 11.79x while improving output quality; and (3) effectively use parallel systems to accelerate parameter tuning and segmentation phases.

Submitted 5 October, 2018; originally announced October 2018.

Comments: 29 pages, 5 figures

arXiv:1808.05283 [pdf, other]

doi 10.1016/j.sysarc.2019.02.009

All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey

Authors: Ashkan Yousefpour, Caleb Fung, Tam Nguyen, Krishna Kadiyala, Fatemeh Jalali, Amirreza Niakanlahiji, Jian Kong, Jason P. Jue

Abstract: With the Internet of Things (IoT) becoming part of our daily life and our environment, we expect rapid growth in the number of connected devices. IoT is expected to connect billions of devices and humans to bring promising advantages for us. With this growth, fog computing, along with its related edge computing paradigms, such as multi-access edge computing (MEC) and cloudlet, are seen as promising solutions for handling the large volume of security-critical and time-sensitive data that is being produced by the IoT. In this paper, we first provide a tutorial on fog computing and its related computing paradigms, including their similarities and differences. Next, we provide a taxonomy of research topics in fog computing, and through a comprehensive survey, we summarize and categorize the efforts on fog computing and its related computing paradigms. Finally, we provide challenges and future directions for research in fog computing.

Submitted 13 February, 2019; v1 submitted 15 August, 2018; originally announced August 2018.

Comments: 48 pages, 7 tables, 11 figures, 450 references. The data (categories and features/objectives of the papers) of this survey are now available publicly. Accepted by Elsevier Journal of Systems Architecture

arXiv:1808.04795 [pdf, other]

Clumped Nuclei Segmentation with Adjacent Point Match and Local Shape based Intensity Analysis for Overlapped Nuclei in Fluorescence In-Situ Hybridization Images

Authors: Xiaoyuan Guo, Hanyi Yu, Blair Rossetti, George Teodoro, Daniel Brat, Jun Kong

Abstract: Highly clumped nuclei clusters captured in fluorescence in situ hybridization microscopy images are common histology entities under investigations in a wide spectrum of tissue-related biomedical investigations. Due to their large scale in presence, computer based image analysis is used to facilitate such analysis with improved analysis efficiency and reproducibility. To ensure the quality of downstream biomedical analyses, it is essential to segment clustered nuclei with high quality. However, this presents a technical challenge commonly encountered in a large number of biomedical studies, as nuclei are often overlapped due to a high cell density. In this paper, we propose a segmentation algorithm that identifies point pair connection candidates and evaluates adjacent point connections with a formulated ellipse fitting quality indicator. After connection relationships are determined, we recover the resulting dividing paths by following points with specific eigenvalues from Hessian in a constrained searching space. We validate our algorithm with 560 image patches from two classes of tumor regions of seven brain tumor patients. Both qualitative and quantitative experimental results suggest that our algorithm is promising for dividing overlapped nuclei in fluorescence in situ hybridization microscopy images widely used in various biomedical research.

Submitted 14 August, 2018; originally announced August 2018.

Comments: 4 pages

arXiv:1806.09093 [pdf, other]

Analysis of Cellular Feature Differences of Astrocytomas with Distinct Mutational Profiles Using Digitized Histopathology Images

Authors: Mousumi Roy, Fusheng Wang, George Teodoro, Jose Velazqeuz Vega, Daniel Brat, Jun Kong

Abstract: Cellular phenotypic features derived from histopathology images are the basis of pathologic diagnosis and are thought to be related to underlying molecular profiles. Due to overwhelming cell numbers and population heterogeneity, it remains challenging to quantitatively compute and compare features of cells with distinct molecular signatures. In this study, we propose a self-reliant and efficient analysis framework that supports quantitative analysis of cellular phenotypic difference across distinct molecular groups. To demonstrate efficacy, we quantitatively analyze astrocytomas that are molecularly characterized as either Isocitrate Dehydrogenase (IDH) mutant (MUT) or wildtype (WT) using imaging data from The Cancer Genome Atlas database. Representative cell instances that are phenotypically different between these two groups are retrieved after segmentation, feature computation, data pruning, dimensionality reduction, and unsupervised clustering. Our analysis is generic and can be applied to a wide set of cell-based biomedical research.

Submitted 24 June, 2018; originally announced June 2018.

arXiv:1806.09090 [pdf, other]

Segmentation of Overlapped Steatosis in Whole-Slide Liver Histopathology Microscopy Images

Authors: Mousumi Roy, Fusheng Wang, George Teodoro, Miriam B Vos, Alton Brad Farris, Jun Kong

Abstract: An accurate steatosis quantification with pathology tissue samples is of high clinical importance. However, such pathology measurement is manually made in most clinical practices, subject to severe reader variability due to large sampling bias and poor reproducibility. Although some computerized automated methods are developed to quantify the steatosis regions, they present limited analysis capacity for high resolution whole-slide microscopy images and accurate overlapped steatosis division. In this paper, we propose a method that extracts an individual whole tissue piece at high resolution with minimum background area by estimating tissue bounding box and rotation angle. This is followed by the segmentation and segregation of steatosis regions with high curvature point detection and an ellipse fitting quality assessment method. We validate our method with isolated and overlapped steatosis regions in liver tissue images of 11 patients. The experimental results suggest that our method is promising for enhanced support of steatosis quantization during the pathology review for liver disease treatment.

Submitted 24 June, 2018; originally announced June 2018.

arXiv:1803.04364 [pdf]

Maturation Trajectories of Cortical Resting-State Networks Depend on the Mediating Frequency Band

Authors: Sheraz Khan, Javeria Hashmi, Fahimeh Mamashli, Konstantinos Michmizos, Manfred Kitzbichler, Hari Bharadwaj, Yousra Bekhti, Santosh Ganesan, Keri A Garel, Susan Whitfield-Gabrieli, Randy Gollub, Jian Kong, Lucia M Vaina, Kunjan Rana, Steven Stufflebeam, Matti Hamalainen, Tal Kenet

Abstract: The functional significance of resting state networks and their abnormal manifestations in psychiatric disorders are firmly established, as is the importance of the cortical rhythms in mediating these networks. Resting state networks are known to undergo substantial reorganization from childhood to adulthood, but whether distinct cortical rhythms, which are generated by separable neural mechanisms and are often manifested abnormally in psychiatric conditions, mediate maturation differentially, remains unknown. Using magnetoencephalography (MEG) to map frequency band specific maturation of resting state networks from age 7 to 29 in 162 participants (31 independent), we found significant changes with age in networks mediated by the beta (13-30Hz) and gamma (31-80Hz) bands. More specifically, gamma band mediated networks followed an expected asymptotic trajectory, but beta band mediated networks followed a linear trajectory. Network integration increased with age in gamma band mediated networks, while local segregation increased with age in beta band mediated networks. Spatially, the hubs that changed in importance with age in the beta band mediated networks had relatively little overlap with those that showed the greatest changes in the gamma band mediated networks. These findings are relevant for our understanding of the neural mechanisms of cortical maturation, in both typical and atypical development.

Submitted 12 February, 2018; originally announced March 2018.

Showing 1–50 of 64 results for author: Kong, J