Search | arXiv e-print repository

Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2405.20071 [pdf]

A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture

Authors: Anjum Shaik, Kristoffer Larsen, Nancy E. Lane, Chen Zhao, Kuan-Jui Su, Joyce H. Keyak, Qing Tian, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou

Abstract: Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using CNNs to extract features from hip DXA images, along with clinical variables, shape measurements, and texture features, our method provides a comprehensive framework for assessing fracture risk. A staged machine learning-based model was developed using two ensemble models: Ensemble 1 (clinical variables only) and Ensemble 2 (clinical variables and DXA imaging features). This staged approach used uncertainty quantification from Ensemble 1 to decide if DXA features are necessary for further prediction. Ensemble 2 exhibited the highest performance, achieving an AUC of 0.9541, an accuracy of 0.9195, a sensitivity of 0.8078, and a specificity of 0.9427. The staged model also performed well, with an AUC of 0.8486, an accuracy of 0.8611, a sensitivity of 0.5578, and a specificity of 0.9249, outperforming Ensemble 1, which had an AUC of 0.5549, an accuracy of 0.7239, a sensitivity of 0.1956, and a specificity of 0.8343. Furthermore, the staged model suggested that 54.49% of patients did not require DXA scanning. It effectively balanced accuracy and specificity, offering a robust solution when DXA data acquisition is not always feasible. Statistical tests confirmed significant differences between the models, highlighting the advantages of the advanced modeling strategies. Our staged approach could identify individuals at risk with high accuracy while reducing unnecessary DXA scanning. It has great promise to guide interventions to prevent hip fractures with reduced cost and radiation.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 29 pages, 5 figures, 6 tables
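
The staged decision described in the abstract above (use Ensemble 1 alone unless its prediction is too uncertain, then fall back to Ensemble 2 with DXA features) can be pictured with a minimal sketch. It assumes that uncertainty is measured as the binary entropy of Ensemble 1's predicted probability and that a fixed threshold triggers the DXA stage; both choices, and the names `binary_entropy`, `staged_predict`, and `threshold`, are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Callable, Tuple


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def staged_predict(
    p_clinical: float,
    predict_with_dxa: Callable[[], float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return (risk probability, whether the DXA stage was used).

    p_clinical       -- probability from the clinical-only model (stage 1)
    predict_with_dxa -- callback that runs the clinical + DXA model (stage 2);
                        it is only invoked when stage 1 is too uncertain
    threshold        -- hypothetical entropy cut-off, not a value from the paper
    """
    if binary_entropy(p_clinical) <= threshold:
        # Stage 1 is confident enough: no DXA scan is requested.
        return p_clinical, False
    # Stage 1 is uncertain: defer to the model that also sees DXA features.
    return predict_with_dxa(), True
```

Passing the second stage as a callback mirrors the motivation in the abstract: the costly acquisition (here, the DXA scan) happens only for patients whose first-stage prediction is uncertain.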

arXiv:2404.09622 [pdf, other]

DIDLM: A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping

Authors: WeiSheng Gong, Chen He, KaiJie Su, QingYong Li

Abstract: This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. SLAM comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities. Our dataset is available at https://github.com/GongWeiSheng/DIDLM.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2401.04934 [pdf, ps, other]

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey

Authors: Jiechuan Jiang, Kefan Su, Zongqing Lu

Abstract: Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks, but restrictions of real-world applications may require training the agents in a fully decentralized manner. Due to the lack of information about other agents, it is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting. Thus, this research area has not been thoroughly studied. In this paper, we seek to systematically review the fully decentralized methods in two settings: maximizing a shared reward of all agents and maximizing the sum of individual rewards of all agents, and discuss open questions and future research directions.

Submitted 10 January, 2024; originally announced January 2024.

Comments: The first two authors contributed equally and are listed in alphabetical order

arXiv:2312.07623 [pdf]

Supervised Contrastive Learning for Fine-grained Chromosome Recognition

Authors: Ruijia Chang, Suncheng Xiang, Chengyu Zhou, Kui Su, Dahong Qian, Jun Wang

Abstract: Chromosome recognition is an essential task in karyotyping, which plays a vital role in birth defect diagnosis and biomedical research. However, existing classification methods face significant challenges due to the inter-class similarity and intra-class variation of chromosomes. To address this issue, we propose a supervised contrastive learning strategy that is tailored to train model-agnostic deep networks for reliable chromosome classification. This method enables extracting fine-grained chromosomal embeddings in latent space. These embeddings effectively expand inter-class boundaries and reduce intra-class variations, enhancing their distinctiveness in predicting chromosome types. On two large-scale chromosome datasets, we comprehensively validate the power of our contrastive learning strategy in boosting cutting-edge deep networks such as Transformers and ResNets. Extensive results demonstrate that it can significantly improve models' generalization performance, with an accuracy improvement of up to +4.5%. Codes and pretrained models will be released upon acceptance of this work.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.07990 [pdf]

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template-metabolite-derived burden scores, PGS, and LD-pruned SNPs, the proposed method achieved R^2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.

Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 19 pages, 3 figures

arXiv:2309.04960 [pdf, other]

SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks

Authors: Shuangqin Cheng, Qingliang Chen, Qiyi Zhang, Ming Li, Yamuhanmode Alike, Kaile Su, Pengcheng Wen

Abstract: Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing methods primarily prioritize minimizing pixel/voxel-level intensity discrepancies, often neglecting the preservation of textural details in the synthesized images. This oversight directly impacts the quality of the reconstructed images and thus affects the clinical diagnosis. To address these deficits, this paper presents a new self-driven generative adversarial network model (SdCT-GAN), which is motivated to pay more attention to image details by introducing a novel auto-encoder structure in the discriminator. In addition, a Sobel Gradient Guider (SGG) idea is applied throughout the model, where the edge information from the 2D X-ray image at the input can be integrated. Moreover, the LPIPS (Learned Perceptual Image Patch Similarity) evaluation metric is adopted, which can quantitatively evaluate the fine contours and textures of reconstructed images better than the existing metrics. Finally, the qualitative and quantitative results of the empirical studies justify the power of the proposed model compared to mainstream state-of-the-art baselines.

Submitted 10 September, 2023; originally announced September 2023.
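
The abstract above mentions extracting edge information from the input X-ray with a Sobel-based guider. As a hedged illustration of the underlying operator only (not of the paper's SGG module, whose integration into the network is not described here), the sketch below computes the classic Sobel gradient magnitude on a small 2D image stored as nested lists. The function name `sobel_magnitude` and the choice to skip border pixels are assumptions made for brevity.

```python
from typing import List

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image: List[List[float]]) -> List[List[float]]:
    """Gradient magnitude at interior pixels; the output is (H-2) x (W-2).

    The kernels are applied as a cross-correlation (no kernel flip), which
    only changes the sign of the responses and leaves the magnitude unchanged.
    """
    height, width = len(image), len(image[0])
    result = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            row.append((gx * gx + gy * gy) ** 0.5)
        result.append(row)
    return result
```

On an image with a single vertical step edge, the response is high along the edge and zero in flat regions, which is the kind of contour cue an edge-guided model could consume.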

arXiv:2307.06689 [pdf, other]

doi 10.1016/j.imavis.2024.105095

YOLIC: An Efficient Method for Object Localization and Classification on Edge Devices

Authors: Kai Su, Yoichi Tomioka, Qiangfu Zhao, Yong Liu

Abstract: In the realm of Tiny AI, we introduce "You Only Look at Interested Cells" (YOLIC), an efficient method for object localization and classification on edge devices. By seamlessly blending the strengths of semantic segmentation and object detection, YOLIC offers superior computational efficiency and precision. By adopting Cells of Interest for classification instead of individual pixels, YOLIC encapsulates relevant information, reduces computational load, and enables rough object shape inference. Importantly, the need for bounding box regression is obviated, as YOLIC capitalizes on the predetermined cell configuration that provides information about potential object location, size, and shape. To tackle the issue of single-label classification limitations, a multi-label classification approach is applied to each cell for effectively recognizing overlapping or closely situated objects. This paper presents extensive experiments on multiple datasets to demonstrate that YOLIC achieves detection performance comparable to the state-of-the-art YOLO algorithms while surpassing them in speed, exceeding 30fps on a Raspberry Pi 4B CPU. All resources related to this study, including datasets, cell designer, image annotation tool, and source code, have been made publicly available on our project website at https://kai3316.github.io/yolic.github.io

Submitted 30 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Journal ref: Image and Vision Computing 147C (2024) 105095

arXiv:2305.13871 [pdf, other]

Improving Heterogeneous Model Reuse by Density Estimation

Authors: Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, Dacheng Tao

Abstract: This paper studies multiparty learning, aiming to learn a model using the private data of different participants. Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches have been developed. However, although pre-trained local classifiers are utilized in these approaches, the characteristics of the local data are not well exploited. This motivates us to estimate the density of local data and design an auxiliary model together with the local classifiers for reuse. To address the scenarios where some local models are not well pre-trained, we further design a multiparty cross-entropy loss for calibration. Building on existing works, we address a challenging problem of heterogeneous model reuse from a decision theory perspective and take advantage of recent advances in density estimation. Experimental results on both synthetic and benchmark data demonstrate the superiority of the proposed method.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures. Accepted by IJCAI 2023

arXiv:2305.06594 [pdf, other]

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Authors: Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

Abstract: Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow, a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input types using a multi-stage autoregressive model. Trained on 5k hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow is competitive with previous domain-specific models when evaluated in a zero-shot manner. It synthesizes high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames, with optional style control via text prompts. Through both qualitative and quantitative evaluations, we demonstrate that our model outperforms various existing music generation systems in terms of visual-audio correspondence and audio quality. Music samples are available at tinyurl.com/v2meow.

Submitted 22 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: accepted at AAAI 2024, music samples available at https://tinyurl.com/v2meow

arXiv:2303.16897 [pdf, other]

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Abstract: Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world, and cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds since they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.

Submitted 8 July, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: CVPR 2023. Project page: https://sukun1045.github.io/video-physics-sound-diffusion/

arXiv:2303.02915 [pdf]

GlobalNER: Incorporating Non-local Information into Named Entity Recognition

Authors: Chiao-Wei Hsu, Keh-Yih Su

Abstract: Nowadays, many Natural Language Processing (NLP) tasks see the demand for incorporating knowledge external to the local information to further improve the performance. However, there is little related work on Named Entity Recognition (NER), which is one of the foundations of NLP. Specifically, no studies were conducted on the query generation and re-ranking for retrieving the related information for the purpose of improving NER. This work demonstrates the effectiveness of a DNN-based query generation method and a mention-aware re-ranking architecture based on BERTScore particularly for NER. In the end, a state-of-the-art micro-F1 score of 61.56 on the WNUT17 dataset is achieved.

Submitted 6 March, 2023; originally announced March 2023.

Comments: 13 pages, 5 figures

arXiv:2302.07450 [pdf, other]

FedABC: Targeting Fair Competition in Personalized Federated Learning

Authors: Dui Wang, Li Shen, Yong Luo, Han Hu, Kehua Su, Yonggang Wen, Dacheng Tao

Abstract: Federated learning aims to collaboratively train models without accessing clients' local private data. The data may be Non-IID for different clients, thus resulting in poor performance. Recently, personalized federated learning (PFL) has achieved great success in handling Non-IID data by enforcing regularization in local optimization or improving the model aggregation scheme on the server. However, most of the PFL approaches do not take into account the unfair competition issue caused by the imbalanced data distribution and lack of positive samples for some classes in each client. To address this issue, we propose a novel and generic PFL framework termed Federated Averaging via Binary Classification, dubbed FedABC. In particular, we adopt the "one-vs-all" training strategy in each client to alleviate the unfair competition between classes by constructing a personalized binary classification problem for each class. This may aggravate the class imbalance challenge, and thus a novel personalized binary classification loss that incorporates both the under-sampling and hard sample mining strategies is designed. Extensive experiments are conducted on two popular datasets under different settings, and the results demonstrate that our FedABC can significantly outperform the existing counterparts.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures

Journal ref: AAAI2023
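
The "one-vs-all" strategy named in the FedABC abstract above replaces a single softmax competition among classes with one independent binary problem per class. The sketch below shows only that generic idea, as a sum of per-class sigmoid cross-entropy terms. It deliberately omits the under-sampling and hard sample mining that the abstract attributes to the paper's loss, so it should not be read as the FedABC loss itself; the names `binary_cross_entropy_with_logit` and `one_vs_all_loss` are hypothetical.

```python
import math
from typing import List


def binary_cross_entropy_with_logit(logit: float, target: float) -> float:
    """Numerically stable sigmoid cross-entropy for a single logit.

    Equivalent to -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def one_vs_all_loss(logits: List[float], label: int) -> float:
    """Sum of independent binary losses, one per class.

    Class `label` is the positive example for its own binary problem and a
    negative example for every other class, so classes do not compete through
    a shared softmax normalizer.
    """
    total = 0.0
    for k, logit in enumerate(logits):
        target = 1.0 if k == label else 0.0
        total += binary_cross_entropy_with_logit(logit, target)
    return total
```

Because each class contributes its own term, a class with few or no positive samples on a client sees almost only negatives, which is consistent with the abstract's remark that this formulation can aggravate class imbalance.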

arXiv:2212.07855 [pdf, other]

QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query

Authors: Yabo Xiao, Kai Su, Xiaojuan Wang, Dongdong Yu, Lei Jin, Mingshu He, Zehuan Yuan

Abstract: We propose a sparse end-to-end multi-person pose regression framework, termed QueryPose, which can directly predict multi-person keypoint sequences from the input image. The existing end-to-end methods rely on dense representations to preserve the spatial detail and structure for precise keypoint localization. However, the dense paradigm introduces complex and redundant post-processes during inference. In our framework, each human instance is encoded by several learnable spatial-aware part-level queries associated with an instance-level query. First, we propose the Spatial Part Embedding Generation Module (SPEGM) that considers the local spatial attention mechanism to generate several spatial-sensitive part embeddings, which contain spatial details and structural information for enhancing the part-level queries. Second, we introduce the Selective Iteration Module (SIM) to adaptively update the sparse part-level queries via the generated spatial-sensitive part embeddings stage-by-stage. Based on the two proposed modules, the part-level queries are able to fully encode the spatial details and structural information for precise keypoint regression. With the bipartite matching, QueryPose avoids the hand-designed post-processes and surpasses the existing dense end-to-end methods with 73.6 AP on MS COCO mini-val set and 72.7 AP on CrowdPose test set. Code is available at https://github.com/buptxyb666/QueryPose.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Published at NeurIPS 2022

arXiv:2211.03032 [pdf, other]

Decentralized Policy Optimization

Authors: Kefan Su, Zongqing Lu

Abstract: The study of decentralized learning or independent learning in cooperative multi-agent reinforcement learning has a history of decades. Recent empirical studies show that independent PPO (IPPO) can obtain good performance, close to or even better than the methods of centralized training with decentralized execution, in several benchmarks. However, decentralized actor-critic with convergence guarantee is still open. In this paper, we propose decentralized policy optimization (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantee. We derive a novel decentralized surrogate for policy optimization such that the monotonic improvement of joint policy can be guaranteed by each agent independently optimizing the surrogate. In practice, this decentralized surrogate can be realized by two adaptive coefficients for policy optimization at each agent. Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments. The results show DPO outperforms IPPO in most tasks, which provides evidence for our theoretical results.

Submitted 6 November, 2022; originally announced November 2022.

Comments: 14 pages

arXiv:2210.09084 [pdf, other]

Multi-Agent Automated Machine Learning

Authors: Zhaozhi Wang, Kefan Su, Jian Zhang, Huizhu Jia, Qixiang Ye, Xiaodong Xie, Zongqing Lu

Abstract: In this paper, we propose multi-agent automated machine learning (MA2ML) with the aim to effectively handle joint optimization of modules in automated machine learning (AutoML). MA2ML takes each machine learning module, such as data augmentation (AUG), neural architecture search (NAS), or hyper-parameter optimization (HPO), as an agent and the final performance as the reward, to formulate a multi-agent reinforcement learning problem. MA2ML explicitly assigns credit to each agent according to its marginal contribution to enhance cooperation among modules, and incorporates off-policy learning to improve search efficiency. Theoretically, MA2ML guarantees monotonic improvement of joint optimization. Extensive experiments show that MA2ML yields the state-of-the-art top-1 accuracy on ImageNet under constraints of computational cost, e.g., 79.7%/80.5% with FLOPs fewer than 600M/800M. Extensive ablation studies verify the benefits of credit assignment and off-policy learning of MA2ML.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.04014 [pdf, other]

AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression

Authors: Yabo Xiao, Xiaojuan Wang, Dongdong Yu, Kai Su, Lei Jin, Mei Song, Shuicheng Yan, Jian Zhao

Abstract: Multi-person pose estimation generally follows top-down and bottom-up paradigms. Both of them use an extra stage (e.g., human detection in top-down paradigm or grouping process in bottom-up paradigm) to build the relationship between the human instance and corresponding keypoints, thus leading to the high computation cost and redundant two-stage pipeline. To address the above issue, we propose to represent the human parts as adaptive points and introduce a fine-grained body representation method. The novel body representation is able to sufficiently encode the diverse pose information and effectively model the relationship between the human instance and corresponding keypoints in a single-forward pass. With the proposed body representation, we further deliver a compact single-stage multi-person pose regression network, termed as AdaptivePose. During inference, our proposed network only needs a single-step decode operation to form the multi-person pose without complex post-processes and refinements. We employ AdaptivePose for both 2D/3D multi-person pose estimation tasks to verify the effectiveness of AdaptivePose. Without any bells and whistles, we achieve the most competitive performance on MS COCO and CrowdPose in terms of accuracy and speed. Furthermore, the outstanding performance on MuCo-3DHP and MuPoTS-3D further demonstrates the effectiveness and generalizability on 3D scenes. Code is available at https://github.com/buptxyb666/AdaptivePose.

Submitted 8 October, 2022; originally announced October 2022.

Comments: Submit to IEEE TCSVT; 11 pages. arXiv admin note: text overlap with arXiv:2112.13635

arXiv:2209.12713 [pdf, other]

Multi-Agent Sequential Decision-Making via Communication

Authors: Ziluo Ding, Kefan Su, Weixin Hong, Liwen Zhu, Tiejun Huang, Zongqing Lu

Abstract: Communication helps agents to obtain information about others so that better coordinated behavior can be learned. Some existing work communicates predicted future trajectory with others, hoping to get clues about what others would do for better coordination. However, circular dependencies sometimes can occur when agents are treated synchronously so it is hard to coordinate decision-making. In this paper, we propose a novel communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (the upper-level agents make decisions before the lower-level ones) and has two communication phases. In negotiation phase, agents determine the priority of decision-making by communicating hidden states of observations and comparing the value of intention, which is obtained by modeling the environment dynamics. In launching phase, the upper-level agents take the lead in making decisions and communicate their actions with the lower-level agents. Theoretically, we prove the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various multi-agent cooperative tasks.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 20 pages

arXiv:2209.08244 [pdf, other]

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning

Authors: Kefan Su, Siyuan Zhou, Jiechuan Jiang, Chuang Gan, Xiangjun Wang, Zongqing Lu

Abstract: Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in fully decentralized learning. In the paper, we tackle the non-stationarity problem in the simplest and fundamental way and propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a minimalist approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL only requires minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite such minimal changes.

Submitted 7 February, 2023; v1 submitted 17 September, 2022; originally announced September 2022.

Comments: 18 pages
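
The turn-taking idea described in the MA2QL abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the 2x2 payoff matrix, the function names, the deterministic action sweep used in place of exploration, and all hyperparameters are assumptions chosen only to make the example small and reproducible. It shows two agents with independent Q-tables in a one-step cooperative game, where only one agent updates at a time while the other acts greedily on its frozen table.

```python
# Hypothetical 2x2 cooperative game: both agents receive payoff[a0][a1].
payoff = [[1.0, 0.0],
          [0.0, 2.0]]

def greedy(q):
    """Index of the largest Q-value (first index wins ties)."""
    return max(range(len(q)), key=lambda a: q[a])

def alternate_q_learning(payoff, turns=4, steps_per_turn=50, alpha=0.5):
    """Agents take turns running one-step Q-learning; the non-learner is frozen."""
    q = [[0.0] * len(payoff), [0.0] * len(payoff[0])]  # one Q-table per agent
    for turn in range(turns):
        learner = turn % 2
        frozen_action = greedy(q[1 - learner])
        for step in range(steps_per_turn):
            # A deterministic sweep over actions stands in for exploration (assumption).
            action = step % len(q[learner])
            if learner == 0:
                reward = payoff[action][frozen_action]
            else:
                reward = payoff[frozen_action][action]
            # Stateless game: the Q-learning target is just the immediate reward.
            q[learner][action] += alpha * (reward - q[learner][action])
    return q

q0, q1 = alternate_q_learning(payoff)
joint_action = (greedy(q0), greedy(q1))
```

In this toy run neither agent can gain by deviating alone from `joint_action`, so it is a Nash equilibrium of the matrix game, although it is not necessarily the joint action with the highest payoff.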

arXiv:2208.07638 [pdf, other]

doi 10.1145/3534678.3539472

Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries

Authors: Xiao Liu, Shiyu Zhao, Kai Su, Yukuo Cen, Jiezhong Qiu, Mengdi Zhang, Wei Wu, Yuxiao Dong, Jie Tang

Abstract: Knowledge graph (KG) embeddings have been a mainstream approach for reasoning over incomplete KGs. However, limited by their inherently shallow and static architectures, they can hardly deal with the rising focus on complex logical queries, which comprise logical operators, imputed edges, multiple source entities, and unknown intermediate entities. In this work, we present the Knowledge Graph Transformer (kgTransformer) with masked pre-training and fine-tuning strategies. We design a KG triple transformation method to enable Transformer to handle KGs, which is further strengthened by the Mixture-of-Experts (MoE) sparse activation. We then formulate the complex logical queries as masked prediction and introduce a two-stage masked pre-training strategy to improve transferability and generalizability. Extensive experiments on two benchmarks demonstrate that kgTransformer can consistently outperform both KG embedding-based baselines and advanced encoders on nine in-domain and out-of-domain reasoning tasks. Additionally, kgTransformer can reason with explainability via providing the full reasoning paths to interpret given answers.

Submitted 16 August, 2022; originally announced August 2022.

Comments: kgTransformer; Accepted to KDD 2022

arXiv:2206.01369 [pdf, other]

Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

Authors: Chenyu You, Jinlin Xiang, Kun Su, Xiaoran Zhang, Siyuan Dong, John Onofrey, Lawrence Staib, James S. Duncan

Abstract: Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, which achieve competitive performance on average but such methods rely on the assumption about the availability of all training data, thus limiting its effectiveness in practical deployment. In this paper, we propose a novel multi-site segmentation framework called incremental-transfer learning (ITL), which learns a model from multi-site datasets in an end-to-end sequential fashion. Specifically, "incremental" refers to training sequentially constructed datasets, and "transfer" is achieved by leveraging useful information from the linear combination of embedding features on each dataset. In addition, we introduce our ITL framework, where we train the network including a site-agnostic encoder with pre-trained weights and at most two segmentation decoder heads. We also design a novel site-level incremental loss in order to generalize well on the target domain. Second, we show for the first time that leveraging our ITL training scheme is able to alleviate challenging catastrophic forgetting problems in incremental learning. We conduct experiments using five challenging benchmark datasets to validate the effectiveness of our incremental-transfer learning approach. Our approach makes minimal assumptions on computation resources and domain-specific expertise, and hence constitutes a strong starting point in multi-site medical image segmentation.

Submitted 30 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

arXiv:2110.00304 [pdf, other]

Divergence-Regularized Multi-Agent Actor-Critic

Authors: Kefan Su, Zongqing Lu

Abstract: Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective of the original Markov Decision Process (MDP). Though divergence regularization has been proposed to settle this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Theoretically, we derive the update rule of DMAC which is naturally off-policy and guarantees monotonic policy improvement and convergence in both the original MDP and divergence-regularized MDP. We also give a bound of the discrepancy between the converged policy and optimal policy in the original MDP. DMAC is a flexible framework and can be combined with many existing MARL algorithms. Empirically, we evaluate DMAC in a didactic stochastic game and StarCraft Multi-Agent Challenge and show that DMAC substantially improves the performance of existing MARL algorithms.

Submitted 21 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: ICML 2022, 24 pages, 10 figures

arXiv:2109.06109 [pdf, other]

Weakly Supervised Person Search with Region Siamese Networks

Authors: Chuchu Han, Kai Su, Dongdong Yu, Zehuan Yuan, Changxin Gao, Nong Sang, Yi Yang, Changhu Wang

Abstract: Supervised learning is dominant in person search, but it requires elaborate labeling of bounding boxes and identities. Large-scale labeled training data is often difficult to collect, especially for person identities. A natural question is whether a good person search model can be trained without the need of identity supervision. In this paper, we present a weakly supervised setting where only bounding box annotations are available. Based on this new setting, we provide an effective baseline model termed Region Siamese Networks (R-SiamNets). Towards learning useful representations for recognition in the absence of identity labels, we supervise the R-SiamNet with instance-level consistency loss and cluster-level contrastive loss. For instance-level consistency learning, the R-SiamNet is constrained to extract consistent features from each person region with or without out-of-region context. For cluster-level contrastive learning, we enforce the aggregation of closest instances and the separation of dissimilar ones in feature space. Extensive experiments validate the utility of our weakly supervised method. Our model achieves the rank-1 of 87.1% and mAP of 86.0% on CUHK-SYSU benchmark, which surpasses several fully supervised methods, such as OIM and MGTS, by a clear margin. More promising performance can be reached by incorporating extra training data. We hope this work could encourage the future research in this field.

Submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by ICCV 2021

arXiv:2109.00373 [pdf, other]

Memory Based Video Scene Parsing

Authors: Zhenchao Jin, Dongdong Yu, Kai Su, Zehuan Yuan, Changhu Wang

Abstract: Video scene parsing is a long-standing challenging task in computer vision, aiming to assign pre-defined semantic labels to pixels of all frames in a given video. Compared with image semantic segmentation, this task pays more attention on studying how to adopt the temporal information to obtain higher predictive accuracy. In this report, we introduce our solution for the 1st Video Scene Parsing in the Wild Challenge, which achieves a mIoU of 57.44 and obtained the 2nd place (our team name is CharlesBLWX).

Submitted 1 September, 2021; originally announced September 2021.

Comments: technical report for "The 1st Video Scene Parsing in the Wild Challenge Workshop". arXiv admin note: text overlap with arXiv:2108.11819

arXiv:2107.02893 [pdf]

Answering Chinese Elementary School Social Study Multiple Choice Questions

Authors: Daniel Lee, Chao-Chun Liang, Keh-Yih Su

Abstract: We present a novel approach to answer the Chinese elementary school Social Study Multiple Choice questions. Although BERT has demonstrated excellent performance on Reading Comprehension tasks, it is found not good at handling some specific types of questions, such as Negation, All-of-the-above, and None-of-the-above. We thus propose a novel framework to cascade BERT with a Pre-Processor and an Answer-Selector modules to tackle the above challenges. Experimental results show the proposed approach effectively improves the performance of BERT, and thus demonstrate the feasibility of supplementing BERT with additional modules.

Submitted 26 June, 2021; originally announced July 2021.

Comments: TAAI-2020

arXiv:2106.15772 [pdf]

A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers

Authors: Shen-Yun Miao, Chao-Chun Liang, Keh-Yih Su

Abstract: We present ASDiv (Academia Sinica Diverse MWP Dataset), a diverse (in terms of both language patterns and problem types) English math word problem (MWP) corpus for evaluating the capability of various MWP solvers. Existing MWP corpora for studying AI progress remain limited either in language usage patterns or in problem types. We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school. Each MWP is annotated with its problem type and grade level (for indicating the level of difficulty). Furthermore, we propose a metric to measure the lexicon usage diversity of a given MWP corpus, and demonstrate that ASDiv is more diverse than existing corpora. Experiments show that our proposed corpus reflects the true capability of MWP solvers more faithfully.

Submitted 29 June, 2021; originally announced June 2021.

Comments: ACL-2020

arXiv:2106.00990 [pdf, other]

Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving

Authors: Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, Keh-Yih Su

Abstract: With the recent advancements in deep learning, neural solvers have gained promising results in solving math word problems. However, these SOTA solvers only generate binary expression trees that contain basic arithmetic operators and do not explicitly use the math formulas. As a result, the expression trees they produce are lengthy and uninterpretable because they need to use multiple operators and constants to represent one single formula. In this paper, we propose sequence-to-general tree (S2G) that learns to generate interpretable and executable operation trees where the nodes can be formulas with an arbitrary number of arguments. With nodes now allowed to be formulas, S2G can learn to incorporate mathematical domain knowledge into problem-solving, making the results more interpretable. Experiments show that S2G can achieve a better performance against strong baselines on problems that require domain knowledge.

Submitted 2 June, 2021; originally announced June 2021.

Comments: ACL2021

arXiv:2012.03478 [pdf, other]

Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements

Authors: Kun Su, Xiulong Liu, Eli Shlizerman

Abstract: We propose a novel system that takes as an input body movements of a musician playing a musical instrument and generates music in an unsupervised setting. Learning to generate multi-instrumental music from videos without labeling the instruments is a challenging problem. To achieve the transformation, we built a pipeline named 'Multi-instrumentalistNet' (MI Net). At its base, the pipeline learns a discrete latent representation of various instruments music from log-spectrogram using a Vector Quantized Variational Autoencoder (VQ-VAE) with multi-band residual blocks. The pipeline is then trained along with an autoregressive prior conditioned on the musician's body keypoints movements encoded by a recurrent neural network. Joint training of the prior with the body movements encoder succeeds in the disentanglement of the music into latent features indicating the musical components and the instrumental features. The latent space results in distributions that are clustered into distinct instruments from which new music can be generated. Furthermore, the VQ-VAE architecture supports detailed music generation with additional conditioning. We show that a Midi can further condition the latent space such that the pipeline will generate the exact content of the music being played by the instrument in the video. We evaluate MI Net on two datasets containing videos of 13 instruments and obtain generated music of reasonable audio quality, easily associated with the corresponding instrument, and consistent with the music audio content.

Submitted 7 December, 2020; originally announced December 2020.

Comments: Please see associated video at https://www.youtube.com/watch?v=yo5OZKBbBh4
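
The "discrete latent representation" mentioned in the MI Net abstract above rests on vector quantization. The following is a minimal sketch of only the nearest-codebook lookup step that vector quantization generally uses, not the MI Net pipeline itself; the function name `quantize`, the 2-D toy vectors, and the three-entry codebook are hypothetical values chosen for illustration.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared Euclidean distance)."""
    indices = []
    for v in vectors:
        dists = [sum((x - c) ** 2 for x, c in zip(v, code)) for code in codebook]
        indices.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return indices

# Hypothetical toy data: three codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]]
codes = quantize(encoder_outputs, codebook)                 # discrete latent indices
reconstructed_latents = [codebook[k] for k in codes]        # what a decoder would receive
```

A full VQ-VAE also learns the codebook and the encoder jointly, which this lookup-only sketch leaves out.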

arXiv:2006.14348 [pdf, other]

Audeo: Audio Generation for a Silent Performance Video

Authors: Kun Su, Xiulong Liu, Eli Shlizerman

Abstract: We present a novel system that gets as an input video frames of a musician playing the piano and generates the music for that video. Generation of music from visual cues is a challenging problem and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events. To achieve the transformation we built a full pipeline named 'Audeo' containing three components. We first translate the video frames of the keyboard and the musician hand movements into raw mechanical musical symbolic representation Piano-Roll (Roll) for each video frame which represents the keys pressed at each time step. We then adapt the Roll to be amenable for audio synthesis by including temporal correlations. This step turns out to be critical for meaningful audio generation. As a last step, we implement Midi synthesizers to generate realistic music. Audeo converts video to audio smoothly and clearly with only a few setup constraints. We evaluate Audeo on 'in the wild' piano performance videos and obtain that their generated music is of reasonable audio quality and can be successfully recognized with high precision by popular music identification software.

Submitted 22 June, 2020; originally announced June 2020.

Comments: Please see associated video at https://www.youtube.com/watch?v=8rS3VgjG7_c

Journal ref: Advances in neural information processing 2020

arXiv:2006.09874 [pdf]

Cluster Diffusing Shuffles

Authors: Kevin Su

Abstract: Unbiased shuffling algorithms, such as the Fisher-Yates shuffle, are often used for shuffle play in media players. These algorithms treat all items being shuffled equally regardless of how similar the items are to each other. While this may be desirable for many applications, this is problematic for shuffle play due to the clustering illusion, which is the tendency for humans to erroneously consider 'streaks' or 'clusters' that may arise from samplings of random distributions to be non-random. This thesis attempts to address this issue with a family of biased shuffling algorithms called cluster diffusing (CD) shuffles which are based on disordered hyperuniform systems such as the distribution of cone cells in chicken eyes, the energy levels of heavy atomic nuclei, the eigenvalue distributions of various types of random matrices, and many others which appear in a variety of biological, chemical, physical, and mathematical settings. These systems suppress density fluctuations at large length scales without appearing ordered like lattices, making them ideal for shuffle play. The CD shuffles range from a random matrix based shuffle which takes $O(n^3)$ time and $O(n^2)$ space to more efficient approximations which take $O(n)$ time and $O(n)$ space.

Submitted 9 June, 2020; originally announced June 2020.

ACM Class: F.2.2
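
For reference, the unbiased baseline named in the abstract above, the Fisher-Yates shuffle, can be sketched in a few lines of Python. This is the standard algorithm, not one of the thesis's cluster diffusing shuffles; the function name, the optional `rng` parameter, and the sample playlist are assumptions made for illustration.

```python
import random

def fisher_yates_shuffle(items, rng=None):
    """Return a new list with the items in uniformly random order."""
    rng = rng or random.Random()
    result = list(items)
    # Walk from the last index down, swapping each slot with a random slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result

# Hypothetical playlist: two "albums" whose tracks a listener may perceive as clustered.
playlist = ["a1", "a2", "a3", "b1", "b2", "b3"]
shuffled = fisher_yates_shuffle(playlist, random.Random(0))
```

Because every permutation is equally likely, runs of tracks from the same album can appear by chance, which is the clustering effect the abstract says biased shuffles aim to suppress.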

arXiv:2006.07412 [pdf, other]

BI-MAML: Balanced Incremental Approach for Meta Learning

Authors: Yang Zheng, Jinlin Xiang, Kun Su, Eli Shlizerman

Abstract: We present a novel Balanced Incremental Model Agnostic Meta Learning system (BI-MAML) for learning multiple tasks. Our method implements a meta-update rule to incrementally adapt its model to new tasks without forgetting old tasks. Such a capability is not possible in current state-of-the-art MAML approaches. These methods effectively adapt to new tasks, however, suffer from 'catastrophic forgetting' phenomena, in which new tasks that are streamed into the model degrade the performance of the model on previously learned tasks. Our system performs the meta-updates with only a few-shots and can successfully accomplish them. Our key idea for achieving this is the design of balanced learning strategy for the baseline model. The strategy sets the baseline model to perform equally well on various tasks and incorporates time efficiency. The balanced learning strategy enables BI-MAML to both outperform other state-of-the-art models in terms of classification accuracy for existing tasks and also accomplish efficient adaption to similar new tasks with less required shots. We evaluate BI-MAML by conducting comparisons on two common benchmark datasets with multiple number of image classification tasks. BI-MAML performance demonstrates advantages in both accuracy and efficiency.

Submitted 12 June, 2020; originally announced June 2020.

Comments: Please see associated video at: https://youtu.be/4qlb-iG5SFo

arXiv:1911.12409 [pdf, other]

PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

Authors: Kun Su, Xiulong Liu, Eli Shlizerman

Abstract: We propose a novel system for unsupervised skeleton-based action recognition. Given inputs of body keypoints sequences obtained during various movements, our system associates the sequences with actions. Our system is based on an encoder-decoder recurrent neural network, where the encoder learns a separable feature representation within its hidden states formed by training the model to perform prediction task. We show that according to such unsupervised training the decoder and the encoder self-organize their hidden states into a feature space which clusters similar movements into the same cluster and distinct movements into distant clusters. Current state-of-the-art methods for action recognition are strongly supervised, i.e., rely on providing labels for training. Unsupervised methods have been proposed, however, they require camera and depth inputs (RGB+D) at each time step. In contrast, our system is fully unsupervised, does not require labels of actions at any stage, and can operate with body keypoints input only. Furthermore, the method can perform on various dimensions of body keypoints (2D or 3D) and include additional cues describing movements. We evaluate our system on three extensive action recognition benchmarks with different number of actions and examples. Our results outperform prior unsupervised skeleton-based methods, unsupervised RGB+D based methods on cross-view tests and while being unsupervised have similar performance to supervised skeleton-based action recognition.

Submitted 27 November, 2019; originally announced November 2019.

Comments: See video at: https://www.youtube.com/watch?v=-dcCFUBRmwE

arXiv:1911.07938 [pdf, ps, other]

Towards Good Practices for Multi-Person Pose Estimation

Authors: Dongdong Yu, Kai Su, Changhu Wang

Abstract: Multi-Person Pose Estimation is an interesting yet challenging task in computer vision. In this paper, we conduct a series of refinements with the MSPN and PoseFix Networks, and empirically evaluate their impact on the final model performance through ablation studies. By taking all the refinements, we achieve 78.7 on the COCO test-dev dataset and 76.3 on the COCO test-challenge dataset.

Submitted 27 October, 2019; originally announced November 2019.

arXiv:1909.13583 [pdf, other]

Towards Good Practices for Video Object Segmentation

Authors: Dongdong Yu, Kai Su, Hengkai Guo, Jian Wang, Kaihui Zhou, Yuanyuan Huang, Minghui Dong, Jie Shao, Changhu Wang

Abstract: Semi-supervised video object segmentation is an interesting yet challenging task in machine learning. In this work, we conduct a series of refinements with the propagation-based video object segmentation method and empirically evaluate their impact on the final model performance through ablation study. By taking all the refinements, we improve the space-time memory networks to achieve an Overall of 79.1 on the Youtube-VOS Challenge 2019.

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1905.12176 [pdf, other]

Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks

Authors: Kun Su, Eli Shlizerman

Abstract: Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved great success in ubiquitous areas of computation and applications. It was shown to be successful in modeling data with both temporal and spatial dependencies for translation or prediction tasks. In this study, we propose an embedding approach to visualize and interpret the representation of data by these models. Furthermore, we show that the embedding is an effective method for unsupervised learning and can be utilized to estimate the optimality of model training. In particular, we demonstrate that embedding space projections of the decoder states of RNN Seq2Seq model trained on sequences prediction are organized in clusters capturing similarities and differences in the dynamics of these sequences. Such performance corresponds to an unsupervised clustering of any spatio-temporal features and can be employed for time-dependent problems such as temporal segmentation, clustering of dynamic activity, self-supervised classification, action recognition, failure prediction, etc. We test and demonstrate the application of the embedding methodology to time-sequences of 3D human body poses. We show that the methodology provides a high-quality unsupervised categorization of movements.

Submitted 31 January, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

arXiv:1905.05355 [pdf, other]

A Context-and-Spatial Aware Network for Multi-Person Pose Estimation

Authors: Dongdong Yu, Kai Su, Xin Geng, Changhu Wang

Abstract: Multi-person pose estimation is a fundamental yet challenging task in computer vision. Both rich context information and spatial information are required to precisely locate the keypoints for all persons in an image. In this paper, a novel Context-and-Spatial Aware Network (CSANet), which integrates both a Context Aware Path and Spatial Aware Path, is proposed to obtain effective features involving both context information and spatial information. Specifically, we design a Context Aware Path with structure supervision strategy and spatial pyramid pooling strategy to enhance the context information. Meanwhile, a Spatial Aware Path is proposed to preserve the spatial information, which also shortens the information propagation path from low-level features to high-level features. On top of these two paths, we employ a Heavy Head Path to further combine and enhance the features effectively. Experimentally, our proposed network outperforms state-of-the-art methods on the COCO keypoint benchmark, which verifies the effectiveness of our method and further corroborates the above proposition.

Submitted 13 May, 2019; originally announced May 2019.

arXiv:1905.03466 [pdf, other]

Multi-Person Pose Estimation with Enhanced Channel-wise and Spatial Information

Authors: Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang

Abstract: Multi-person pose estimation is an important but challenging problem in computer vision. Although current approaches have achieved significant progress by fusing the multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps. In this paper, we propose two novel modules to perform the enhancement of the information for the multi-person pose estimation. First, a Channel Shuffle Module (CSM) is proposed to adopt the channel shuffle operation on the feature maps with different levels, promoting cross-channel information communication among the pyramid feature maps. Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) is designed to boost the original residual unit with attention mechanism, adaptively highlighting the information of the feature maps both in the spatial and channel-wise context. The effectiveness of our proposed modules is evaluated on the COCO keypoint benchmark, and experimental results show that our approach achieves the state-of-the-art results.

Submitted 9 May, 2019; originally announced May 2019.

Comments: Accepted by CVPR 2019

arXiv:1903.07025 [pdf]

VeriSFQ - A Semi-formal Verification Framework and Benchmark for Single Flux Quantum Technology

Authors: Alvin D. Wong, Kevin Su, Hang Sun, Arash Fayyazi, Massoud Pedram, Shahin Nazarian

Abstract: In this paper, we propose a semi-formal verification framework for single-flux quantum (SFQ) circuits called VeriSFQ, using the Universal Verification Methodology (UVM) standard. The considered SFQ technology is superconducting digital electronic devices that operate at cryogenic temperatures with active circuit elements called the Josephson junction, which operate at high switching speeds and low switching energy - allowing SFQ circuits to operate at frequencies over 300 gigahertz. Due to key differences between SFQ and CMOS logic, verification techniques for the former are not as advanced as the latter. Thus, it is crucial to develop efficient verification techniques as the complexity of SFQ circuits scales. The VeriSFQ framework focuses on verifying the key circuit and gate-level properties of SFQ logic: fanout, gate-level pipeline, path balancing, and input-to-output latency. The combinational circuits considered in analyzing the performance of VeriSFQ are: Kogge-Stone adders (KSA), array multipliers, integer dividers, and select ISCAS'85 combinational benchmark circuits. Methods of introducing bugs into SFQ circuit designs for verification detection were experimented with - including stuck-at faults, fanout errors, unbalanced paths, and functional bugs like incorrect logic gates. In addition, we propose an SFQ verification benchmark consisting of combinational SFQ circuits that exemplify SFQ logic properties and present the performance of the VeriSFQ framework on these benchmark circuits. The portability and reusability of the UVM standard allows the VeriSFQ framework to serve as a foundation for future SFQ semi-formal verification techniques.

Submitted 17 March, 2019; originally announced March 2019.

Comments: 7 pages, 6 figures, 4 tables; submitted, accepted, and presented at ISQED 2019 (20th International Symposium on Quality Electronic Design) on March 7th, 2019 in Santa Clara, CA, USA

arXiv:1808.09907 [pdf, other]

Dropout with Tabu Strategy for Regularizing Deep Neural Networks

Authors: Zongjie Ma, Abdul Sattar, Jun Zhou, Qingliang Chen, Kaile Su

Abstract: Dropout has proven to be an effective technique for regularization and preventing the co-adaptation of neurons in deep neural networks (DNN). It randomly drops units with a probability $p$ during the training stage of DNN. Dropout also provides a way of approximately combining exponentially many different neural network architectures efficiently. In this work, we add a diversification strategy into dropout, which aims at generating more different neural network architectures in a proper times of iterations. The dropped units in last forward propagation will be marked. Then the selected units for dropping in the current FP will be kept if they have been marked in the last forward propagation. We only mark the units from the last forward propagation. We call this new technique Tabu Dropout. Tabu Dropout has no extra parameters compared with the standard Dropout and also it is computationally cheap. The experiments conducted on MNIST, Fashion-MNIST datasets show that Tabu Dropout improves the performance of the standard dropout.
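
Illustrative sketch (not taken from the paper): the marking rule described in this abstract could be written roughly as follows. The function names `tabu_dropout_mask` and `run_passes` are hypothetical, only mask generation is shown, and the usual rescaling of kept activations is omitted.

```python
import random


def tabu_dropout_mask(n_units, p, prev_dropped, rng):
    """Return a keep-mask (True = keep) for one forward pass.

    A unit is first selected for dropping with probability p, as in
    standard dropout. If it was dropped in the previous pass (marked),
    the selection is cancelled and the unit is kept.
    """
    keep = []
    for i in range(n_units):
        selected = rng.random() < p
        if selected and prev_dropped[i]:
            selected = False  # tabu: marked units are kept this time
        keep.append(not selected)
    return keep


def run_passes(n_units, p, n_passes, seed=0):
    """Generate masks for consecutive passes; only the last pass is remembered."""
    rng = random.Random(seed)
    prev_dropped = [False] * n_units
    masks = []
    for _ in range(n_passes):
        keep = tabu_dropout_mask(n_units, p, prev_dropped, rng)
        masks.append(keep)
        prev_dropped = [not k for k in keep]  # marks from this pass only
    return masks
```

Under this reading, no unit is dropped in two consecutive forward passes, and the only state carried between passes is one boolean per unit.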

Submitted 29 August, 2018; originally announced August 2018.

arXiv:1808.01308

The Normal Map Based on Area-Preserving Parameterization

Authors: Hui Zhao, Kehua Su, Ming Ma, Na Lei, Li Cui, Xianfeng Gu

Abstract: In this paper, we present an approach to enhance and improve the current normal map rendering technique. Our algorithm is based on semi-discrete Optimal Mass Transportation (OMT) theory and has a solid theoretical base. The key difference from previous normal map methods is that we preserve the local area when we unwrap a disk-like 3D surface onto a 2D plane. Compared to the currently used techniques, which are based on conformal parameterization, our method does not need to cut a surface into many small pieces to avoid the large area distortion. The following charts packing step is also unnecessary in our framework. Our method is practical and makes the normal map technique more robust and efficient.

Submitted 21 April, 2020; v1 submitted 14 July, 2018; originally announced August 2018.

Comments: we need update it

arXiv:1804.08187 [pdf, ps, other]

Advancing Tabu and Restart in Local Search for Maximum Weight Cliques

Authors: Yi Fan, Nan Li, Chengqian Li, Zongjie Ma, Longin Jan Latecki, Kaile Su

Abstract: The tabu and restart are two fundamental strategies for local search. In this paper, we improve the local search algorithms for solving the Maximum Weight Clique (MWC) problem by introducing new tabu and restart strategies. Both the tabu and restart strategies proposed are based on the notion of a local search scenario, which involves not only a candidate solution but also the tabu status and unlocking relationship. Compared to the strategy of configuration checking, our tabu mechanism discourages forming a cycle of unlocking operations. Our new restart strategy is based on the re-occurrence of a local search scenario instead of that of a candidate solution. Experimental results show that the resulting MWC solver outperforms several state-of-the-art solvers on the DIMACS, BHOSLIB, and two benchmarks from practical applications.

Submitted 22 April, 2018; originally announced April 2018.

arXiv:1803.06064 [pdf]

A Meaning-based Statistical English Math Word Problem Solver

Authors: Chao-Chun Liang, Yu-Shiang Wong, Yi-Chung Lin, Keh-Yih Su

Abstract: We introduce MeSys, a meaning-based approach, for solving English math word problems (MWPs) via understanding and reasoning in this paper. It first analyzes the text, transforms both body and question parts into their corresponding logic forms, and then performs inference on them. The associated context of each quantity is represented with proposed role-tags (e.g., nsubj, verb, etc.), which provides the flexibility for annotating an extracted math quantity with its associated context information (i.e., the physical meaning of this quantity). Statistical models are proposed to select the operator and operands. A noisy dataset is designed to assess if a solver solves MWPs mainly via understanding or mechanical pattern matching. Experimental results show that our approach outperforms existing systems on both benchmark datasets and the noisy dataset, which demonstrates that the proposed approach understands the meaning of each quantity in the text more.

Submitted 5 July, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

Comments: Accepted as a long paper at NAACL HLT 2018

arXiv:1710.10403 [pdf, other]

doi 10.1007/s10489-018-1266-3

Trainable back-propagated functional transfer matrices

Authors: Cheng-Hao Cai, Yanyan Xu, Dengfeng Ke, Kaile Su, Jing Sun

Abstract: Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.

Submitted 28 October, 2017; originally announced October 2017.

Comments: 39 pages, 4 figures, submitted as a journal article

Journal ref: Appl. Intell. (2018)

arXiv:1710.05488 [pdf, other]

A Geometric View of Optimal Transportation and Generative Model

Authors: Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu

Abstract: In this work, we show the intrinsic relations between optimal transportation and convex geometry, especially the variational approach to solve Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes. This leads to a geometric interpretation to generative models, and leads to a novel framework for generative models. By using the optimal transportation view of GAN model, we show that the discriminator computes the Kantorovich potential, the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential can give the optimal transportation map by a closed-form formula. Therefore, it is sufficient to solely optimize the discriminator. This shows the adversarial competition can be avoided, and the computational architecture can be simplified. Preliminary experimental results show the geometric method outperforms WGAN for approximating probability measures with multiple clusters in low dimensional space.

Submitted 18 December, 2017; v1 submitted 15 October, 2017; originally announced October 2017.

arXiv:1704.07503 [pdf, other]

doi 10.1016/j.bica.2018.07.004

Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks

Authors: Cheng-Hao Cai, Dengfeng Ke, Yanyan Xu, Kaile Su

Abstract: There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.

Submitted 24 April, 2017; originally announced April 2017.

Comments: 8 pages, 7 figures

ACM Class: I.2.0; I.2.3; I.2.4; I.2.6; I.2.8; I.5.0; I.5.1; I.5.2; I.5.4; F.4.1

arXiv:1605.07705 [pdf, ps, other]

Understanding Content Placement Strategies in Smartrouter-based Peer CDN for Video Streaming

Authors: Ming Ma, Zhi Wang, Ke Su, Lifeng Sun

Abstract: Recent years have witnessed a new video delivery paradigm: smartrouter-based peer video content delivery network, which is enabled by smartrouters deployed at users' homes. ChinaCache (one of the largest CDN providers in China) and Youku (a video provider using smartrouters to assist video delivery) announced their cooperation in 2015, to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because millions of dedicated smartrouters are operated by the centralized video service providers in a coordinative manner. Thus it is intriguing to study the content placement strategies used in a smartrouter-based content delivery system, as well as its potential impact on the content delivery ecosystem. In this paper, we carry out measurement studies of Youku's peer video CDN, which has deployed over 300K smartrouter devices for its video delivery. In our measurement studies, 104K videos were investigated and 4TB traffic has been analyzed, over controlled smartrouter nodes and players. Our measurement insights are as follows. First, a global content replication strategy is essential for the peer CDN systems. Second, such peer CDN deployment itself can form an effective sub-system for end-to-end QoS monitoring, which can be used for fine-grained request redirection (e.g., user-level) and content replication. We also show our analysis on the performance limitations and propose potential improvements to the peer CDN systems.

Submitted 25 May, 2016; v1 submitted 24 May, 2016; originally announced May 2016.

Comments: arXiv admin note: text overlap with arXiv:1605.07704

arXiv:1605.07704 [pdf, ps, other]

Understanding the Smartrouter-based Peer CDN for Video Streaming

Authors: Ming Ma, Zhi Wang, Ke Su, Lifeng Sun

Abstract: Recent years have witnessed a new video delivery paradigm: smartrouter-based video delivery network, which is enabled by smartrouters deployed at users' homes, together with the conventional video servers deployed in the datacenters. Recently, ChinaCache, a large content delivery network (CDN) provider, and Youku, a video service provider using smartrouters to assist video delivery, announced their cooperation to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because such dedicated smartrouters are inherently operated by the centralized video service providers in a coordinative manner. It is intriguing to study the strategies, performance and potential impact on the content delivery ecosystem of such peer CDN systems. In this paper, we study the Youku peer CDN, which has deployed over 300K smartrouter devices for its video streaming. In our measurement, 78K videos were investigated and 3TB traffic has been analyzed, over controlled routers and players. Our contributions are the following measurement insights. First, a global replication and caching strategy is essential for the peer CDN systems, and proactively scheduling replication and caching on a daily basis can guarantee their performance. Second, such peer CDN deployment can itself form an effective Quality of Service (QoS) monitoring sub-system, which can be used for fine-grained user request redirection. We also provide our analysis on the performance issues and potential improvements to the peer CDN systems.

Submitted 24 May, 2016; originally announced May 2016.

arXiv:1604.05086 [pdf, ps, other]

Normative Multiagent Systems: A Dynamic Generalization

Authors: Xiaowei Huang, Ji Ruan, Qingliang Chen, Kaile Su

Abstract: Social norms are a powerful formalism in coordinating autonomous agents' behaviour to achieve certain objectives. In this paper, we propose a dynamic normative system to enable the reasoning of the changes of norms under different circumstances, which cannot be done in the existing static normative systems. We study two important problems (norm synthesis and norm recognition) related to the autonomy of the entire system and the agents, and characterise the computational complexities of solving these problems.

Submitted 18 April, 2016; originally announced April 2016.

Comments: 26 pages. A conference version of this work is accepted by the 25th International Joint Conference on Artificial Intelligence (IJCAI-16)

ACM Class: I.2.11; I.2.4

arXiv:1410.2662 [pdf, ps, other]

Evaluating Opportunistic Delivery of Large Content with TCP over WiFi in I2V Communication

Authors: Shreyasee Mukherjee, Kai Su, Narayan B. Mandayam, K. K. Ramakrishnan, Dipankar Raychaudhuri, Ivan Seskar

Abstract: With the increasing interest in connected vehicles, it is useful to evaluate the capability of delivering large content over a WiFi infrastructure to vehicles. The throughput achieved over WiFi channels can be highly variable and also rapidly degrades as the distance from the access point increases. While this behavior is well understood at the data link layer, the interactions across the various protocol layers (data link and up through the transport layer) and the effect of mobility may reduce the amount of content transferred to the vehicle, as it travels along the roadway. This paper examines the throughput achieved at the TCP layer over a carefully designed outdoor WiFi environment and the interactions across the layers that impact the performance achieved, as a function of the receiver mobility. The experimental studies conducted reveal that impairments over the WiFi link (frame loss, ARQ and increased delay) and the residual loss seen by TCP cause a cascade of duplicate ACKs to be generated. This triggers large congestion window reductions at the sender, leading to a drastic degradation of throughput to the vehicular client. To ensure outdoor WiFi infrastructures have the potential to sustain reasonable downlink throughput for drive-by vehicles, we speculate that there is a need to adapt how WiFi and TCP (as well as mobility protocols) function for such vehicular applications.

Submitted 9 October, 2014; originally announced October 2014.

arXiv:1402.0584 [pdf]

doi 10.1613/jair.3907

NuMVC: An Efficient Local Search Algorithm for Minimum Vertex Cover

Authors: Shaowei Cai, Kaile Su, Chuan Luo, Abdul Sattar

Abstract: The Minimum Vertex Cover (MVC) problem is a prominent NP-hard combinatorial optimization problem of great importance in both theory and application. Local search has proved successful for this problem. However, there are two main drawbacks in state-of-the-art MVC local search algorithms. First, they select a pair of vertices to exchange simultaneously, which is time-consuming. Secondly, although using edge weighting techniques to diversify the search, these algorithms lack mechanisms for decreasing the weights. To address these issues, we propose two new strategies: two-stage exchange and edge weighting with forgetting. The two-stage exchange strategy selects two vertices to exchange separately and performs the exchange in two stages. The strategy of edge weighting with forgetting not only increases weights of uncovered edges, but also decreases some weights for each edge periodically. These two strategies are used in designing a new MVC local search algorithm, which is referred to as NuMVC. We conduct extensive experimental studies on the standard benchmarks, namely DIMACS and BHOSLIB. The experiments comparing NuMVC with state-of-the-art heuristic algorithms show that NuMVC is at least competitive with the nearest competitor, namely PLS, on the DIMACS benchmark, and clearly dominates all competitors on the BHOSLIB benchmark. Also, experimental results indicate that NuMVC finds an optimal solution much faster than the current best exact algorithm for Maximum Clique on random instances as well as some structured ones. Moreover, we study the effectiveness of the two strategies and the run-time behaviour through experimental analysis.
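
Illustrative sketch (not taken from the paper): one way to read "edge weighting with forgetting" is shown below. The abstract does not state when or by how much weights are decreased, so the trigger (mean weight above a threshold `gamma`) and the decay (multiplying by `rho` and truncating) are assumptions here, as is the function name `update_edge_weights`.

```python
def update_edge_weights(weights, uncovered, gamma, rho):
    """Update edge weights in place after one local search step.

    weights   -- dict mapping an edge (u, v) to its integer weight
    uncovered -- edges not covered by the current candidate solution
    gamma     -- assumed threshold on the mean weight that triggers forgetting
    rho       -- assumed decay factor in (0, 1)
    """
    # Weighting: every uncovered edge becomes more important.
    for edge in uncovered:
        weights[edge] += 1

    # Forgetting: once weights have grown large on average, shrink all of them
    # so that old weighting decisions gradually lose influence.
    mean_weight = sum(weights.values()) / len(weights)
    if mean_weight > gamma:
        for edge in weights:
            weights[edge] = int(rho * weights[edge])
    return weights
```

In this sketch the decay applies to every edge at once, which keeps the relative order of weights mostly intact while bounding their growth.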

Submitted 3 February, 2014; originally announced February 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 46, pages 687-716, 2013

Showing 1–50 of 54 results for author: Su, K