How to beat a Bayesian adversary

Authors: Zihan Ding, Kexin Jin, Jonas Latz, Chenguang Liu

Abstract: Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram - a continuous-time particle system that shall approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.

Submitted 11 July, 2024; originally announced July 2024.

MSC Class: 90C15; 65C35; 68T07
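
The abstract above contrasts the usual minmax formulation with a Bayesian relaxation. A rough sketch of that contrast follows; the notation ($\theta$ for model parameters, $\xi$ for the perturbation, $\varepsilon$ for the attack budget, $\ell$ for the loss, $\pi$ for the adversary's distribution) is assumed here for illustration and is not taken from the paper, and the abstract does not specify the form of $\pi$.

```latex
% Usual minmax (worst-case) adversarial robustness problem:
\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \max_{\|\xi\| \le \varepsilon} \ell(\theta; x + \xi, y) \Big]
\]
% Relaxation: the adversary samples its attack from a distribution \pi
% supported on the same ball instead of maximising:
\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \int_{\|\xi\| \le \varepsilon} \ell(\theta; x + \xi, y) \, \pi(\mathrm{d}\xi \mid \theta, x, y) \Big]
\]
% For any such \pi the integral is bounded by the maximum:
\[
\int_{\|\xi\| \le \varepsilon} \ell(\theta; x + \xi, y) \, \pi(\mathrm{d}\xi \mid \theta, x, y)
\;\le\; \max_{\|\xi\| \le \varepsilon} \ell(\theta; x + \xi, y)
\]
```

The last inequality is why an averaged attack can be read as a relaxation of the worst-case one: the relaxed objective never exceeds the minmax objective.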

arXiv:2407.00623 [pdf, other]

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Authors: Yiquan Li, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, Chaowei Xiao

Abstract: Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrated that an ideal purification pipeline should generate the purified images on the data manifold that are as much semantically aligned to the original images for effectiveness in one step for efficiency. Therefore, we introduced Consistency Purification, an efficiency-effectiveness Pareto superior purifier compared to the previous work. Consistency Purification employs the consistency model, a one-step generative model distilled from PF-ODE, thus can generate on-manifold purified images with a single network evaluation. However, the consistency model is designed not for purification thus it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with LPIPS loss, which enables more aligned semantic meaning while keeping the purified images on data manifold. Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.16756 [pdf, other]

Addressing Polarization and Unfairness in Performative Prediction

Authors: Kun Jin, Tian Xie, Yang Liu, Xueru Zhang

Abstract: When machine learning (ML) models are used in applications that involve humans (e.g., online recommendation, school admission, hiring, lending), the model itself may trigger changes in the distribution of targeted data it aims to predict. Performative prediction (PP) is a framework that explicitly considers such model-dependent distribution shifts when learning ML models. While significant efforts have been devoted to finding performative stable (PS) solutions in PP for system robustness, their societal implications are less explored and it is unclear whether PS solutions are aligned with social norms such as fairness. In this paper, we set out to examine the fairness property of PS solutions in performative prediction. We first show that PS solutions can incur severe polarization effects and group-wise loss disparity. Although existing fairness mechanisms commonly used in literature can help mitigate unfairness, they may fail and disrupt the stability under model-dependent distribution shifts. We thus propose novel fairness intervention mechanisms that can simultaneously achieve both stability and fairness in PP settings. Both theoretical analysis and experiments are provided to validate the proposed method.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.10467 [pdf, other]

Scheduling two types of jobs with minimum makespan

Authors: Song Cao, Kai Jin

Abstract: We consider scheduling two types of jobs (A-job and B-job) to $p$ machines and minimizing their makespan. A group of same type of jobs processed consecutively by a machine is called a batch. For machine $v$, processing $x$ A-jobs in a batch takes $k^A_vx^2$ time units for a given speed $k^A_v$, and processing $x$ B-jobs in a batch takes $k^B_vx^2$ time units for a given speed $k^B_v$. We give an $O(n^2p\log(n))$ algorithm based on dynamic programming and binary search for solving this problem, where $n$ denotes the maximal number of A-jobs and B-jobs to be distributed to the machines. Our algorithm also fits the easier linear case where each batch of length $x$ of $A$-jobs takes $k^A_v x$ time units and each batch of length $x$ of $B$-jobs takes $k^B_vx$ time units. The running time is the same as the above case.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07436 [pdf, other]

McEval: Massively Multilingual Code Evaluation

Authors: Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

Abstract: Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks primarily focus on Python and are still restricted to a limited number of languages, where other languages are translated from the Python samples (e.g. MultiPL-E), degrading the data diversity. To further facilitate the research of code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. The benchmark contains challenging code completion, understanding, and generation evaluation tasks with finely curated massively multilingual instruction corpora McEval-Instruct. In addition, we introduce an effective multilingual coder mCoder trained on McEval-Instruct to support multilingual programming language generation. Extensive experimental results on McEval show that there is still a difficult journey between open-source models and closed-source LLMs (e.g. GPT-series models) in numerous languages. The instruction corpora, evaluation benchmark, and leaderboard are available at https://mceval.github.io/.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 22 pages

arXiv:2406.05247 [pdf, other]

Measuring Fairness in Large-Scale Recommendation Systems with Missing Labels

Authors: Yulong Dong, Kun Jin, Xinghai Hu, Yang Liu

Abstract: In large-scale recommendation systems, the vast array of items makes it infeasible to obtain accurate user preferences for each product, resulting in a common issue of missing labels. Typically, only items previously recommended to users have associated ground truth data. Although there is extensive research on fairness concerning fully observed user-item interactions, the challenge of fairness in scenarios with missing labels remains underexplored. Previous methods often treat these samples with missing labels as negative, which can significantly deviate from the ground truth fairness metrics. Our study addresses this gap by proposing a novel method employing a small randomized traffic to estimate fairness metrics accurately. We present theoretical bounds for the estimation error of our fairness metric and support our findings with empirical evidence on real data. Our numerical experiments on synthetic and TikTok's real-world data validate our theory and show the efficiency and effectiveness of our novel methods. To the best of our knowledge, we are the first to emphasize the necessity of random traffic in dataset collection for recommendation fairness, the first to publish a fairness-related dataset from TikTok and to provide reliable estimates of fairness metrics in the context of large-scale recommendation systems with missing labels.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2404.10514 [pdf, other]

Simple $k$-crashing Plan with a Good Approximation Ratio

Authors: Ruixi Luo, Kai Jin, Zelin Ye

Abstract: In project management, a project is typically described as an activity-on-edge network (AOE network), where each activity / job is represented as an edge of some network $N$ (which is a DAG). Some jobs must be finished before others can be started, as described by the topology structure of $N$. It is known that job $j_i$ in normal speed would require $b_i$ days to be finished after it is started. Given the network $N$ with the associated edge lengths $b_1,\ldots,b_m$, the duration of the project is determined, which equals the length of the critical path (namely, the longest path) of $N$. To speed up the project (i.e. reduce the duration), the manager can crash a few jobs (namely, reduce the length of the corresponding edges) by investing extra resources into those jobs. However, the time for completing $j_i$ has a lower bound due to technological limits -- it requires at least $a_i$ days to be completed. Moreover, it is expensive to buy resources. Given $N$ and an integer $k\geq 1$, the $k$-crashing problem asks for the minimum amount of resources required to speed up the project by $k$ days. We show a simple and efficient algorithm with an approximation ratio $\frac{1}{1}+\ldots+\frac{1}{k}$ for this problem. We also study a related problem called $k$-LIS, in which we are given a sequence $ω$ of numbers and we aim to find $k$ disjoint increasing subsequences of $ω$ with the largest total length. We show a $(1-\frac{1}{e})$-approximation algorithm which is simple and efficient.

Submitted 16 April, 2024; originally announced April 2024.

ACM Class: K.6.1
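
The approximation ratio $\frac{1}{1}+\ldots+\frac{1}{k}$ quoted in the abstract above is the $k$-th harmonic number. As a reading aid, the standard bounds below show that it grows only logarithmically in $k$; the symbol $H_k$ is introduced here and does not appear in the abstract.

```latex
\[
H_k \;=\; \sum_{i=1}^{k} \frac{1}{i},
\qquad
\ln(k+1) \;\le\; H_k \;\le\; 1 + \ln k \quad (k \ge 1).
\]
% Example: H_4 = 1 + 1/2 + 1/3 + 1/4 = 25/12 \approx 2.083,
% which lies between \ln 5 \approx 1.609 and 1 + \ln 4 \approx 2.386.
```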

arXiv:2404.09682 [pdf, other]

Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

Authors: Juhwan Choi, Jungmin Yun, Kyohoon Jin, YoungBin Kim

Abstract: The quality of the dataset is crucial for ensuring optimal performance and reliability of downstream task models. However, datasets often contain noisy data inadvertently included during the construction process. Numerous attempts have been made to correct this issue through human annotators. However, hiring and managing human annotators is expensive and time-consuming. As an alternative, recent studies are exploring the use of large language models (LLMs) for data annotation. In this study, we present a case study that extends the application of LLM-based data annotation to enhance the quality of existing datasets through a cleansing strategy. Specifically, we leverage approaches such as chain-of-thought (CoT) and majority voting to imitate human annotation and classify unrelated documents from the Multi-News dataset, which is widely used for the multi-document summarization task. Through our proposed cleansing method, we introduce an enhanced Multi-News+. By employing LLMs for data cleansing, we demonstrate an efficient and effective approach to improving dataset quality without relying on expensive human annotation efforts.

Submitted 15 April, 2024; originally announced April 2024.
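
The Multi-News+ abstract above mentions majority voting over LLM annotations. A minimal sketch of that aggregation step might look like the following; the label strings, the number of votes, and the tie-handling rule are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None if there is no clear winner.

    `labels` is a list of annotations for one document, e.g. one label per
    independent LLM run (hypothetical setup, not the authors' pipeline).
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None  # no annotations at all
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for first place: leave the decision open
    return counts[0][0]


# Five hypothetical annotation runs for a single document.
votes = ["relevant", "unrelated", "unrelated", "unrelated", "relevant"]
decision = majority_vote(votes)  # "unrelated" wins 3 to 2
```

Returning None on a tie keeps ambiguous documents out of an automatic filter; a real pipeline could instead use an odd number of runs so that binary ties cannot occur.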

arXiv:2404.05558 [pdf, other]

JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Authors: Woo Kyoung Han, Sunghoon Im, Jaedeok Kim, Kyong Hwan Jin

Abstract: We propose a practical approach to JPEG image decoding, utilizing a local implicit neural representation with continuous cosine formulation. The JPEG algorithm significantly quantizes discrete cosine transform (DCT) spectra to achieve a high compression rate, inevitably resulting in quality degradation while encoding an image. We have designed a continuous cosine spectrum estimator to address the quality degradation issue that restores the distorted spectrum. By leveraging local DCT formulations, our network has the privilege to exploit dequantization and upsampling simultaneously. Our proposed model enables decoding compressed images directly across different quality factors using a single pre-trained model without relying on a conventional JPEG decoder. As a result, our proposed network achieves state-of-the-art performance in flexible color image JPEG artifact removal tasks. Our source code is available at https://github.com/WooKyoungHan/JDEC.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.17377 [pdf, other]

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

Abstract: Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Project page is available at https://ku-cvlab.github.io/Perturbed-Attention-Guidance

arXiv:2403.15512 [pdf, other]

Enhancing Effectiveness and Robustness in a Low-Resource Regime via Decision-Boundary-aware Data Augmentation

Authors: Kyohoon Jin, Junho Lee, Juhwan Choi, Sangmin Song, Youngbin Kim

Abstract: Efforts to leverage deep learning models in low-resource regimes have led to numerous augmentation studies. However, the direct application of methods such as mixup and cutout to text data is limited due to their discrete characteristics. While methods using pretrained language models have exhibited efficiency, they require additional considerations for robustness. Inspired by recent studies on decision boundaries, this paper proposes a decision-boundary-aware data augmentation strategy to enhance robustness using pretrained language models. The proposed technique first focuses on shifting the latent features closer to the decision boundary, followed by reconstruction to generate an ambiguous version with a soft label. Additionally, mid-K sampling is suggested to enhance the diversity of the generated sentences. This paper demonstrates the performance of the proposed augmentation strategy compared to other methods through extensive experiments. Furthermore, the ablation study reveals the effect of soft labels and mid-K sampling and the extensibility of the method with curriculum data augmentation.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2402.11702 [pdf, other]

doi 10.1145/3643991.3645074

Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation

Authors: Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati

Abstract: Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world settings. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than being used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development.

Submitted 16 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: 4 pages, 3 figures, 21st International Conference on Mining Software Repositories (MSR '24), April 15-16, 2024, Lisbon, Portugal

ACM Class: I.2.2

arXiv:2402.05591 [pdf, ps, other]

SoftEDA: Rethinking Rule-Based Data Augmentation with Soft Labels

Authors: Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim

Abstract: Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can potentially damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly opened our source code for reproducibility.

Submitted 8 February, 2024; originally announced February 2024.

Comments: ICLR 2023 Tiny Papers
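
The SoftEDA abstract above describes applying soft labels to augmented data, and the AutoAugment entry in this listing characterises softEDA as employing label smoothing. A minimal sketch of standard label smoothing is given below; the function name, the smoothing factor, and the choice of which samples receive soft labels are assumptions for illustration, not details confirmed by either abstract.

```python
def soft_label(class_index, num_classes, epsilon):
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.

    `epsilon` is the smoothing factor (hypothetical value chosen by the user);
    epsilon = 0 recovers the ordinary one-hot label.
    """
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    uniform_mass = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform_mass if i == class_index else uniform_mass
        for i in range(num_classes)
    ]


# An augmented sentence whose original class is 0 out of 4 classes
# gets roughly [0.85, 0.05, 0.05, 0.05] as its training target.
augmented_target = soft_label(class_index=0, num_classes=4, epsilon=0.2)
```

The intuition is that a rule-based edit may have damaged the sentence's meaning, so its target spreads a little probability mass over the other classes instead of asserting the original class with full confidence.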

arXiv:2402.05584 [pdf, other]

AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource Regimes

Authors: Juhwan Choi, Kyohoon Jin, Junho Lee, Sangmin Song, Youngbin Kim

Abstract: Text data augmentation is a complex problem due to the discrete nature of sentences. Although rule-based augmentation methods are widely adopted in real-world applications because of their simplicity, they suffer from potential semantic damage. Previous researchers have suggested easy data augmentation with soft labels (softEDA), employing label smoothing to mitigate this problem. However, finding the best factor for each model and dataset is challenging; therefore, using softEDA in real-world applications is still difficult. In this paper, we propose adapting AutoAugment to solve this problem. The experimental results suggest that the proposed method can boost existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models. We offer the source code.

Submitted 8 February, 2024; originally announced February 2024.

Comments: EACL 2024 Student Research Workshop

arXiv:2402.05512 [pdf, other]

GPTs Are Multilingual Annotators for Sequence Generation Tasks

Authors: Juhwan Choi, Eunju Lee, Kyohoon Jin, YoungBin Kim

Abstract: Data annotation is an essential step for constructing new datasets. However, the conventional approach of data annotation through crowdsourcing is both time-consuming and expensive. In addition, the complexity of this process increases when dealing with low-resource languages owing to the difference in the language pool of crowdworkers. To address these issues, this study proposes an autonomous annotation method by utilizing large language models, which have been recently demonstrated to exhibit remarkable performance. Through our experiments, we demonstrate that the proposed method is not just cost-efficient but also applicable for low-resource language annotation. Additionally, we constructed an image captioning dataset using our approach and are committed to opening this dataset for future study. We have opened our source code for further study and reproducibility.

Submitted 8 February, 2024; originally announced February 2024.

Comments: EACL 2024 Findings: Camera-ready version

arXiv:2312.15840 [pdf, other]

Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval

Authors: Zeqiang Wei, Kai Jin, Xiuzhuang Zhou

Abstract: Cross-modal medical image-report retrieval task plays a significant role in clinical diagnosis and various medical generative tasks. Eliminating heterogeneity between different modalities to enhance semantic consistency is the key challenge of this task. The current Vision-Language Pretraining (VLP) models, with cross-modal contrastive learning and masked reconstruction as joint training tasks, can effectively enhance the performance of cross-modal retrieval. This framework typically employs dual-stream inputs, using unmasked data for cross-modal contrastive learning and masked data for reconstruction. However, due to task competition and information interference caused by significant differences between the inputs of the two proxy tasks, the effectiveness of representation learning for intra-modal and cross-modal features is limited. In this paper, we propose an efficient VLP framework named Masked Contrastive and Reconstruction (MCR), which takes masked data as the sole input for both tasks. This enhances task connections, reducing information interference and competition between them, while also substantially decreasing the required GPU memory and training time. Moreover, we introduce a new modality alignment strategy named Mapping before Aggregation (MbA). Unlike previous methods, MbA maps different modalities to a common feature space before conducting local feature aggregation, thereby reducing the loss of fine-grained semantic information necessary for improved modality alignment. Qualitative and quantitative experiments conducted on the MIMIC-CXR dataset validate the effectiveness of our approach, demonstrating state-of-the-art performance in medical cross-modal retrieval tasks.

Submitted 26 December, 2023; v1 submitted 25 December, 2023; originally announced December 2023.

arXiv:2310.12189 [pdf, other]

Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation

Authors: Bosang Kim, Jonghyun Kim, Hyotae Lee, Lanying Jin, Jeongwon Ha, Dowoo Kwon, Jungpyo Kim, Wonhyeok Im, KyungMin Jin, Jungho Lee

Abstract: In general, hand pose estimation aims to improve the robustness of model performance in real-world scenes. However, it is difficult to enhance the robustness since existing datasets are obtained in restricted environments to annotate 3D information. Although neural networks quantitatively achieve a high estimation accuracy, unsatisfactory results can be observed in visual quality. This discrepancy between quantitative results and their visual qualities remains an open issue in the hand pose representation. To this end, we propose a mesh represented recycle learning strategy for 3D hand pose and mesh estimation which reinforces synthesized hand mesh representation in a training phase. To be specific, a hand pose and mesh estimation model first predicts parametric 3D hand annotations (i.e., 3D keypoint positions and vertices for hand mesh) with real-world hand images in the training phase. Second, synthetic hand images are generated with self-estimated hand mesh representations. After that, the synthetic hand images are fed into the same model again. Thus, the proposed learning strategy simultaneously improves quantitative results and visual qualities by reinforcing synthetic mesh representation. To encourage consistency between original model output and its recycled one, we propose self-correlation loss which maximizes the accuracy and reliability of our learning strategy. Consequently, the model effectively conducts self-refinement on hand pose estimation by learning mesh representation from its own output. To demonstrate the effectiveness of our learning strategy, we provide extensive experiments on FreiHAND dataset. Notably, our learning strategy improves the performance on hand pose and mesh estimation without any extra computational burden during the inference.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.07394 [pdf, ps, other]

CLIP for Lightweight Semantic Segmentation

Authors: Ke Jin, Wankou Yang

Abstract: The large-scale pretrained model CLIP, trained on 400 million image-text pairs, offers a promising paradigm for tackling vision tasks, albeit at the image level. Later works, such as DenseCLIP and LSeg, extend this paradigm to dense prediction, including semantic segmentation, and have achieved excellent results. However, the above methods either rely on CLIP-pretrained visual backbones or use non-pretrained but heavy backbones such as Swin, while falling ineffective when applied to lightweight backbones. The reason for this is that lightweight networks, whose feature extraction ability is relatively limited, have difficulty embedding image features that align well with the text embeddings. In this work, we present a new feature fusion module which tackles this problem and enables the language-guided paradigm to be applied to lightweight networks. Specifically, the module is a parallel design of CNN and transformer with a two-way bridge in between, where the CNN extracts spatial information and visual context of the feature map from the image encoder, and the transformer propagates text embeddings from the text encoder forward. The core of the module is the bidirectional fusion of visual and text features across the bridge which prompts their proximity and alignment in embedding space. The module is model-agnostic, which can not only make language-guided lightweight semantic segmentation practical, but also fully exploit the pretrained knowledge of language priors and achieve better performance than previous SOTA work, such as DenseCLIP, whatever the vision backbone is. Extensive experiments have been conducted to demonstrate the superiority of our method.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.11119 [pdf, other]

BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird's Eye View Map Construction

Authors: Minsu Kim, Giseop Kim, Kyong Hwan Jin, Sunwook Choi

Abstract: A recent sensor fusion in a Bird's Eye View (BEV) space has shown its utility in various tasks such as 3D detection, map segmentation, etc. However, the approach struggles with inaccurate camera BEV estimation, and a perception of distant areas due to the sparsity of LiDAR points. In this paper, we propose a broad BEV fusion (BroadBEV) that addresses the problems with a spatial synchronization approach of cross-modality. Our strategy aims to enhance camera BEV estimation for a broad-sighted perception while simultaneously improving the completion of LiDAR's sparsity in the entire BEV space. Toward that end, we devise Point-scattering that scatters LiDAR BEV distribution to camera depth distribution. The method boosts the learning of depth estimation of the camera branch and induces accurate location of dense camera features in BEV space. For an effective BEV fusion between the spatially synchronized features, we suggest ColFusion that applies self-attention weights of LiDAR and camera BEV features to each other. Our extensive experiments demonstrate that BroadBEV provides a broad-sighted BEV perception with remarkable performance gains.

Submitted 8 November, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.01409 [pdf, other]

Implicit Neural Image Stitching

Authors: Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

Abstract: Existing frameworks for image stitching often provide visually reasonable stitchings. However, they suffer from blurry artifacts and disparities in illumination, depth level, etc. Although the recent learning-based stitchings relax such disparities, the required methods impose sacrifice of image qualities failing to capture high-frequency details for stitched images. To address the problem, we propose a novel approach, implicit Neural Image Stitching (NIS) that extends arbitrary-scale super-resolution. Our method estimates Fourier coefficients of images for quality-enhancing warps. Then, the suggested model blends color mismatches and misalignment in the latent space and decodes the features into RGB values of stitched images. Our experiments show that our approach achieves improvement in resolving the low-definition imaging of the previous deep image stitching with favorable accelerated image-enhancing methods. Our source code is available at https://github.com/minshu-kim/NIS.

Submitted 21 January, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.01406 [pdf, other]

Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Authors: Minsu Kim, Yongjun Lee, Woo Kyoung Han, Kyong Hwan Jin

Abstract: Trendy suggestions for learning-based elastic warps enable the deep image stitchings to align images exposed to large parallax errors. Despite the remarkable alignments, the methods struggle with occasional holes or discontinuity between overlapping and non-overlapping regions of a target image as the applied training strategy mostly focuses on overlap region alignment. As a result, they require additional modules such as seam finder and image inpainting for hiding discontinuity and filling holes, respectively. In this work, we suggest Recurrent Elastic Warps (REwarp) that address the problem with Dirichlet boundary condition and boost performances by residual learning for recurrent misalign correction. Specifically, REwarp predicts a homography and a Thin-plate Spline (TPS) under the boundary constraint for discontinuity and hole-free image stitching. Our experiments show the favorable aligns and the competitive computational costs of REwarp compared to the existing stitching methods. Our source code is available at https://github.com/minshu-kim/REwarp.

Submitted 18 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.13782 [pdf, other]

Planning with Logical Graph-based Language Model for Instruction Generation

Authors: Fan Zhang, Kebing Jin, Hankz Hankui Zhuo

Abstract: Despite the superior performance of large language models to generate natural language texts, it is hard to generate texts with correct logic according to a given task, due to the difficulties for neural models to capture implied rules from free-form texts. In this paper, we propose a novel graph-based language model, Logical-GLM, to infuse logic into language models for more valid text generation and interpretability. Specifically, we first capture information from natural language instructions and construct logical bayes graphs that generally describe domains. Next, we generate logical skeletons to guide language model training, infusing domain knowledge into language models. Finally, we alternately optimize the searching policy of graphs and language models until convergence. The experimental results show that Logical-GLM is both effective and efficient compared with traditional language models, despite using smaller-scale training data and fewer parameters. Our approach can generate instructional texts with more correct logic owing to the internalized domain knowledge. Moreover, the usage of logical graphs reflects the inner mechanism of the language models, which improves the interpretability of black-box models.

Submitted 5 July, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: 9 pages, 8 figures

arXiv:2306.04090 [pdf, other]

PlayBest: Professional Basketball Player Behavior Synthesis via Planning with Diffusion

Authors: Xiusi Chen, Wei-Yao Wang, Ziniu Hu, David Reynoso, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang

Abstract: Dynamically planning in complex systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing context-dependent decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we formulate the sequential decision-making process as a conditional trajectory generation process. Based on the formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained with corresponding rewards. To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, contrasting the generated trajectories with those employed by professional basketball teams. Our results reveal that the model excels at generating reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies exhibit an alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.

Submitted 16 July, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: CIKM 2024

arXiv:2306.02582 [pdf, other]

Enhancing Point Annotations with Superpixel and Confidence Learning Guided for Improving Semi-Supervised OCT Fluid Segmentation

Authors: Tengjin Weng, Yang Shen, Kai Jin, Zhiming Cheng, Yunxiang Li, Gewen Zhang, Shuai Wang, Yaqi Wang

Abstract: Automatic segmentation of fluid in Optical Coherence Tomography (OCT) images is beneficial for ophthalmologists to make an accurate diagnosis. Although semi-supervised OCT fluid segmentation networks enhance their performance by introducing additional unlabeled data, the performance enhancement is limited. To address this, we propose Superpixel and Confident Learning Guide Point Annotations Network (SCLGPA-Net) based on the teacher-student architecture, which can learn OCT fluid segmentation from limited fully-annotated data and abundant point-annotated data. Specifically, we use points to annotate fluid regions in unlabeled OCT images and the Superpixel-Guided Pseudo-Label Generation (SGPLG) module generates pseudo-labels and pixel-level label trust maps from the point annotations. The label trust maps provide an indication of the reliability of the pseudo-labels. Furthermore, we propose the Confident Learning Guided Label Refinement (CLGLR) module, which identifies error information in the pseudo-labels and leads to further refinement. Experiments on the RETOUCH dataset show that we are able to reduce the need for fully-annotated data by 94.22%, closing the gap with the best fully supervised baselines to a mean IoU of only 2%. Furthermore, we constructed a private 2D OCT fluid segmentation dataset for evaluation. Compared with other methods, comprehensive experimental results demonstrate that the proposed method can achieve excellent performance in OCT fluid segmentation.

Submitted 30 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Submission to BSPC

arXiv:2306.00256 [pdf, other]

DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Authors: Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

Abstract: Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD's convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.

Submitted 31 May, 2023; originally announced June 2023.
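
The power-of-2 restriction mentioned in the abstract above can be illustrated with a small sketch. It is not DSGD-CECA itself. It only imitates the baseline one-peer exponential-2 gossip scheme, under the assumption that in round t agent i averages with the agent at index (i + 2^t) mod n using weight 1/2 each. The function name and the index convention are illustrative choices, not taken from the paper.

```python
import math

def one_peer_exponential_rounds(values):
    """Run ceil(log2(n)) one-peer gossip rounds and return the final values."""
    n = len(values)
    current = list(values)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for t in range(rounds):
        hop = 2 ** t
        # Each agent averages its value with the single peer `hop` positions
        # ahead of it (assumed gossip weight 1/2 for both values).
        current = [(current[i] + current[(i + hop) % n]) / 2.0 for i in range(n)]
    return current

# n = 4 (a power of 2): every agent ends up with the exact global mean 2.5.
four_agents = one_peer_exponential_rounds([1.0, 2.0, 3.0, 4.0])

# n = 3 (not a power of 2): the agents do not reach the mean 1.0.
three_agents = one_peer_exponential_rounds([3.0, 0.0, 0.0])
```

With four agents the two rounds reach exact consensus, while with three agents the values stay unequal, which is the kind of gap the abstract says DSGD-CECA removes.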

arXiv:2305.17866 [pdf, other]

Sequential Condition Evolved Interaction Knowledge Graph for Traditional Chinese Medicine Recommendation

Authors: Jingjin Liu, Hankz Hankui Zhuo, Kebing Jin, Jiamin Yuan, Zhimin Yang, Zhengan Yao

Abstract: Traditional Chinese Medicine (TCM) has a rich history of utilizing natural herbs to treat a diversity of illnesses. In practice, TCM diagnosis and treatment are highly personalized and organically holistic, requiring comprehensive consideration of the patient's state and symptoms over time. However, existing TCM recommendation approaches overlook the changes in patient status and only explore potential patterns between symptoms and prescriptions. In this paper, we propose a novel Sequential Condition Evolved Interaction Knowledge Graph (SCEIKG), a framework that treats the model as a sequential prescription-making problem by considering the dynamics of the patient's condition across multiple visits. In addition, we incorporate an interaction knowledge graph to enhance the accuracy of recommendations by considering the interactions between different herbs and the patient's condition. Experimental results on a real-world dataset demonstrate that our approach outperforms existing TCM recommendation methods, achieving state-of-the-art performance.

Submitted 6 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.13882 [pdf, other]

Subsampling Error in Stochastic Gradient Langevin Diffusions

Authors: Kexin Jin, Chenguang Liu, Jonas Latz

Abstract: The Stochastic Gradient Langevin Dynamics (SGLD) are popularly used to approximate Bayesian posterior distributions in statistical learning procedures with large-scale data. As opposed to many usual Markov chain Monte Carlo (MCMC) algorithms, SGLD is not stationary with respect to the posterior distribution; two sources of error appear: The first error is introduced by an Euler-Maruyama discretisation of a Langevin diffusion process, the second error comes from the data subsampling that enables its use in large-scale data settings. In this work, we consider an idealised version of SGLD to analyse the method's pure subsampling error that we then see as a best-case error for diffusion-based subsampling MCMC methods. Indeed, we introduce and study the Stochastic Gradient Langevin Diffusion (SGLDiff), a continuous-time Markov process that follows the Langevin diffusion corresponding to a data subset and switches this data subset after exponential waiting times. There, we show the exponential ergodicity of SGLDiff and that the Wasserstein distance between the posterior and the limiting distribution of SGLDiff is bounded above by a fractional power of the mean waiting time. We bring our results into context with other analyses of SGLD.

Submitted 26 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: AISTATS 2024

MSC Class: 65C05; 62F15
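
The two error sources named in the abstract above (Euler-Maruyama discretisation and data subsampling) both show up in a plain discrete-time SGLD step. The following is a minimal sketch of that standard discrete scheme, not of the continuous-time SGLDiff process the paper studies. The toy model (unit-variance Gaussian likelihood with a flat prior on the mean), the function name, and all parameter values are assumptions made for illustration.

```python
import math
import random

def sgld_chain(data, n_steps, step, batch_size, rng, theta0=0.0):
    """Discrete-time SGLD for the mean of a unit-variance Gaussian (flat prior)."""
    n_total = len(data)
    theta = theta0
    chain = []
    for _ in range(n_steps):
        # Subsampling: draw a fresh mini-batch without replacement.
        batch = rng.sample(data, batch_size)
        # Rescaled mini-batch gradient of the negative log-posterior.
        grad = (n_total / batch_size) * sum(theta - x for x in batch)
        # Euler-Maruyama step of the Langevin diffusion with step size `step`.
        theta = theta - step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(theta)
    return chain

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]
chain = sgld_chain(data, n_steps=2500, step=1e-3, batch_size=20, rng=rng)

# Discard the first 500 iterates as burn-in, then average the rest.
posterior_mean_estimate = sum(chain[500:]) / len(chain[500:])
data_mean = sum(data) / len(data)
```

In this toy model the exact posterior mean equals the data mean, so comparing `posterior_mean_estimate` with `data_mean` gives a rough check of the sampler.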

arXiv:2305.13608 [pdf, other]

VDD: Varied Drone Dataset for Semantic Segmentation

Authors: Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang

Abstract: Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground. Ensuring high accuracy of semantic segmentation models for drones requires access to diverse, large-scale, and high-resolution datasets, which are often scarce in the field of aerial image processing. While existing datasets typically focus on urban scenes and are relatively small, our Varied Drone Dataset (VDD) addresses these limitations by offering a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes. This dataset features various scenes in urban, industrial, rural, and natural areas, captured from different camera angles and under diverse lighting conditions. We also make new annotations to UDD and UAVid, integrating them under VDD annotation standards, to create the Integrated Drone Dataset (IDD). We train seven state-of-the-art models on drone datasets as baselines. It's expected that our dataset will generate considerable interest in drone image segmentation and serve as a foundation for other drone vision tasks. Datasets are publicly available at https://github.com/RussRobin/VDD.

Submitted 2 July, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.05090 [pdf, other]

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

Authors: Kun Jin, Tongxin Yin, Zhongzhu Chen, Zeyu Sun, Xueru Zhang, Yang Liu, Mingyan Liu

Abstract: We consider a federated learning (FL) system consisting of multiple clients and a server, where the clients aim to collaboratively learn a common decision model from their distributed data. Unlike the conventional FL framework that assumes the client's data is static, we consider scenarios where the clients' data distributions may be reshaped by the deployed decision model. In this work, we leverage the idea of distribution shift mappings in performative prediction to formalize this model-dependent data distribution shift and propose a performative federated learning framework. We first introduce necessary and sufficient conditions for the existence of a unique performative stable solution and characterize its distance to the performative optimal solution. Then we propose the performative FedAvg algorithm and show that it converges to the performative stable solution at a rate of O(1/T) under both full and partial participation schemes. In particular, we use novel proof techniques and show how the clients' heterogeneity influences the convergence. Numerical results validate our analysis and provide valuable insights into real-world applications.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2304.12566 [pdf, other]

AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Authors: Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Abstract: Many recent machine learning tasks focus on developing models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several works show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptive (TTA) methods are proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during the inference stage. In this work, we adopt a Non-Parametric Classifier to perform the test-time Adaptation (AdaNPC). In particular, we construct a memory that contains the feature and label pairs from training domains. During inference, given a test instance, AdaNPC first recalls the K closest samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can be gradually changed from the training distribution towards the test distribution with very little extra computation cost. We theoretically justify the rationality behind the proposed method. Besides, we test our model on extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.

Submitted 9 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 30 pages, 12 figures

Journal ref: The Fortieth International Conference on Machine Learning, ICML, 2023
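
The memory-based inference loop described in the abstract above (recall K nearby samples, vote, then store the test feature with its predicted label) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the Euclidean distance, the plain majority vote, the toy 2D features, and the name `knn_predict_and_update` are all assumptions.

```python
import math
from collections import Counter

def knn_predict_and_update(memory, feature, k=3):
    """Predict a label by k-nearest-neighbour vote, then grow the memory.

    `memory` is a list of (feature_vector, label) pairs and is mutated in place.
    """
    # Rank stored samples by Euclidean distance to the test feature.
    nearest = sorted(memory, key=lambda item: math.dist(item[0], feature))[:k]
    # Majority vote over the labels of the k nearest samples.
    votes = Counter(label for _, label in nearest)
    predicted = votes.most_common(1)[0][0]
    # Store the test feature with its predicted label, so the memory
    # gradually shifts toward the test distribution.
    memory.append((feature, predicted))
    return predicted

# Toy memory built from "training domain" features (hypothetical values).
memory = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((1.0, 1.0), "b"), ((0.9, 1.0), "b")]
pred = knn_predict_and_update(memory, (0.2, 0.1), k=3)
```

Here two of the three nearest stored samples carry label "a", so the test point is labelled "a" and appended to the memory as a fifth entry.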

arXiv:2302.07676 [pdf, other]

DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes

Authors: Shenghao Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Abstract: Cross-view multi-object tracking aims to link objects between frames and camera views with substantial overlaps. Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues, including 1) missing real-world scenarios, 2) lacking diverse scenes, 3) owning a limited number of tracks, 4) comprising only static cameras, and 5) lacking standard benchmarks, which hinder the investigation and comparison of cross-view tracking methods. To solve the aforementioned issues, we introduce DIVOTrack: a new cross-view multi-object tracking dataset for DIVerse Open scenes with dense tracking pedestrians in realistic and non-experimental environments. Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available. Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT, which learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we present a summary of current methodologies and a set of standard benchmarks with our DIVOTrack to provide a fair comparison and conduct a comprehensive analysis of current approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.

Submitted 7 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: Accepted to IJCV 2023

arXiv:2212.05412 [pdf, other]

A Hierarchical Temporal Planning-Based Approach for Dynamic Hoist Scheduling Problems

Authors: Kebing Jin, Yingkai Xiao, Hankz Hankui Zhuo, Renyong Ma

Abstract: Hoist scheduling has become a bottleneck in electroplating industry applications with the development of autonomous devices. Although a few approaches have been proposed to target this challenging problem, they generally cannot scale to large-scale scheduling problems. In this paper, we formulate the hoist scheduling problem as a new temporal planning problem in the form of adapted PDDL, and propose a novel hierarchical temporal planning approach to efficiently solve the scheduling problem. Additionally, we provide a collection of real-life benchmark instances that can be used to evaluate solution methods for the problem. We show that the proposed approach is able to efficiently find solutions of high quality for large-scale real-life benchmark instances, in comparison with state-of-the-art baselines.

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2211.15868 [pdf, other]

Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos

Authors: Kyung-Min Jin, Byoung-Sung Lim, Gun-Hee Lee, Tae-Kyung Kang, Seong-Whan Lee

Abstract: Previous video-based human pose estimation methods have shown promising results by leveraging aggregated features of consecutive frames. However, most approaches compromise accuracy to mitigate jitter or do not sufficiently comprehend the temporal aspects of human motion. Furthermore, occlusion increases uncertainty between consecutive frames, which results in unsmooth results. To address these issues, we design an architecture that exploits the keypoint kinematic features with the following components. First, we effectively capture the temporal features by leveraging individual keypoint's velocity and acceleration. Second, the proposed hierarchical transformer encoder aggregates spatio-temporal dependencies and refines the 2D or 3D input pose estimated from existing estimators. Finally, we provide an online cross-supervision between the refined input pose generated from the encoder and the final pose from our decoder to enable joint optimization. We demonstrate comprehensive results and validate the effectiveness of our model in various tasks: 2D pose estimation, 3D pose estimation, body mesh recovery, and sparsely annotated multi-human pose estimation. Our code is available at https://github.com/KyungMinJin/HANet.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.15666 [pdf, other]

Learning Visual Planning Models from Partially Observed Images

Authors: Kebing Jin, Zhanhao Xiao, Hankui Hankz Zhuo, Hai Wan, Jiaran Cai

Abstract: There has been increasing attention on planning model learning in classical planning. Most existing approaches, however, focus on learning planning models from structured data in symbolic representations. It is often difficult to obtain such structured data in real-world scenarios. Although a number of approaches have been developed for learning planning models from fully observed unstructured data (e.g., images), in many scenarios raw observations are often incomplete. In this paper, we provide a novel framework, Recplan, for learning a transition model from partially observed raw image traces. More specifically, by considering the preceding and subsequent images in a trace, we learn the latent state representations of raw observations and then build a transition model based on such representations. Additionally, we propose a neural-network-based approach to learn a heuristic model that estimates the distance toward a given goal observation. Based on the learned transition model and heuristic model, we implement a classical planner for images. We show empirically that our approach is more effective than a state-of-the-art approach to learning visual planning models in environments with incomplete observations.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 25 pages, 5 figures

arXiv:2211.00322 [pdf, other]

DensePure: Understanding Diffusion Models towards Adversarial Robustness

Authors: Chaowei Xiao, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, Dawn Song

Abstract: Diffusion models have been recently employed to improve certified robustness through the process of denoising. However, the theoretical understanding of why diffusion models are able to improve the certified robustness is still lacking, which prevents further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method DensePure, designed to improve the certified robustness of a pretrained model (i.e. classifier). Given an (adversarial) input, DensePure consists of multiple runs of denoising via the reverse process of the diffusion model (with different random seeds) to get multiple reversed samples, which are then passed through the classifier, followed by majority voting of inferred labels to make the final prediction. This design of using multiple runs of denoising is informed by our theoretical analysis of the conditional distribution of the reversed sample. Specifically, when the data density of a clean sample is high, its conditional density under the reverse process in a diffusion model is also high; thus sampling from the latter conditional distribution can purify the adversarial example and return the corresponding clean sample with a high probability. By using the highest density point in the conditional distribution as the reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process. We show that this robust region is a union of multiple convex sets, and is potentially much larger than the robust regions identified in previous works. In practice, DensePure can approximate the label of the high density region in the conditional distribution so that it can enhance certified robustness.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.07881 [pdf, other]

Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Authors: Zhuoqing Song, Weijian Li, Kexin Jin, Lei Shi, Ming Yan, Wotao Yin, Kun Yuan

Abstract: Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without a central server. Since communication tends to be slower than computation, when each agent communicates with only a few neighboring agents per iteration, they can complete iterations faster than with more agents or a central server. However, the total number of iterations to reach a network-wide solution is affected by the speed at which the agents' information is "mixed" by communication. We found that popular communication topologies either have large maximum degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate that is used to measure the mixing efficiency. In the proposed family, EquiStatic has a degree of $Θ(\ln(n))$, where $n$ is the network size, and a series of time-dependent one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both of them achieve an $n$-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, theoretically and empirically. Our code is implemented through BlueFog and available at https://github.com/kexinjinnn/EquiTopo

Submitted 12 March, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2209.06208 [pdf, other]

Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning

Authors: Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, Tochukwu Onyeogulu, Amirul Islam, Salman Khan, Izzeddin Teeti, James Kinross, Daniel R Leff, Fabio Cuzzolin, George Mylonas

Abstract: The operating room (OR) is a dynamic and complex environment consisting of a multidisciplinary team working together in a high-stakes environment to provide safe and efficient patient care. Additionally, surgeons are frequently exposed to multiple psycho-organisational stressors that may cause negative repercussions on their immediate technical performance and long-term health. Many factors can therefore contribute to increasing the Cognitive Workload (CWL) such as temporal pressures, unfamiliar anatomy or distractions in the OR. In this paper, a cascade of two machine learning approaches is suggested for the multimodal recognition of CWL in four different surgical task conditions. Firstly, a model based on the concept of transfer learning is used to identify if a surgeon is experiencing any CWL. Secondly, a Convolutional Neural Network (CNN) uses this information to identify different degrees of CWL associated with each surgical task. The suggested multimodal approach considers adjacent signals from electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS) and eye pupil diameter. The concatenation of signals allows complex correlations in terms of time (temporal) and channel location (spatial). Data collection was performed by a Multi-sensing AI Environment for Surgical Task & Role Optimisation platform (MAESTRO) developed at the Hamlyn Centre, Imperial College London. To compare the performance of the proposed methodology, a number of state-of-the-art machine learning techniques have been implemented. The tests show that the proposed model has a precision of 93%.

Submitted 30 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

arXiv:2209.05056 [pdf, other]

Situation Awareness for Automated Surgical Check-listing in AI-Assisted Operating Room

Authors: Tochukwu Onyeogulu, Salman Khan, Izzeddin Teeti, Amirul Islam, Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, George Mylonas, Fabio Cuzzolin

Abstract: Nowadays, more surgical procedures are being performed using minimally invasive surgery (MIS). This is due to its many benefits, such as minimal post-operative problems, less bleeding, minor scarring, and a speedy recovery. However, the MIS's constrained field of view, small operating room, and indirect viewing of the operating scene could lead to surgical tools colliding and potentially harming human organs or tissues. Therefore, MIS problems can be considerably reduced, and surgical procedure accuracy and success rates can be increased by using an endoscopic video feed to detect and monitor surgical instruments in real-time. In this paper, a set of improvements made to the YOLOv5 object detector to enhance the detection of surgical instruments was investigated, analyzed, and evaluated. In doing this, we performed performance-based ablation studies, explored the impact of altering the YOLOv5 model's backbone, neck, and anchor structural elements, and annotated a unique endoscope dataset. Additionally, we compared the effectiveness of our ablation investigations with that of four additional SOTA object detectors (YOLOv7, YOLOR, Scaled-YOLOv4 and YOLOv3-SPP). Except for YOLOv3-SPP, which had the same model performance of 98.3% in mAP and a similar inference speed, all of our benchmark models, including the original YOLOv5, were surpassed by our top refined model in experiments using our fresh endoscope dataset.

Submitted 23 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

arXiv:2209.03705 [pdf, other]

Losing momentum in continuous-time stochastic optimisation

Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Alessandro Scagliotti

Abstract: The training of deep neural networks and other modern machine learning models usually consists in solving non-convex optimisation problems that are high-dimensional and subject to large-scale data. Here, momentum-based stochastic optimisation algorithms have become especially popular in recent years. The stochasticity arises from data subsampling which reduces computational cost. Moreover, both momentum and stochasticity are supposed to help the algorithm to overcome local minimisers and, hopefully, converge globally. Theoretically, this combination of stochasticity and momentum is poorly understood. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the particle movement by an underdamped dynamical system and the data subsampling through a stochastic switching of the dynamical system. In our analysis, we investigate longtime limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time: intuitively, the momentum helps to overcome local minimisers in the initial phase of the algorithm, but prohibits fast convergence to a global minimiser later. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In numerical experiments, we study our discretisation scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network to solve the CIFAR-10 image classification problem. Here, our algorithm reaches competitive results compared to stochastic gradient descent with momentum.

Submitted 8 September, 2022; originally announced September 2022.

MSC Class: 90C15; 37N40; 37H30; 65C40; 68T07; 68W20

arXiv:2207.09725 [pdf, other]

OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos

Authors: Kyung-Min Jin, Gun-Hee Lee, Seong-Whan Lee

Abstract: Although many approaches for multi-human pose estimation in videos have shown profound results, they require densely annotated data, which entails excessive manual labor. Furthermore, occlusion and motion blur inevitably lead to poor estimation performance. To address these problems, we propose a method that leverages an attention mask for occluded joints and encodes temporal dependency between frames using transformers. First, our framework composes different combinations of sparsely annotated frames that denote the track of the overall joint movement. We propose an occlusion attention mask from these combinations that enables encoding occlusion-aware heatmaps as a semi-supervised task. Second, the proposed temporal encoder employs transformer architecture to effectively aggregate the temporal relationship and keypoint-wise attention from each time step and accurately refines the target frame's final pose estimation. We achieve state-of-the-art pose estimation results for PoseTrack2017 and PoseTrack2018 datasets and demonstrate the robustness of our approach to occlusion and motion blur in sparsely annotated video data.

Submitted 27 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: 6 pages

MSC Class: 68T45

arXiv:2207.01831 [pdf, other]

Learning Local Implicit Fourier Representation for Image Warping

Authors: Jaewon Lee, Kwang Pyo Choi, Kyong Hwan Jin

Abstract: Image warping aims to reshape images defined on rectangular grids into arbitrary shapes. Recently, implicit neural functions have shown remarkable performance in representing images in a continuous manner. However, a standalone multi-layer perceptron struggles to learn high-frequency Fourier coefficients. In this paper, we propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes. Local textures estimated from a deep super-resolution (SR) backbone are multiplied by locally-varying Jacobian matrices of a coordinate transformation to predict Fourier responses of a warped image. Our LTEW-based neural function outperforms existing warping methods for asymmetric-scale SR and homography transform. Furthermore, our algorithm generalizes well to arbitrary coordinate transformations, such as homography transform with a large magnification factor and equirectangular projection (ERP) perspective transform, which are not provided in training.

Submitted 5 July, 2022; originally announced July 2022.

Comments: ECCV 2022 camera-ready version (https://ipl.dgist.ac.kr/LTEW.pdf)

arXiv:2207.00768 [pdf, other]

Sum-of-Max Partition under a Knapsack Constraint

Authors: Kai Jin, Danna Zhang, Canhui Zhang

Abstract: Sequence partition problems arise in many fields, such as sequential data analysis, information transmission, and parallel computing. In this paper, we study the following partition problem variant: given a sequence of $n$ items $1,\ldots,n$, where each item $i$ is associated with weight $w_i$ and another parameter $s_i$, partition the sequence into several consecutive subsequences, so that the total weight of each subsequence is no more than a threshold $w_0$, and the sum of the largest $s_i$ in each subsequence is minimized. This problem admits a straightforward solution based on dynamic programming, which costs $O(n^2)$ time and can be improved to $O(n\log n)$ time easily. Our contribution is an $O(n)$ time algorithm, which is nontrivial yet easy to implement. We also study the corresponding tree partition problem. We prove that the problem on the tree is NP-complete and we present an $O(w_0 n^2)$ time ($O(w_0^2n^2)$ time, respectively) algorithm for the unit weight (integer weight, respectively) case.

Submitted 11 October, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

ACM Class: F.2.2
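
The straightforward $O(n^2)$ dynamic program mentioned in the abstract above can be sketched as follows. This is only an illustration of the baseline, not the authors' $O(n)$ algorithm; the function name `min_sum_of_max` and its argument names are hypothetical, and the sketch assumes non-negative weights so that the inner loop can stop as soon as the last block exceeds $w_0$.

```python
from math import inf


def min_sum_of_max(weights, sizes, w0):
    """Baseline O(n^2) DP for the sum-of-max partition problem (illustrative).

    best[j] is the minimum cost of partitioning the first j items into
    consecutive blocks, each with total weight at most w0, where the cost
    of a block is the largest s_i inside it.
    Assumes non-negative weights (hypothetical helper, not from the paper).
    """
    n = len(weights)
    best = [inf] * (n + 1)
    best[0] = 0
    for j in range(1, n + 1):
        block_weight = 0
        block_max = -inf
        # Try every start i of the last block (items i..j, 1-indexed).
        for i in range(j, 0, -1):
            block_weight += weights[i - 1]
            if block_weight > w0:
                break  # extending the block further only adds weight
            block_max = max(block_max, sizes[i - 1])
            best[j] = min(best[j], best[i - 1] + block_max)
    return best[n]
```

The function returns `inf` when no feasible partition exists, for example when a single item is heavier than the threshold.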

arXiv:2205.00698 [pdf]

Unsupervised Denoising of Optical Coherence Tomography Images with Dual_Merged CycleWGAN

Authors: Jie Du, Xujian Yang, Kecheng Jin, Xuanzheng Qi, Hu Chen

Abstract: Noise is an important cause of low-quality optical coherence tomography (OCT) images. Neural network models based on convolutional neural networks (CNNs) have demonstrated excellent performance in image denoising. However, OCT image denoising still faces great challenges because many previous neural network algorithms required a large amount of labeled data, which can be time-consuming or expensive to obtain. Besides, these CNN-based algorithms need numerous parameters and good tuning techniques, which consume hardware resources. To solve the above problems, we propose a new cycle-consistent generative adversarial network called Dual-Merged Cycle-WGAN for retinal OCT image denoising, which has remarkable performance with less unlabeled training data. Our model consists of two Cycle-GAN networks with improved generator, discriminator and Wasserstein loss to achieve good training stability and better performance. Using an image merge technique between the two Cycle-GAN networks, our model can obtain more detailed information and hence a better training effect. The effectiveness and generality of our proposed network have been demonstrated via ablation experiments and comparative experiments. Compared with other state-of-the-art methods, our unsupervised method obtains the best subjective visual effect and higher objective evaluation indicators.

Submitted 2 May, 2022; originally announced May 2022.

Comments: Mr. Hu Chen is our corresponding author

arXiv:2204.11213 [pdf, ps, other]

String Rearrangement Inequalities and a Total Order Between Primitive Words

Authors: Ruixi Luo, Taikun Zhu, Kai Jin

Abstract: We study the following rearrangement problem: Given $n$ words, rearrange and concatenate them so that the obtained string is lexicographically smallest (or largest, respectively). We show that this problem reduces to sorting the given words so that their repeating strings are non-decreasing (or non-increasing, respectively), where the repeating string of a word $A$ refers to the infinite string $AAA\ldots$. Moreover, for fixed size alphabet $Σ$, we design an $O(L)$ time sorting algorithm of the words (in the mentioned orders), where $L$ denotes the total length of the input words. Hence we obtain an $O(L)$ time algorithm for the rearrangement problem. Finally, we point out that comparing primitive words via comparing their repeating strings leads to a total order, which can further be extended to a total order on the finite words (or all words).

Submitted 24 April, 2022; originally announced April 2022.

MSC Class: 68R15 ACM Class: F.2.2
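
As a small illustration of the reduction described in the abstract above, the following sketch sorts the words with a pairwise comparison and concatenates them. It does not implement the paper's $O(L)$ algorithm: it swaps in the classical test of comparing $AB$ with $BA$, which is assumed here to order words the same way as comparing the repeating strings $AAA\ldots$ and $BBB\ldots$ whenever those differ. The function name `smallest_concatenation` is hypothetical.

```python
from functools import cmp_to_key


def smallest_concatenation(words):
    """Return the lexicographically smallest concatenation of the words.

    Illustrative comparison sort, not the paper's linear-time method:
    word a is placed before word b when a + b < b + a, used here as a
    finite stand-in for comparing the infinite repeating strings.
    """
    def compare(a, b):
        ab, ba = a + b, b + a
        # Standard three-way comparison: -1, 0, or 1.
        return (ab > ba) - (ab < ba)

    return "".join(sorted(words, key=cmp_to_key(compare)))
```

Reversing the comparison (or passing `reverse=True` to `sorted`) would give the lexicographically largest concatenation instead.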

arXiv:2203.09952 [pdf, other]

Conquering Ghosts: Relation Learning for Information Reliability Representation and End-to-End Robust Navigation

Authors: Kefan Jin, Xingyao Han

Abstract: Environmental disturbances, such as sensor data noise, various lighting conditions, challenging weather and external adversarial perturbations, are inevitable in real self-driving applications. Existing research and testing have shown that they can severely influence the vehicle's perception ability and performance. One of the main issues is false positive detection, i.e., a ghost object that does not really exist or occurs in the wrong position (such as a non-existent vehicle). Traditional navigation methods tend to avoid every detected object for safety; however, avoiding a ghost object may lead the vehicle into an even more dangerous situation, such as a sudden brake on the highway. Considering the various disturbance types, it is difficult to address this issue at the perceptual level. A potential solution is to detect the ghost through relation learning among the whole scenario and develop an integrated end-to-end navigation system. Our underlying logic is that the behavior of all vehicles in the scene is influenced by their neighbors, and normal vehicles behave in a logical way, while ghost vehicles do not. By learning the spatio-temporal relation among surrounding vehicles, an information reliability representation is learned for each detected vehicle and then a robot navigation network is developed. In contrast to existing works, we encourage the network to learn how to represent the reliability and how to aggregate all the information with uncertainties by itself, thus increasing the efficiency and generalizability. To the best of the authors' knowledge, this paper provides the first work on using graph relation learning to achieve end-to-end robust navigation in the presence of ghost vehicles. Simulation results in the CARLA platform demonstrate the feasibility and effectiveness of the proposed method in various scenarios.

Submitted 20 February, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.08373 [pdf, other]

Text-Based Action-Model Acquisition for Planning

Authors: Kebing Jin, Huaixun Chen, Hankz Hankui Zhuo

Abstract: Although there have been approaches that are capable of learning action models from plan traces, there is no work on learning action models from textual observations, which are pervasive and much easier to collect from real-world applications compared to plan traces. In this paper we propose a novel approach to learning action models from natural language texts by integrating Constraint Satisfaction and Natural Language Processing techniques. Specifically, we first build a novel language model to extract plan traces from texts, and then build a set of constraints to generate action models based on the extracted plan traces. After that, we iteratively improve the language model and constraints until we achieve the convergent language model and action models. We show empirically that our approach is both effective and efficient.

Submitted 17 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.07138 [pdf, other]

Integrating AI Planning with Natural Language Processing: A Combination of Explicit and Tacit Knowledge

Authors: Kebing Jin, Hankz Hankui Zhuo

Abstract: Natural language processing (NLP) aims at investigating the interactions between agents and humans, processing and analyzing large amounts of natural language data. Large-scale language models play an important role in current natural language processing. However, the challenges of explainability and complexity come along with the developments of language models. One way is to introduce logical relations and rules into natural language processing models, such as making use of Automated Planning. Automated planning (AI planning) focuses on building symbolic domain models and synthesizing plans to transit initial states to goals based on domain models. Recently, there have been plenty of works related to these two fields, which have the abilities to generate explicit knowledge, e.g., preconditions and effects of action models, and learn from tacit knowledge, e.g., neural models, respectively. Integrating AI planning and natural language processing effectively improves the communication between humans and intelligent agents. This paper outlines the commonalities and relations between AI planning and natural language processing, and argues that each of them can effectively impact the other in five areas: (1) planning-based text understanding, (2) planning-based natural language processing, (3) planning-based explainability, (4) text-based human-robot interaction, and (5) applications. We also explore some potential future issues between AI planning and natural language processing. To the best of our knowledge, this survey is the first work that addresses the deep connections between AI planning and natural language processing.

Submitted 13 April, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.04070 [pdf]

Joint user association and power allocation in ultra-dense mmWave networks: a multi-connectivity approach

Authors: Ailing Chen, Shengchang Li, Jichen Xiong, Kezhong Jin, Zhenzhou Tang

Abstract: In ultra-dense millimeter wave (mmWave) networks, mmWave signals suffer from severe path losses and are easily blocked by obstacles. Meanwhile, ultra-dense deployment causes excessive handovers, which reduces the data link reliability. To alleviate the above issues, a novel technology known as multi-connectivity enabled user association (MCUA) is incorporated in this letter. We aim to jointly optimize MCUAs and downlink (DL) power allocations (PAs) to maximize the DL rate of each user simultaneously, rather than the total rate. This is a non-convex nonlinear 0-1 mixed integer multi-objective optimization problem and quite complicated. To solve it, we first use the weighted sum method to scalarize it as a single-objective optimization problem (SOOP), and then relax the binary association variables to real ones. Considering that the relaxed SOOP is still non-convex, we perform a series of transformations upon it and make it a difference of convex programming problem. Finally, we develop an iterative algorithm based on the convex-concave procedure to solve the SOOP. Numerical results are presented to demonstrate the effectiveness of the proposed algorithms.

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2112.09836 [pdf, other]

Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning

Authors: Hankz Hankui Zhuo, Shuting Deng, Mu Jin, Zhihao Ma, Kebing Jin, Chen Chen, Chao Yu

Abstract: Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues, i.e., data efficiency, lack of interpretability and transferability. Recent research shows that embedding symbolic knowledge into DRL is promising in addressing those challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. Our framework features a loop training procedure, which enables guiding the improvement of policy by planning with planning models (including action models and hierarchical task network models) and symbolic options learned from interactive trajectories automatically. The learned symbolic options alleviate the dense requirement of expert domain knowledge and provide inherent interpretability of policies. Moreover, the transferability and data efficiency can be further improved by planning with the symbolic planning models. To validate the effectiveness of our framework, we conduct experiments on two domains, Montezuma's Revenge and Office World, respectively. The results demonstrate the comparable performance, improved data efficiency, interpretability and transferability.

Submitted 7 July, 2023; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.06028 [pdf, other]

Retrosynthetic Planning with Experience-Guided Monte Carlo Tree Search

Authors: Siqi Hong, Hankz Hankui Zhuo, Kebing Jin, Guang Shao, Zhanwen Zhou

Abstract: In retrosynthetic planning, the huge number of possible routes to synthesize a complex molecule using simple building blocks leads to a combinatorial explosion of possibilities. Even experienced chemists often have difficulty selecting the most promising transformations. The current approaches rely on human-defined or machine-trained score functions which have limited chemical knowledge or use expensive estimation methods for guiding. Here we propose an experience-guided Monte Carlo tree search (EG-MCTS) to deal with this problem. Instead of rollout, we build an experience guidance network to learn knowledge from synthetic experiences during the search. Experiments on benchmark USPTO datasets show that EG-MCTS gains significant improvement over state-of-the-art approaches both in efficiency and effectiveness. In a comparative experiment with the literature, our computer-generated routes mostly matched the reported routes. Routes designed for real drug compounds exhibit the effectiveness of EG-MCTS in assisting chemists in performing retrosynthetic analysis.

Submitted 9 June, 2023; v1 submitted 11 December, 2021; originally announced December 2021.

Showing 1–50 of 88 results for author: Jin, K