
doi 10.1109/TGRS.2021.3066195

Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR

Authors: Yikui Zhai, Wenlve Zhou, Bing Sun, Jingwen Li, Qirui Ke, Zilu Ying, Junying Gan, Chaoyun Mai, Ruggero Donida Labati, Vincenzo Piuri, Fabio Scotti

Abstract: In recent years, deep learning has achieved impressive performance in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). Because this technique requires a large amount of annotated data, obtaining a high recognition rate from less labeled data remains a significant challenge. To overcome this problem, and inspired by contrastive learning, we propose a novel framework named Batch Instance Discrimination and Feature Clustering (BIDFC). Unlike the objective of general contrastive learning methods, the embedding distance between samples in this framework should be moderate, because samples in SAR images are highly similar. Consequently, our flexible framework is equipped with an adjustable distance between embeddings, which we term weakly contrastive learning. Technically, instance labels are assigned to the unlabeled data in each batch, random augmentation is applied a few times, and training is performed on these augmented data. Meanwhile, a novel Dynamic-Weighted Variance loss (DWV loss) function is proposed to cluster the embeddings of the augmented versions of each sample. Experimental results on the moving and stationary target acquisition and recognition (MSTAR) database show that our method reaches 91.25% classification accuracy when fine-tuned on only 3.13% of the training data. Even under linear evaluation on the same training data, the accuracy still reaches 90.13%.
We also verified the effectiveness of BIDFC on the OpenSarShip database, indicating that our method can generalize to other datasets. Our code is available at: https://github.com/Wenlve-Zhou/BIDFC-master.
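The two ingredients named in the BIDFC abstract, per-batch instance labels and a loss that pulls the augmented views of one sample together, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: embeddings are plain Python lists, and because the abstract does not describe the dynamic weighting of the DWV loss, the loss below is an unweighted variance term. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def assign_instance_labels(batch_size: int) -> List[int]:
    """Give every unlabeled sample in a batch its own instance label, 0..B-1."""
    return list(range(batch_size))


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def variance_loss(views_per_sample: Sequence[Sequence[Vector]]) -> float:
    """Unweighted clustering term (NOT the paper's DWV loss): the mean squared
    distance between each augmented view's embedding and the centroid of its
    own sample's views.

    views_per_sample[i][k] is the embedding of the k-th augmentation of sample i.
    """
    total, count = 0.0, 0
    for views in views_per_sample:
        c = centroid(views)
        for v in views:
            total += sum((a - b) ** 2 for a, b in zip(v, c))
            count += 1
    return total / count


# Two samples (instance labels 0 and 1), two augmented views each.
views = [
    [[1.0, 0.0], [3.0, 0.0]],  # sample 0: views spread out
    [[0.0, 2.0], [0.0, 2.0]],  # sample 1: views already identical
]
print(assign_instance_labels(len(views)))  # [0, 1]
print(variance_loss(views))  # 0.5
```

Only sample 0 contributes to the loss here, since its two views sit one unit away from their centroid; a weighting scheme would rescale such per-sample terms.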

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2407.06064

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, termed pan-denoising. In a given scene, panchromatic (PAN) images capture structures and textures similar to those of HSIs but with less noise, which makes it possible to use PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, properly modeling this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign a different TV regularization weight to each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers (ADMM). Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of both quantitative metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, used as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV.
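The idea of deriving per-pixel TV weights from a PAN gradient map (large weights where the PAN image is smooth, small weights at PAN edges) can be illustrated with a toy 2-D example. This is a sketch under assumptions: the inverse-gradient weighting rule and the `eps` parameter are hypothetical choices, and the penalty is applied to a single image band, whereas PWRCTV itself regularizes representation coefficients and is solved with ADMM.

```python
from typing import List, Tuple

Image = List[List[float]]


def forward_diffs(img: Image, i: int, j: int) -> Tuple[float, float]:
    """Horizontal and vertical forward differences; zero at the last column/row."""
    h, w = len(img), len(img[0])
    dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
    dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
    return dx, dy


def tv_weights(pan: Image, eps: float = 1e-3) -> Image:
    """Hypothetical per-pixel weights: inverse PAN gradient magnitude, so smooth
    PAN regions get large weights and PAN edges get small ones."""
    weights = []
    for i in range(len(pan)):
        row = []
        for j in range(len(pan[0])):
            dx, dy = forward_diffs(pan, i, j)
            row.append(1.0 / ((dx * dx + dy * dy) ** 0.5 + eps))
        weights.append(row)
    return weights


def weighted_tv(band: Image, weights: Image) -> float:
    """Weighted anisotropic total variation of one 2-D band."""
    total = 0.0
    for i in range(len(band)):
        for j in range(len(band[0])):
            dx, dy = forward_diffs(band, i, j)
            total += weights[i][j] * (abs(dx) + abs(dy))
    return total


# PAN image with a vertical edge between columns 1 and 2.
pan = [[0.0, 0.0, 1.0] for _ in range(3)]
w = tv_weights(pan)
aligned = [[0.0, 0.0, 1.0] for _ in range(3)]     # band edge matches the PAN edge
misaligned = [[0.0, 1.0, 1.0] for _ in range(3)]  # band edge sits in a smooth PAN area
print(weighted_tv(aligned, w) < weighted_tv(misaligned, w))  # True
```

An edge that coincides with a PAN edge is penalized lightly, while the same jump in a region the PAN image shows as smooth is penalized heavily, which is the behavior the abstract describes.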

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.13327

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

Abstract: While remarkable progress has been made in supervised skeleton-based action recognition, zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics with global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets: NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset, Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, which surpasses prior skeleton-based solutions and standard baselines from other domains. The source code is available at https://github.com/azzh1/PURLS.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.20633

Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection

Authors: Jing Xu, Anqi Zhu, Jingyu Lin, Qiuhong Ke, Cunjian Chen

Abstract: Human action recognition is a crucial task in computer vision systems. However, in real-world scenarios, human actions often fall outside the distribution of the training data, requiring a model both to recognize in-distribution (ID) actions and to reject out-of-distribution (OOD) ones. Despite its importance, there has been limited research on OOD detection for human actions. Existing work on OOD detection mainly focuses on RGB image data, and many methods are post-hoc in nature. While these methods are convenient and computationally efficient, they often lack sufficient accuracy and fail to consider the presence of OOD samples. To address these challenges, we propose a novel end-to-end skeleton-based model called Action-OOD, specifically designed for OOD human action detection. Unlike some existing approaches that may require prior knowledge of the OOD data distribution, our model uses only ID data during training, effectively mitigating the overconfidence issue prevalent in OOD detection. We introduce an attention-based feature fusion block, which enhances the model's capability to recognize unknown classes while preserving classification accuracy for known classes. Further, we present a novel energy-based loss function and integrate it with the traditional cross-entropy loss to maximize the separation between the ID and OOD data distributions.
Through extensive experiments on the NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-400 datasets, we demonstrate the superior performance of our approach compared to state-of-the-art methods. Our findings underscore the effectiveness of classic OOD detection techniques in the context of skeleton-based action recognition tasks, offering promising avenues for future research in this field. Code will be available at: https://github.com/YilliaJing/Action-OOD.git.
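For readers unfamiliar with energy-based OOD detection, the sketch below computes the generic free-energy score from classifier logits and thresholds it. It is an assumption-labelled illustration of the general technique, not Action-OOD's loss: the paper combines a novel energy-based term with cross-entropy during training, and its exact form is not given here. The `threshold` value is hypothetical and would normally be calibrated on held-out ID data.

```python
import math
from typing import Sequence


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * log(sum_i exp(f_i(x) / T)), via a stable log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def is_ood(logits: Sequence[float], threshold: float, temperature: float = 1.0) -> bool:
    """Flag a sample as OOD when its energy is above a threshold that would be
    chosen on held-out ID data (e.g., so that most ID samples pass)."""
    return energy_score(logits, temperature) > threshold


confident = [10.0, 0.0, 0.0]  # one class dominates: low energy, looks ID
flat = [0.1, 0.0, 0.0]        # no class stands out: higher energy
print(energy_score(confident) < energy_score(flat))  # True
```

Lower energy corresponds to inputs the classifier assigns strong evidence to, so a training loss that pushes ID energies down and OOD energies up widens the gap that this threshold exploits.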

Submitted 31 May, 2024; originally announced May 2024.

Comments: Under consideration at Computer Vision and Image Understanding

arXiv:2405.11336

UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers

Authors: Duo Peng, Qiuhong Ke, Jun Liu

Abstract: Text-to-Image (T2I) models have raised security concerns due to their potential to generate inappropriate or harmful images. In this paper, we propose UPAM, a novel framework that investigates the robustness of T2I models from the attack perspective. Unlike most existing attack methods, which focus on deceiving textual defenses, UPAM aims to deceive both the textual and visual defenses of T2I models. UPAM enables gradient-based optimization, offering greater effectiveness and efficiency than previous methods. Given that T2I models might not return results due to defense mechanisms, we introduce a Sphere-Probing Learning (SPL) scheme to support gradient optimization even when no results are returned. Additionally, we devise a Semantic-Enhancing Learning (SEL) scheme to fine-tune UPAM for generating target-aligned images. Our framework also ensures attack stealthiness. Extensive experiments demonstrate UPAM's effectiveness and efficiency.

Submitted 25 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

ACM Class: I.2.6

arXiv:2405.05791

Sequential Amodal Segmentation via Cumulative Occlusion Learning

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects while discerning their occlusion order. Ideally, the system should handle any object rather than being restricted to a limited set of object classes, especially in robotic applications. To address this need, we introduce a diffusion model with cumulative occlusion learning, designed for sequential amodal segmentation of objects with uncertain categories. The model iteratively refines its prediction using a cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and reproducing the complex distribution of shapes and occlusion orders of occluded objects. This is akin to the human capability for amodal perception, i.e., deciphering the spatial ordering among objects and accurately predicting complete contours of occluded objects in densely layered visual scenes. Experimental results across three amodal datasets show that our method outperforms established baselines.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2401.06445

Directed network comparison using motifs

Authors: Chenwei Xie, Qiao Ke, Haoyu Chen, Chuang Liu, Xiu-Xiu Zhan

Abstract: Analyzing and characterizing the differences between networks is a fundamental and challenging problem in network science. Previously, most network comparison methods that rely on topological properties have been restricted to measuring differences between two undirected networks. However, many networks, such as biological, social, and transportation networks, exhibit inherent directionality and higher-order attributes that should not be ignored when comparing networks. Therefore, we propose a motif-based directed network comparison method that captures local, global, and higher-order differences between two directed networks. Specifically, we first construct a motif distribution vector for each node, which captures the node's involvement in different directed motifs. The dissimilarity between two directed networks is then defined using a matrix composed of the motif distribution vectors of all nodes, together with the Jensen-Shannon divergence. We evaluate the performance of our method by comparing six real directed networks with their null models as well as with networks perturbed by edge perturbation. Our method outperforms state-of-the-art baselines and is robust to different parameter settings.
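The pipeline in this abstract (per-node motif distribution vectors, then a Jensen-Shannon comparison) can be sketched in a heavily simplified form. Assumptions, all hypothetical: instead of the paper's directed motifs, each node is described by three directed 2-node patterns (outgoing-only, incoming-only, reciprocal), and the node vectors are simply averaged into one network-level distribution rather than combined through the paper's matrix construction.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def node_dyad_distributions(edges: Set[Edge]) -> Dict[int, List[float]]:
    """Per-node distribution over three directed 2-node patterns,
    [outgoing-only, incoming-only, reciprocal], as a stand-in for motif counts."""
    counts: Dict[int, List[float]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        counts.setdefault(u, [0.0, 0.0, 0.0])
        counts.setdefault(v, [0.0, 0.0, 0.0])
        if (v, u) in edges:
            counts[u][2] += 1  # reciprocal pair; the reverse edge credits v
        else:
            counts[u][0] += 1  # u -> v only
            counts[v][1] += 1  # v receives from u only
    return {n: [c / sum(vec) for c in vec] for n, vec in counts.items()}


def network_distribution(edges: Set[Edge]) -> List[float]:
    """Average the node distributions into one network-level vector
    (assumes at least one edge that is not a self-loop)."""
    dists = list(node_dyad_distributions(edges).values())
    return [sum(d[k] for d in dists) / len(dists) for k in range(3)]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


cycle = {(0, 1), (1, 2), (2, 0)}           # every tie is one-way
mutual = {(0, 1), (1, 0), (1, 2), (2, 1)}  # every tie is reciprocated
print(js_divergence(network_distribution(cycle), network_distribution(mutual)))  # 1.0
```

The two toy networks share no pattern at all, so their divergence reaches the maximum of 1 bit; richer motif vocabularies make the comparison sensitive to more subtle directional structure.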

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.01510

Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex data. Recognizing that conventional self-paced CL methods rely on training loss to measure difficulty, which might not accurately reflect the intricacies of video-question pairs, we introduce the concept of uncertainty-aware CL, in which uncertainty serves as the guiding principle for dynamically adjusting the difficulty. Furthermore, we address the challenge posed by uncertainty by presenting a probabilistic modeling approach for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation graph in which the hidden representations are treated as stochastic variables. This yields two distinct types of uncertainty: one related to the inherent uncertainty in the data and another pertaining to the model's confidence. In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments. The findings affirm that our approach not only achieves enhanced performance but also effectively quantifies uncertainty in the context of VideoQA.
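The two kinds of uncertainty mentioned above (data-related and model-related) can be illustrated with the standard entropy decomposition over several stochastic forward passes, followed by a sure-to-uncertain ordering of examples. This is a sketch of a well-known decomposition, assumed here for illustration; the abstract does not state that the paper uses exactly these formulas, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose_uncertainty(samples: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """samples[s] is the answer distribution from the s-th stochastic forward pass.

    Returns (total, data, model) with total = data + model:
      total = entropy of the mean prediction,
      data  = mean entropy of the individual predictions (aleatoric),
      model = their difference, i.e. the mutual information (epistemic).
    """
    n = len(samples)
    mean = [sum(col) / n for col in zip(*samples)]
    total = entropy(mean)
    data = sum(entropy(p) for p in samples) / n
    return total, data, total - data


def curriculum_order(scores: Sequence[float]) -> List[int]:
    """Example indices ordered from most certain ('sure') to least certain."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


agree = [[0.5, 0.5], [0.5, 0.5]]     # passes agree, but each is unsure
disagree = [[1.0, 0.0], [0.0, 1.0]]  # each pass is sure, but they conflict
print(decompose_uncertainty(agree))     # data term is ln 2, model term is ~0
print(decompose_uncertainty(disagree))  # data term is ~0, model term is ln 2
```

Both toy cases have the same total uncertainty, but the decomposition attributes it to the data in the first case and to the model in the second, which is the distinction a curriculum can exploit when scheduling difficulty.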

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.01505

Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

Authors: Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen

Abstract: Reasoning over sports videos for question answering is an important task with numerous applications, such as player training and information retrieval. However, this task has not been explored, owing to the lack of relevant datasets and the task's challenging nature. Most datasets for video question answering (VideoQA) focus mainly on general and coarse-grained understanding of daily-life videos, which is not applicable to sports scenarios that require professional action understanding and fine-grained motion analysis. In this paper, we introduce the first dataset, named Sports-QA, specifically designed for the sports VideoQA task. The Sports-QA dataset includes various types of questions, such as descriptions, chronologies, causalities, and counterfactual conditions, covering multiple sports. Furthermore, to address the characteristics of the sports VideoQA task, we propose a new Auto-Focus Transformer (AFT) capable of automatically focusing on particular scales of temporal information for question answering. We conduct extensive experiments on Sports-QA, including baseline studies and the evaluation of different methods. The results demonstrate that our AFT achieves state-of-the-art performance.

Submitted 14 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2311.03943

CLIP Guided Image-perceptive Prompt Learning for Image Enhancement

Authors: Weiwen Chen, Qiuhong Ke, Zinuo Li

Abstract: Image enhancement is a significant research area in the fields of computer vision and image processing. In recent years, many learning-based methods for image enhancement have been developed, in which the look-up table (LUT) has proven to be an effective tool. In this paper, we explore the potential of Contrastive Language-Image Pre-Training (CLIP) guided prompt learning, proposing a simple structure called CLIP-LUT for image enhancement. We found that the prior knowledge of CLIP can effectively discern the quality of degraded images and thus provide reliable guidance. Specifically, we first learn image-perceptive prompts that distinguish between original and target images using the CLIP model; meanwhile, we introduce a very simple enhancement network, built on a simple baseline, that predicts the weights of three different LUTs. The learned prompts are used to steer the enhancement network like a loss function and improve the model's performance. We demonstrate that by simply combining a straightforward method with CLIP, we can obtain satisfactory results.
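The step in which the enhancement network predicts weights for three LUTs amounts to fusing basis LUTs by a weighted sum and then looking pixel values up in the result. The sketch below shows that fusion under simplifying assumptions: the LUTs are hypothetical 1-D intensity curves sampled on a uniform grid (learned LUT methods commonly use 3-D colour LUTs), and the weights are given directly instead of being predicted by a network.

```python
from typing import List, Sequence

LUT = List[float]  # hypothetical 1-D LUT: outputs sampled on a uniform grid over [0, 1]


def fuse_luts(luts: Sequence[LUT], weights: Sequence[float]) -> LUT:
    """Weighted sum of basis LUTs; per the abstract, a network predicts these
    weights, but here they are passed in directly."""
    return [sum(w * lut[k] for w, lut in zip(weights, luts)) for k in range(len(luts[0]))]


def apply_lut(value: float, lut: LUT) -> float:
    """Map an intensity in [0, 1] through the LUT with linear interpolation."""
    pos = min(max(value, 0.0), 1.0) * (len(lut) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(lut) - 1)
    frac = pos - lo
    return lut[lo] * (1 - frac) + lut[hi] * frac


identity = [0.0, 0.5, 1.0]
brighten = [0.2, 0.7, 1.0]
darken = [0.0, 0.25, 1.0]
fused = fuse_luts([identity, brighten, darken], weights=[0.5, 0.0, 0.5])
print(fused)                  # [0.0, 0.375, 1.0]
print(apply_lut(0.5, fused))  # 0.375
```

Because fusion happens on the small tables rather than on the image, the per-pixel cost stays a single interpolated lookup regardless of how the weights were obtained.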

Submitted 22 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: A trial work on image enhancement

arXiv:2308.13893

Unsupervised Domain Adaptation via Domain-Adaptive Diffusion

Authors: Duo Peng, Qiuhong Ke, Yinjie Lei, Jun Liu

Abstract: Unsupervised Domain Adaptation (UDA) is challenging due to the large distribution discrepancy between the source and target domains. Inspired by diffusion models, which have a strong capability to gradually convert data distributions across a large gap, we explore the diffusion technique to handle the challenging UDA task. However, using diffusion models to convert data distributions across domains is non-trivial, as standard diffusion models generally perform conversion from a Gaussian distribution rather than from a specific domain distribution. Moreover, during the conversion, the semantics of the source-domain data need to be preserved for classification in the target domain. To tackle these problems, we propose a novel Domain-Adaptive Diffusion (DAD) module accompanied by a Mutual Learning Strategy (MLS), which gradually converts the data distribution from the source domain to the target domain while enabling the classification model to learn along the domain transition process. Consequently, our method eases the challenge of UDA by decomposing the large domain gap into small ones and gradually enhancing the capacity of the classification model to adapt to the target domain. Our method outperforms current state-of-the-art methods by a large margin on three widely used UDA datasets.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures

arXiv:2308.12350

Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation

Authors: Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu

Abstract: Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. Concretely, we formulate cross-domain image translation as a denoising diffusion process and utilize a novel Semantic Gradient Guidance (SGG) method to constrain the translation process, conditioning it on the pixel-wise source labels. Additionally, a Progressive Translation Learning (PTL) strategy is devised to enable the SGG method to work reliably across domains with large gaps. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV2023

arXiv:2306.16643

Cautious explorers generate more future academic impact

Authors: Xingsheng Yang, Zhaoru Ke, Qing Ke, Haipeng Zhang, Fengnan Gao

Abstract: Some scientists are more likely to explore unfamiliar research topics, while others tend to exploit existing ones. Previous work has found correlations between scientists' topic choices and their career performance. However, the literature has yet to untangle the intricate interplay between scientific impact and research topic choices, where scientific exploration and exploitation intertwine. Here we study two metrics that gauge how frequently scientists switch topic areas and how large those jumps are, and discover that 'cautious explorers', who switch topics frequently but do so to 'close' domains, have notably better future performance and can be identified at a remarkably early career stage. Cautious explorers who balance exploration and exploitation in their first four career years have up to 19% more citations per future paper. Our results suggest that the proposed metrics depict the scholarly traits of scientists throughout their careers and provide fresh insight, especially for nurturing junior scientists.
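The abstract does not give formulas for its two metrics, so the following is only a hypothetical operationalization to make "switch frequency" and "jump size" concrete. It assumes each career period is summarized by one dominant topic, and that topics sit in an embedding space where distance stands for how far apart two domains are; the topic names and coordinates are invented for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def exploration_metrics(
    topic_sequence: Sequence[str],
    topic_embedding: Dict[str, Tuple[float, ...]],
) -> Tuple[float, float]:
    """Return (switch frequency, mean jump size) for one career.

    topic_sequence[t] is the scientist's dominant topic in period t
    (at least two periods are required).
    Switch frequency: fraction of consecutive periods in which the topic changes.
    Jump size: mean embedding distance between old and new topic, over switches.
    """
    steps = list(zip(topic_sequence, topic_sequence[1:]))
    switches = [(a, b) for a, b in steps if a != b]
    frequency = len(switches) / len(steps)
    if not switches:
        return frequency, 0.0
    size = sum(math.dist(topic_embedding[a], topic_embedding[b]) for a, b in switches)
    return frequency, size / len(switches)


embedding = {"ml": (0.0, 0.0), "cv": (1.0, 0.0), "bio": (10.0, 0.0)}
cautious = ["ml", "cv", "ml", "cv"]  # frequent switches between close topics
bold = ["ml", "ml", "bio", "bio"]    # one rare switch to a distant topic
print(exploration_metrics(cautious, embedding))  # (1.0, 1.0)
print(exploration_metrics(bold, embedding))
```

Under these definitions a 'cautious explorer' shows high switch frequency together with small jump size, which is the combination the abstract associates with better future performance.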

Submitted 29 June, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 16 pages of main text and 94 pages of supplementary information. v2: Added page number and fixed typo in author list

arXiv:2306.13897

ICN: Interactive Convolutional Network for Forecasting Travel Demand of Shared Micromobility

Authors: Yiming Xu, Qian Ke, Xiaojian Zhang, Xilei Zhao

Abstract: Accurate predictions of shared micromobility demand are essential for transportation planning and management. Although deep learning models provide powerful tools for demand prediction problems, studies on highly accurate forecasting of spatiotemporal shared micromobility demand are still lacking. This paper proposes a deep learning model named Interactive Convolutional Network (ICN) to forecast spatiotemporal travel demand for shared micromobility. The proposed model develops a novel channel dilation method that utilizes multi-dimensional spatial information (i.e., demographics, functionality, and transportation supply), based on travel behavior knowledge, to build the deep learning model. We use the convolution operation to process the dilated tensor, simultaneously capturing temporal and spatial dependencies. Based on a binary-tree-structured architecture and interactive convolution, the ICN model extracts features at different temporal resolutions and then generates predictions using a fully-connected layer. The proposed model is evaluated in two real-world case studies in Chicago, IL, and Austin, TX. The results show that the ICN model significantly outperforms all the selected benchmark models. The model predictions can help micromobility operators develop optimal vehicle rebalancing schemes and guide cities to better manage shared micromobility systems.
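One common way for a binary-tree-structured network to see a series "at different temporal resolutions" is to split it repeatedly into even- and odd-indexed subsequences. The abstract does not say that ICN splits its input this way, so treat the sketch below as an assumption that only illustrates the multi-resolution idea; the interactive convolutions that would exchange information between sibling branches are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_even_odd(seq: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a series into even- and odd-indexed subsequences,
    each at half the temporal resolution of the input."""
    return list(seq[0::2]), list(seq[1::2])


def tree_leaves(seq: Sequence[int], depth: int) -> List[List[int]]:
    """Leaves of a binary tree built by repeated even/odd splitting.
    At depth d each leaf samples the series every 2**d steps."""
    if depth == 0:
        return [list(seq)]
    even, odd = split_even_odd(seq)
    return tree_leaves(even, depth - 1) + tree_leaves(odd, depth - 1)


hourly_demand = [0, 1, 2, 3, 4, 5, 6, 7]  # toy values standing in for trip counts
print(tree_leaves(hourly_demand, 2))  # [[0, 4], [2, 6], [1, 5], [3, 7]]
```

In a full model, features extracted at each node would be merged back up the tree before the final fully-connected prediction layer.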

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2304.06724

GradMDM: Adversarial Attack on Dynamic Networks

Authors: Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks that aim to reduce their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input that activates more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.

Submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2304.00280

Progressive Channel-Shrinking Network

Authors: Jianhong Pan, Siyuan Yang, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Zhipeng Fan, Jun Liu

Abstract: Currently, salience-based channel pruning makes continuous breakthroughs in network compression. In practice, the salience mechanism serves as a metric of channel salience to guide pruning. Therefore, salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, two problems emerge: a gating function is often needed to truncate specific salience entries to zero, which destabilizes forward propagation; and the dynamic architecture incurs extra indexing cost during inference, which bottlenecks the inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost of filter indexing. We evaluate our method on the ImageNet and CIFAR10 datasets with two prevalent networks, ResNet and VGG, and demonstrate that PCS outperforms all baselines and achieves a state-of-the-art compression-performance trade-off. Moreover, we observe a significant and practical acceleration of inference.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.06596

Amodal Intra-class Instance Segmentation: Synthetic Datasets and Benchmark

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: Images of realistic scenes often contain intra-class objects that heavily occlude one another, which makes the amodal perception task of parsing the occluded parts of objects challenging. Although amodal perception is important for downstream tasks such as robotic grasping systems, the lack of large-scale amodal datasets with detailed annotations makes it difficult to model intra-class occlusions explicitly. This paper introduces two new amodal datasets for image amodal completion tasks, which contain a total of over 267K images of intra-class occlusion scenarios, annotated with multiple masks, amodal bounding boxes, dual order relations, and full appearance for instances and background. We also present a point-supervised scheme with layer priors for amodal instance segmentation, specifically designed for intra-class occlusion scenarios. Experiments show that our weakly supervised approach outperforms SOTA fully supervised methods, while our layer prior design yields remarkable performance improvements under intra-class occlusion in both synthetic and real images.

Submitted 7 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted at WACV 2024. Datasets are available at https://github.com/saraao/amodal-dataset

arXiv:2303.01692 [pdf, other]

Travel Demand Forecasting: A Fair AI Approach

Authors: Xiaojian Zhang, Qian Ke, Xilei Zhao

Abstract: Artificial Intelligence (AI) and machine learning have been increasingly adopted for travel demand forecasting. AI-based travel demand forecasting models, though they generate accurate predictions, may produce prediction biases and raise fairness issues. Using such biased models for decision-making may lead to transportation policies that exacerbate social inequalities. However, few studies have focused on addressing the fairness issues of these models. Therefore, in this study, we propose a novel methodology to develop fairness-aware, highly-accurate travel demand forecasting models. Particularly, the proposed methodology can enhance the fairness of AI models for multiple protected attributes (such as race and income) simultaneously. Specifically, we introduce a new fairness regularization term, which is explicitly designed to measure the correlation between prediction accuracy and multiple protected attributes, into the loss function of the travel demand forecasting model. We conduct two case studies to evaluate the performance of the proposed methodology using real-world ridesourcing-trip data in Chicago, IL and Austin, TX, respectively. Results highlight that our proposed methodology can effectively enhance fairness for multiple protected attributes while preserving prediction accuracy. Additionally, we have compared our methodology with three state-of-the-art methods that adopt the regularization term approach, and the results demonstrate that our approach significantly outperforms them in both preserving prediction accuracy and enhancing fairness. This study can provide transportation professionals with a new tool to achieve fair and accurate travel demand forecasting.

Submitted 25 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: improved the methodology; updated new contents
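The abstract above describes adding a regularization term that measures the correlation between prediction accuracy and several protected attributes. The exact form of the authors' term is not given here, so the following is only a minimal sketch under assumptions: it takes per-sample absolute error as the accuracy signal, penalizes the sum of squared Pearson correlations between that error and each protected-attribute column, and adds the penalty to the mean squared error with a hypothetical weight `lam`. The attribute columns and data are synthetic and purely illustrative; a real training loop would compute the same quantity inside an autodiff framework.

```python
import numpy as np


def pearson(u, v, eps=1e-12):
    """Pearson correlation between two 1-D arrays; close to 0 if either is constant."""
    u = u - u.mean()
    v = v - v.mean()
    return float((u * v).sum() / (np.sqrt((u ** 2).sum() * (v ** 2).sum()) + eps))


def fairness_regularized_loss(y_true, y_pred, protected, lam=1.0):
    """Hypothetical fairness-aware loss: MSE + lam * sum_k corr(|error|, attribute_k)^2.

    protected has shape (n_samples, n_attributes), one column per protected attribute.
    Returns (total_loss, mse, penalty).
    """
    err = np.abs(y_true - y_pred)
    mse = float(np.mean((y_true - y_pred) ** 2))
    penalty = sum(pearson(err, protected[:, k]) ** 2 for k in range(protected.shape[1]))
    return mse + lam * penalty, mse, penalty


# Synthetic illustration with two hypothetical protected attributes.
rng = np.random.default_rng(0)
n = 200
attr_a = np.linspace(0.0, 1.0, n)
attr_b = rng.uniform(0.0, 1.0, n)
protected = np.column_stack([attr_a, attr_b])
y_true = rng.uniform(10.0, 20.0, n)

biased_pred = y_true + 2.0 * attr_a                 # error grows with attribute A
unbiased_pred = y_true + rng.normal(0.0, 1.0, n)    # error unrelated to either attribute

_, _, biased_penalty = fairness_regularized_loss(y_true, biased_pred, protected)
_, _, unbiased_penalty = fairness_regularized_loss(y_true, unbiased_pred, protected)
```

Because the penalty is built from correlations rather than raw error, it stays near zero when errors are large but evenly spread across groups, and approaches one per attribute when error tracks that attribute; `lam` then trades accuracy against this notion of fairness.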

arXiv:2211.16940 [pdf, other]

DiffPose: Toward More Reliable 3D Pose Estimation

Authors: Jia Gong, Lin Geng Foo, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, Jun Liu

Abstract: Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose to facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP. Project page: https://gongjia0208.github.io/Diffpose/.

Submitted 9 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: Accepted to CVPR 2023

arXiv:2211.02883 [pdf, other]

Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework

Authors: Liangchen Liu, Qiuhong Ke, Chaojie Li, Feiping Nie, Yingying Zhu

Abstract: Spectral clustering is an effective methodology for unsupervised learning. Most traditional spectral clustering algorithms involve a separate two-step procedure and apply the transformed new representations for the final clustering results. Recently, much progress has been made to utilize the non-negative feature property in real-world data and to jointly learn the representation and clustering results. However, to our knowledge, no previous work considers a unified model that incorporates the important multi-view information with those properties, which severely limits the performance of existing methods. In this paper, we formulate a novel clustering model, which exploits the non-negative feature property and, more importantly, incorporates the multi-view information into a unified joint learning framework: the unified multi-view orthonormal non-negative graph based clustering framework (Umv-ONGC). Then, we derive an effective three-stage iterative solution for the proposed model and provide analytic solutions for the three sub-problems from the three stages. We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features. Extensive experiments on three benchmark data sets demonstrate the effectiveness of the proposed method.

Submitted 1 December, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2209.13204 [pdf, other]

NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

Authors: Weiqiang Wang, Xuefei Zhe, Qiuhong Ke, Di Kang, Tingguang Li, Ruizhi Chen, Linchao Bao

Abstract: We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed NEURAL MARIONETTE, can produce high-quality and meaningful motions with smooth transitions from simple user input, including a sequence of action tags with expected action duration, and optionally a hand-drawn moving trajectory if the user specifies one. The core of our system is a novel Transformer-based motion generation model, namely MARIONET, which can generate diverse motions given action tags. Different from existing motion generation models, MARIONET utilizes contextual information from the past motion clip and future action tag, dedicated to generating actions that can smoothly blend historical and future actions. Specifically, MARIONET first encodes target action tag and contextual information into an action-level latent code. The code is unfolded into frame-level control signals via a time unrolling module, which could be then combined with other frame-level control signals like the target trajectory. Motion frames are then generated in an auto-regressive way. By sequentially applying MARIONET, the system NEURAL MARIONETTE can robustly generate long-term, multi-action motions with the help of two simple schemes, namely "Shadow Start" and "Action Revision". Along with the novel system, we also present a new dataset dedicated to the multi-action motion synthesis task, which contains both action tags and their contextual information. Extensive experiments are conducted to study the action accuracy, naturalism, and transition smoothness of the motions generated by our system.

Submitted 27 November, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.10073 [pdf, other]

Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition

Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

Abstract: Skeleton-based action recognition receives increasing attention because the skeleton representations reduce the amount of training data by eliminating visual information irrelevant to actions. To further improve the sample efficiency, meta-learning-based one-shot learning solutions were developed for skeleton-based action recognition. These methods find the nearest neighbor according to the similarity between instance-level global average embedding. However, such measurement holds unstable representativity due to inadequate generalized learning on local invariant and noisy features, while intuitively, more fine-grained recognition usually relies on determining key local body movements. To address this limitation, we present the Adaptive Local-Component-aware Graph Convolutional Network, which replaces the comparison metric with a focused sum of similarity measurements on aligned local embedding of action-critical spatial/temporal segments. Comprehensive one-shot experiments on the public benchmark of NTU-RGB+D 120 indicate that our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.01425 [pdf, other]

Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Authors: Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang, Jinghua Wang, Jun Liu

Abstract: The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.

Submitted 3 September, 2022; originally announced September 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.12100 [pdf, other]

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Authors: Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu

Abstract: Human interaction recognition is very important in many applications. One crucial cue in recognizing an interaction is the interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art with a significant margin.

Submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.09675 [pdf, other]

ERA: Expert Retrieval and Assembly for Early Action Prediction

Authors: Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only minor subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push experts to discriminate exclusively between samples that are highly similar, forcing these experts to learn to use subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance.

Submitted 22 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.02062 [pdf, other]

doi 10.1016/j.cviu.2023.103661

Image Amodal Completion: A Survey

Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Abstract: Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion functions to understand an intact object despite it being partially occluded. The main purpose of this survey is to provide an intuitive understanding of the research hotspots, key technologies and future trends in the field of image amodal completion. Firstly, we present a comprehensive review of the latest literature in this emerging field, exploring three key tasks in image amodal completion, including amodal shape completion, amodal appearance completion, and order perception. Then we examine popular datasets related to image amodal completion along with their common data collection methods and evaluation metrics. Finally, we discuss real-world applications and future research directions for image amodal completion, facilitating the reader's understanding of the challenges of existing technologies and upcoming research trends.

Submitted 7 November, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted at Computer Vision and Image Understanding. See https://doi.org/10.1016/j.cviu.2023.103661 for the final version

arXiv:2206.06544 [pdf, ps, other]

A Survey of Automated Data Augmentation Algorithms for Deep Learning-based Image Classification Tasks

Authors: Zihan Yang, Richard O. Sinnott, James Bailey, Qiuhong Ke

Abstract: In recent years, one of the most popular techniques in the computer vision community has been deep learning. As a data-driven technique, deep models require enormous amounts of accurately labelled training data, which are often inaccessible in many real-world applications. A data-space solution is Data Augmentation (DA), which can artificially generate new images out of original samples. Image augmentation strategies can vary by dataset, as different data types might require different augmentations to facilitate model training. However, the design of DA policies has been largely decided by human experts with domain knowledge, which is considered to be highly subjective and error-prone. To mitigate this problem, a novel direction is to automatically learn the image augmentation policies from the given dataset using Automated Data Augmentation (AutoDA) techniques. The goal of AutoDA models is to find the optimal DA policies that can maximize the model performance gains. This survey discusses the underlying reasons for the emergence of AutoDA technology from the perspective of image classification. We identify three key components of a standard AutoDA model: a search space, a search algorithm and an evaluation function. Based on their architecture, we provide a systematic taxonomy of existing image AutoDA approaches. This paper presents the major works in the AutoDA field, discussing their pros and cons, and proposing several potential directions for future improvements.

Submitted 6 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 68 pages, 9 figures. Submitted to Knowledge and Information Systems (KAIS)

MSC Class: A.1; I.4.3; I.5.2
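The survey above names three components of a standard AutoDA model: a search space, a search algorithm and an evaluation function. As a rough illustration of how those pieces fit together, here is a minimal sketch under stated assumptions: the operation names, probability and magnitude grids are hypothetical, plain random search stands in for the search algorithm (real AutoDA methods use reinforcement learning, gradient-based or population-based search, among others), and `toy_evaluate` is a made-up stand-in for "train a model with this policy and return validation accuracy".

```python
import random

# Search space (hypothetical): a policy is a list of sub-policies, each an
# (operation, probability of applying it, magnitude level) triple.
OPERATIONS = ["rotate", "shear_x", "color", "contrast", "cutout"]
PROBABILITIES = [0.0, 0.25, 0.5, 0.75, 1.0]
MAGNITUDES = list(range(10))


def sample_policy(rng, n_subpolicies=2):
    """Draw one candidate policy uniformly from the search space."""
    return [
        (rng.choice(OPERATIONS), rng.choice(PROBABILITIES), rng.choice(MAGNITUDES))
        for _ in range(n_subpolicies)
    ]


def toy_evaluate(policy):
    """Evaluation function stand-in; a real one would train and validate a model.

    Purely illustrative: rewards frequently applied ops with magnitudes near 5.
    """
    return sum(prob * (1 - abs(mag - 5) / 5) for _, prob, mag in policy)


def random_search(evaluate, n_trials, seed=0):
    """Search algorithm: keep the best-scoring policy among n_trials random samples."""
    rng = random.Random(seed)
    best_policy, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)
        score = evaluate(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score


best_policy, best_score = random_search(toy_evaluate, n_trials=200)
```

Keeping the three components as separate functions mirrors the survey's decomposition: swapping in a different search algorithm or a real training-based evaluation function does not require touching the search-space definition.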

arXiv:2205.03825 [pdf, other]

Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting

Authors: Ang Li, Shanshan Zhao, Qingjie Zhang, Qiuhong Ke

Abstract: Currently, single image inpainting has achieved promising results based on deep convolutional neural networks. However, inpainting on stereo images with missing regions has not been explored thoroughly, which is also a significant but different problem. One crucial requirement for stereo image inpainting is stereo consistency. To achieve it, we propose an Iterative Geometry-Aware Cross Guidance Network (IGGNet). The IGGNet contains two key ingredients, i.e., a Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG) strategy. The GAA module relies on the epipolar geometry cues and learns the geometry-aware guidance from one view to another, which is beneficial to make the corresponding regions in two views consistent. However, learning guidance from co-existing missing regions is challenging. To address this issue, the ICG strategy is proposed, which can alternately narrow down the missing regions of the two views in an iterative manner. Experimental results demonstrate that our proposed network outperforms the latest stereo image inpainting model and state-of-the-art single image inpainting models.

Submitted 10 May, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: Accepted by IJCAI 2022

arXiv:2110.09783 [pdf, other]

Spatial-Temporal Transformer for 3D Point Cloud Sequences

Authors: Yimin Wei, Hao Liu, Tingting Xie, Qiuhong Ke, Yulan Guo

Abstract: Effective learning of spatial-temporal information within a point cloud sequence is highly important for many down-stream tasks such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations from dynamic 3D point cloud sequences. Our PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. Our STSA module is introduced to capture the spatial-temporal context information across adjacent frames, while the RE module is proposed to aggregate features across neighbors to enhance the resolution of feature maps. We test the effectiveness of our PST2 with two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that our PST2 outperforms existing methods on all datasets. The effectiveness of our STSA and RE modules has also been justified with ablation experiments.

Submitted 19 October, 2021; originally announced October 2021.

Journal ref: WACV 2022

arXiv:2108.08344 [pdf, other]

The Multi-Modal Video Reasoning and Analyzing Competition

Authors: Haoran Peng, He Huang, Li Xu, Tianjiao Li, Jun Liu, Hossein Rahmani, Qiuhong Ke, Zhicheng Guo, Cong Wu, Rongchang Li, Mang Ye, Jiahao Wang, Jiaxu Zhang, Yuanzhong Liu, Tao He, Fuwei Zhang, Xianbin Liu, Tao Lin

Abstract: In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.

Submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021 Workshops

ACM Class: I.2.10; I.2.6

arXiv:2107.09176 [pdf]

Temporal search in the scientific space predicts breakthrough inventions

Authors: Chao Min, Qing Ke

Abstract: The development of inventions is theorized as a process of searching and recombining existing knowledge components. Previous studies under this theory have examined myriad characteristics of recombined knowledge and their performance implications. One feature that has received much attention is technological knowledge age. Yet, little is known about how the age of scientific knowledge influences the impact of inventions, despite the widely known catalyzing role of science in the creation of new technologies. Here we use a large corpus of patents and derive features characterizing how patents temporally search in the scientific space. We find that patents that cite scientific papers have more citations and are substantially more likely to become breakthroughs. Conditional on searching in the scientific space, referencing more recent papers increases the impact of patents and the likelihood of being breakthroughs. However, this positive effect can be offset if patents cite papers whose ages exhibit a low variance. These effects are consistent across technological fields.

Submitted 19 July, 2021; originally announced July 2021.
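The abstract above relates three properties of a patent's scientific references to its impact: whether it cites science at all, how recent the cited papers are, and how much their ages vary. The paper's exact feature definitions are not given here, so this is only a minimal sketch under an assumption: a cited paper's age is taken as the patent year minus the paper's publication year, and the feature names (`cites_science`, `mean_age`, `age_variance`) are hypothetical.

```python
from statistics import mean, pvariance


def temporal_search_features(patent_year, cited_paper_years):
    """Summarize how far back in time a patent reaches into the scientific literature.

    Assumes age = patent_year - publication year of each cited paper.
    Returns None for the age statistics when the patent cites no papers.
    """
    if not cited_paper_years:
        return {"cites_science": False, "mean_age": None, "age_variance": None}
    ages = [patent_year - year for year in cited_paper_years]
    return {
        "cites_science": True,
        "mean_age": mean(ages),          # lower value = more recent science
        "age_variance": pvariance(ages),  # spread of cited-paper ages
    }


# Hypothetical patent filed in 2010 citing papers from 2008, 2006 and 1998
# (ages 2, 4 and 12 years).
example = temporal_search_features(2010, [2008, 2006, 1998])
```

Under the findings summarized above, a low `mean_age` would be associated with higher impact, while a very low `age_variance` could offset that benefit; the sketch only computes the features, not any model relating them to citations.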

arXiv:2106.06487 [pdf, other]

A dataset of mentorship in science with semantic and demographic estimations

Authors: Qing Ke, Lizhen Liang, Ying Ding, Stephen V. David, Daniel E. Acuna

Abstract: Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Typically, researchers who study mentorship use article co-authorship and doctoral dissertation datasets. However, available datasets of this type focus on narrow selections of fields and miss out on early career and non-publication-related interactions. Here, we describe MENTORSHIP, a crowdsourced dataset of 743176 mentorship relationships among 738989 scientists across 112 fields that avoids these shortcomings. We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and "semantic" representations of research using deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimations of these factors. We perform extensive validations of the profile--publication matching, semantic content, and demographic inferences. We anticipate this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists' career outcomes.

Submitted 11 June, 2021; originally announced June 2021.

Comments: Data can be found at https://doi.org/10.5281/zenodo.4917086

arXiv:2106.01532 [pdf, other]

Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting

Authors: Ang Li, Qiuhong Ke, Xingjun Ma, Haiqin Weng, Zhiyuan Zong, Feng Xue, Rui Zhang

Abstract: Deep image inpainting aims to restore damaged or missing regions in an image with realistic contents. While having a wide range of applications such as object removal and image recovery, deep inpainting techniques also have the risk of being manipulated for image forgery. A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image. In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network can generalize well when detecting different deep inpainting methods. To this end, we first propose a novel data generation approach to generate a universal training dataset, which imitates the noise discrepancies that exist in real versus inpainted image contents to train universal detectors. We then design a Noise-Image Cross-fusion Network (NIX-Net) to effectively exploit the discriminative information contained in both the images and their noise patterns. We empirically show, on multiple benchmark datasets, that our approach outperforms existing detection methods by a large margin and generalizes well to unseen deep inpainting techniques. Our universal training dataset can also significantly boost the generalizability of existing detection methods.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted by IJCAI 2021

arXiv:2105.11537 [pdf, other]

Graph Neural Network Based VC Investment Success Prediction

Authors: Shiwei Lyu, Shuai Ling, Kaihao Guo, Haipeng Zhang, Kunpeng Zhang, Suting Hong, Qing Ke, Jinjie Gu

Abstract: Predicting the start-ups that will eventually succeed is essential for the venture capital business and worldwide policy makers, especially at an early stage, when rewards can potentially be exponential. Though various empirical studies and data-driven modeling work have been done, the predictive power of the complex networks of stakeholders including venture capital investors, start-ups, and start-ups' managing members has not been thoroughly explored. We design an incremental representation learning mechanism and a sequential learning model, utilizing the network structure together with the rich attributes of the nodes. In general, our method achieves state-of-the-art prediction performance on a comprehensive dataset of global venture capital investments and surpasses human investors by large margins. Specifically, it excels at predicting the outcomes for start-ups in industries such as healthcare and IT. Meanwhile, we shed light on impacts on start-up success from observable factors including gender, education, and networking, which can be of value for practitioners as well as policy makers when they screen ventures of high growth potential.

Submitted 25 May, 2021; originally announced May 2021.

Comments: 11 pages, 5 figures

arXiv:2103.04778 [pdf, other]

Bridging the Distribution Gap of Visible-Infrared Person Re-identification with Modality Batch Normalization

Authors: Wenkang Li, Qi Ke, Wenbin Chen, Yicong Zhou

Abstract: Visible-infrared cross-modality person re-identification (VI-ReID), which aims to match person images between the visible and infrared modalities, is a challenging cross-modality image retrieval task. Most existing works integrate batch normalization layers into their neural networks, but we found that batch normalization layers lead to two types of distribution gap: 1) the inter-mini-batch distribution gap -- the distribution gap of the same modality between mini-batches; 2) the intra-mini-batch modality distribution gap -- the distribution gap between different modalities within the same mini-batch. To address these problems, we propose a new batch normalization layer called Modality Batch Normalization (MBN), which normalizes each modality sub-mini-batch separately instead of the whole mini-batch, and can reduce these distribution gaps significantly. Extensive experiments show that our MBN is able to boost the performance of VI-ReID models, even with different datasets, backbones and losses.
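A minimal sketch of the per-modality normalization idea described above, in plain Python: each modality's sub-mini-batch is standardized with its own per-channel mean and variance rather than the statistics of the whole mini-batch. The function name `modality_batch_norm`, the list-of-vectors layout and the `eps` value are illustrative assumptions; the learnable scale/shift and running statistics that a real normalization layer carries are omitted.

```python
import math
from collections import defaultdict


def modality_batch_norm(features, modalities, eps=1e-5):
    """Normalize each modality's sub-mini-batch with its own per-channel statistics.

    features:   list of feature vectors (lists of floats), one per sample.
    modalities: modality label per sample (e.g. "vis", "ir"), aligned with features.
    Returns normalized vectors in the original sample order.
    Learnable affine parameters and running statistics are omitted (sketch only).
    """
    groups = defaultdict(list)  # modality label -> indices of its samples
    for i, m in enumerate(modalities):
        groups[m].append(i)

    out = [None] * len(features)
    num_channels = len(features[0])
    for idxs in groups.values():
        n = len(idxs)
        # Per-channel mean and biased variance over this modality's samples only.
        means = [sum(features[i][c] for i in idxs) / n for c in range(num_channels)]
        variances = [
            sum((features[i][c] - means[c]) ** 2 for i in idxs) / n
            for c in range(num_channels)
        ]
        for i in idxs:
            out[i] = [
                (features[i][c] - means[c]) / math.sqrt(variances[c] + eps)
                for c in range(num_channels)
            ]
    return out
```

For example, with two visible samples (1.0, 3.0) and two infrared samples (10.0, 30.0) in one batch, each modality is mapped to roughly -1 and +1 on its own, whereas a single set of whole-batch statistics would keep the two modalities far apart.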

Submitted 8 March, 2021; originally announced March 2021.

arXiv:2101.10897 [pdf, other]

HexCNN: A Framework for Native Hexagonal Convolutional Neural Networks

Authors: Yunxiang Zhao, Qiuhong Ke, Flip Korn, Jianzhong Qi, Rui Zhang

Abstract: Hexagonal CNN models have shown superior performance in applications such as IACT data analysis and aerial scene classification due to their better rotation symmetry and reduced anisotropy. To realize hexagonal processing, existing studies mainly use the ZeroOut method to imitate it, which causes substantial memory and computation overheads. We address this deficiency with a novel native hexagonal CNN framework named HexCNN. HexCNN takes hexagon-shaped input and performs forward and backward propagation on the original form of the input based on hexagon-shaped filters, hence avoiding the computation and memory overheads caused by imitation. For applications with rectangle-shaped input that nevertheless require hexagonal processing, HexCNN can be applied by padding the input into a hexagonal shape as preprocessing. In this case, we show that the time and space efficiency of HexCNN still substantially outperforms existing hexagonal CNN methods. Experimental results show that, compared with state-of-the-art models that imitate hexagonal processing using rectangle-shaped filters, HexCNN reduces training time by up to 42.2%. Meanwhile, HexCNN reduces memory cost by up to 25% and 41.7% for loading the input and performing convolution, respectively.
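For readers unfamiliar with native hexagonal processing, the following generic sketch (not HexCNN's implementation, which the abstract does not detail) stores a hexagon-shaped input in axial coordinates (q, r) and applies a 7-tap hexagon-shaped filter covering a cell and its six neighbours, so no rectangular, ZeroOut-style imitation is involved. The names `hexagon`, `hex_conv7` and `HEX_NEIGHBOURS`, the axial layout, and treating cells outside the input as zero are all illustrative assumptions.

```python
# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
HEX_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hexagon(radius):
    """All axial cells within `radius` hex steps of the origin (a hexagon-shaped input)."""
    return [
        (q, r)
        for q in range(-radius, radius + 1)
        for r in range(-radius, radius + 1)
        if abs(q + r) <= radius
    ]


def hex_conv7(grid, weights, bias=0.0):
    """Apply a 7-tap hexagonal filter to every cell of `grid`.

    grid:    dict mapping axial (q, r) -> float value.
    weights: 7 floats; weights[0] is the centre tap, the rest follow HEX_NEIGHBOURS.
    Cells outside the grid contribute zero; the output has the same cells as the input.
    """
    out = {}
    for (q, r), value in grid.items():
        acc = weights[0] * value + bias
        for w, (dq, dr) in zip(weights[1:], HEX_NEIGHBOURS):
            acc += w * grid.get((q + dq, r + dr), 0.0)
        out[(q, r)] = acc
    return out
```

With an all-ones radius-1 hexagon (7 cells) and all-ones weights, the centre cell sums all 7 cells while each outer cell sees only itself, the centre and its two ring neighbours, giving 4.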

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2101.06704 [pdf, other]

Adversarial Interaction Attack: Fooling AI to Misinterpret Human Intentions

Authors: Nodens Koren, Qiuhong Ke, Yisen Wang, James Bailey, Xingjun Ma

Abstract: Understanding the actions of both humans and artificial intelligence (AI) agents is important before modern AI systems can be fully integrated into our daily life. In this paper, we show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise into misinterpreting the intention of an action in interaction scenarios. Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions, and demonstrate how DNN-based interaction models can be tricked into predicting the participants' reactions in unexpected ways. From a broader perspective, the scope of our proposed attack method is not confined to problems related to skeleton data but can also be extended to any type of problem involving sequential regression. Our study highlights potential risks in the interaction loop between AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.

Submitted 17 January, 2021; originally announced January 2021.

Comments: Preprint

arXiv:2012.11866 [pdf, other]

doi 10.1109/TPAMI.2022.3183112

Human Action Recognition from Various Data Modalities: A Review

Authors: Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, Jun Liu

Abstract: Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications and has therefore been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenario. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.

Submitted 21 June, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

arXiv:2010.09925 [pdf, other]

doi 10.1109/TIP.2020.3031173

Hierarchical Paired Channel Fusion Network for Street Scene Change Detection

Authors: Yinjie Lei, Duo Peng, Pingping Zhang, Qiuhong Ke, Haifeng Li

Abstract: Street Scene Change Detection (SSCD) aims to locate the changed regions between a given pair of street-view images captured at different times, which is an important yet challenging task in the computer vision community. The intuitive way to solve the SSCD task is to fuse the extracted image feature pairs and then directly measure the dissimilar parts to produce a change map. Therefore, the key to the SSCD task is to design an effective feature fusion method that can improve the accuracy of the corresponding change maps. To this end, we present a novel Hierarchical Paired Channel Fusion Network (HPCFNet), which utilizes the adaptive fusion of paired feature channels. Specifically, the features of a given image pair are jointly extracted by a Siamese Convolutional Neural Network (SCNN) and hierarchically combined by exploring the fusion of channel pairs at multiple feature levels. In addition, based on the observation that the distribution of scene changes is diverse, we further propose a Multi-Part Feature Learning (MPFL) strategy to detect diverse changes. With the MPFL strategy, our framework adapts to the scale and location diversities of the scene change regions. Extensive experiments on three public datasets (i.e., PCD, VL-CMU-CD and CDnet2014) demonstrate that the proposed framework achieves superior performance, outperforming other state-of-the-art methods by a considerable margin.

Submitted 19 October, 2020; originally announced October 2020.

Comments: To appear in IEEE Transactions on Image Processing; 13 pages, 13 figures, 9 tables

arXiv:2009.01142 [pdf, other]

Long-Term Anticipation of Activities with Cycle Consistency

Authors: Yazan Abu Farha, Qiuhong Ke, Bernt Schiele, Juergen Gall

Abstract: With the success of deep learning methods in analyzing activities in videos, more attention has recently been focused on anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed that extend the prediction horizon up to several minutes into the future and anticipate a sequence of future activities, including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.

Submitted 2 September, 2020; originally announced September 2020.

Comments: GCPR 2020

arXiv:2006.15383 [pdf, other]

Interdisciplinary research and technological impact: Evidence from biomedicine

Authors: Qing Ke

Abstract: Interdisciplinary research (IDR) has been considered an important source of scientific breakthroughs and a solution to today's complex societal challenges. While ample empirical evidence has suggested its benefits within academia, such as better creativity and higher scientific impact and visibility, its societal benefits -- a key argument originally used for promoting IDR -- remain relatively unexplored. Here, we study one aspect of societal benefits, namely contributing to the development of patented technologies, and examine how IDR papers are referenced as "prior art" by patents over time. We draw on a large sample of biomedical papers published over 23 years and measure the degree of interdisciplinarity of a paper using three popular indicators, namely variety, balance, and disparity. We find that papers that cite more fields (variety) and whose distributions over those cited fields are more even (balance) are more likely to receive patent citations, but both effects can be offset if papers draw upon more distant fields (disparity). These associations are consistent across different citation-window lengths. We further find that, conditional on receiving patent citations, the intensity of their technological impact, as measured by both the raw and the quality-adjusted number of citing patents, increases with balance and disparity. Our work may have policy implications for interdisciplinary research and for scientific and technological impact.
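The three indicators can be illustrated with a hedged sketch. The abstract does not give the exact formulas used, so the operationalizations below are assumptions: variety as the number of distinct cited fields, balance as Shannon evenness (entropy of the field distribution divided by its maximum, ln n), and disparity as the mean pairwise distance between the distinct cited fields under a caller-supplied `distance` function with values in [0, 1]. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def variety(cited_fields):
    """Number of distinct fields among a paper's references."""
    return len(set(cited_fields))


def balance(cited_fields):
    """Shannon evenness of the field distribution: 1.0 = perfectly even, 0.0 = one field."""
    counts = Counter(cited_fields)
    n = len(counts)
    if n < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n)


def disparity(cited_fields, distance):
    """Mean pairwise distance between the distinct cited fields.

    distance(a, b) is assumed to return a value in [0, 1], e.g. 1 - cosine similarity
    between the fields' citation profiles (an illustrative choice).
    """
    fields = sorted(set(cited_fields))
    pairs = list(combinations(fields, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

A paper citing genetics three times and oncology once has variety 2 and balance of about 0.81, while an even two-to-two split gives balance 1.0; disparity then depends entirely on how far apart the chosen distance function places the two fields.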

Submitted 4 January, 2023; v1 submitted 27 June, 2020; originally announced June 2020.

Journal ref: Scientometrics 128, 2035-2077 (2023)

arXiv:2006.02472 [pdf, other]

doi 10.1016/j.respol.2020.104071

Technological impact of biomedical research: the role of basicness and novelty

Authors: Qing Ke

Abstract: An ongoing interest in innovation studies is to understand how knowledge generated from scientific research can be used in the development of technologies. While previous inquiries have been devoted to studying the scientific capacity of technologies and the institutional factors facilitating technology transfer, little is known about the intrinsic characteristics of scientific publications that gain direct technological impact. Here we focus on two features, namely basicness and novelty. Using a corpus of 3.8 million papers published between 1980 and 1999, we find that basic science papers and novel papers are substantially more likely to achieve direct technological impact. Further analysis limited to papers with technological impact reveals that basic science and novel science receive more patent citations, experience a shorter time lag, and have impact in broader technological fields.

Submitted 3 June, 2020; originally announced June 2020.

Journal ref: Research Policy 49, 104071 (2020)

arXiv:2001.08199 [pdf, other]

Neural Embeddings of Scholarly Periodicals Reveal Complex Disciplinary Organizations

Authors: Hao Peng, Qing Ke, Ceren Budak, Daniel M. Romero, Yong-Yeol Ahn

Abstract: Understanding the structure of knowledge domains is one of the foundational challenges in the science of science. Here, we propose a neural embedding technique that leverages the information contained in the citation network to obtain continuous vector representations of scientific periodicals. We demonstrate that our periodical embeddings encode nuanced relationships between periodicals as well as the complex disciplinary and interdisciplinary structure of science, allowing us to make cross-disciplinary analogies between periodicals. Furthermore, we show that the embeddings capture meaningful "axes" that encompass knowledge domains, such as an axis from "soft" to "hard" sciences or from "social" to "biological" sciences, which allow us to quantitatively ground periodicals on a given dimension. By offering novel quantification in the science of science, our framework may in turn facilitate the study of how knowledge is created and organized.

Submitted 20 February, 2021; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1912.01527 [pdf, other]

doi 10.1016/j.joi.2019.100998

The citation disadvantage of clinical research

Authors: Qing Ke

Abstract: Biomedical research encompasses diverse types of activities, from basic science ("bench") to clinical medicine ("bedside") to bench-to-bedside translational research. It remains unclear, however, whether different types of research receive citations at varying rates. Here we aim to answer this question by using a newly proposed paper-level indicator that quantifies the extent to which a paper is basic science or clinical medicine. Applying this measure to 5 million biomedical papers, we find a systematic citation disadvantage of clinically oriented papers; they tend to garner far fewer citations and are less likely to be hit works than papers oriented towards basic science. At the same time, clinical research has a higher variance in its citations. We also find that the citation difference between basic and clinical research decreases, yet still persists, if a longer citation window is used. Given the increasing adoption of short-term, citation-based bibliometric indicators in funding decisions, the under-citation of clinical research may discourage biomedical researchers from translating basic scientific discoveries into clinical applications. This may help explain the gap between basic and clinical research, described as the "valley of death", and the commentary on the "extinction" risk of translational researchers. Our work may provide insights to policy-makers on how to evaluate different types of biomedical research.

Submitted 3 December, 2019; originally announced December 2019.

Journal ref: Journal of Informetrics 14, 100998 (2020)

arXiv:1903.10610 [pdf, other]

doi 10.1016/j.joi.2020.101074

An analysis of the evolution of science-technology linkage in biomedicine

Authors: Qing Ke

Abstract: Demonstrating the practical value of public research has been an important subject in science policy. Here we present a detailed study of the evolution of the citation linkage between life-science-related patents and biomedical research over a 37-year period. Our analysis relies on a newly created dataset that systematically links millions of non-patent references to biomedical papers. We find a large disparity in the volume of science linkage among technology sectors, with biotechnology and drug patents dominating it. The linkage has been growing exponentially over a long period of time, doubling every 2.9 years. The U.S. has been the largest producer of cited science for years, receiving nearly half of the citations. More than half of the citations go to universities. We use a new paper-level indicator to quantify the extent to which a paper is basic research or clinical medicine. We find that cited papers tend to be basic research, yet a significant portion of papers cited in patents related to FDA-approved drugs are clinical research. The U.S. National Institutes of Health continues to be an important funder of cited science. For the majority of companies, more than half of the citations in their patents point to papers produced by public research. Taken together, these results indicate a continuous linkage of public science to private sector inventions.
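As a worked illustration of what the reported doubling time implies, assume pure exponential growth in the number of patent-to-paper citations, N(t) = N_0 2^{t/T_d} with T_d = 2.9 years (a simplification of the trend described above):

```latex
N(t) = N_0 \, 2^{t/T_d}, \qquad T_d = 2.9\ \text{years}

r = \frac{\ln 2}{T_d} = \frac{0.6931}{2.9} \approx 0.239\ \text{yr}^{-1},
\qquad 2^{1/T_d} = e^{r} \approx 1.27

\frac{N(t+10)}{N(t)} = 2^{10/2.9} \approx 10.9
```

That is, roughly 27% growth per year, or about an elevenfold increase per decade, for as long as the exponential trend holds.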

Submitted 5 June, 2020; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: 13 pages, 6 figures, 7 tables

Journal ref: Journal of Informetrics 14, 101074 (2020)

arXiv:1812.10609 [pdf]

doi 10.1093/jamia/ocy177

Identifying translational science through embeddings of controlled vocabularies

Authors: Qing Ke

Abstract: Objective: Translational science aims at "translating" basic scientific discoveries into clinical applications. Identifying translational science has practical uses, such as evaluating the effectiveness of investments made in large programs like the Clinical and Translational Science Awards. Despite several proposed methods that group publications (the primary unit of research output) into categories, we still lack a quantitative way to place papers onto the full, continuous spectrum from basic research to clinical medicine. Methods: Here we learn vector representations of the controlled vocabularies assigned to MEDLINE papers to obtain a Translational Axis (TA) that points from basic science to clinical medicine. The projected position of a term on the TA, expressed as a continuous quantity, indicates the term's "appliedness." The position of a paper, determined by the average location of its terms, quantifies its degree of "appliedness," which we term the "level score." Results: We validate our method by comparing it with previous techniques, showing excellent agreement yet uncovering significant variations in the scores of papers within previously defined categories. The measure allows us to characterize the standing of journals, disciplines, and the entire biomedical literature along the basic-applied spectrum. Analysis of a large-scale citation network reveals two main findings. First, direct citations mainly occur between papers with similar scores. Second, shortest paths are more likely to end at a paper closer to the basic end of the spectrum, regardless of where the starting paper is on the spectrum. Conclusions: The proposed method provides a quantitative way to identify translational science.
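A minimal sketch of the projection step, assuming term embeddings are already available. The abstract does not say how the axis direction is obtained; here it is assumed, for illustration only, to be the unit vector from the centroid of a set of "basic" anchor terms to the centroid of a set of "clinical" anchor terms. A term's position is its scalar projection onto that axis, and a paper's level score is the average position of its terms, as described above. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def translational_axis(basic_vecs, clinical_vecs):
    """Unit vector from the 'basic' anchor centroid to the 'clinical' anchor centroid.

    The anchor-centroid construction is an assumption for illustration; the two
    centroids must differ, otherwise the axis is undefined.
    """
    b, c = mean_vector(basic_vecs), mean_vector(clinical_vecs)
    diff = [ci - bi for bi, ci in zip(b, c)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def term_position(vec, axis):
    """Scalar projection of a term vector onto the axis ('appliedness')."""
    return sum(v * a for v, a in zip(vec, axis))


def level_score(term_vecs, axis):
    """A paper's score: the average axis position of its terms."""
    return sum(term_position(v, axis) for v in term_vecs) / len(term_vecs)
```

In a toy 2-D space where basic anchors sit at x = 0 and clinical anchors at x = 2, the axis is (1, 0), and a paper whose two terms project to 0.5 and 1.5 receives a level score of 1.0, halfway along the spectrum.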

Submitted 26 December, 2018; originally announced December 2018.

Comments: Accepted at JAMIA; Supporting Information at http://qke.github.io/assets/pdf/trans_supp.pdf

Journal ref: Journal of the American Medical Informatics Association 26, 516-523 (2019)

arXiv:1804.04105 [pdf, other]

doi 10.1016/j.joi.2018.06.010

Comparing scientific and technological impact of biomedical research

Authors: Qing Ke

Abstract: Traditionally, the number of citations that a scholarly paper receives from other papers is used as a proxy for its scientific impact. Yet citations can come from domains outside the scientific community; one such example is patented technologies, since papers can be cited by patents, achieving technological impact. While the scientific impact of papers has been extensively studied, the technological aspect remains less explored in the literature. Here we aim to fill this gap by presenting a comparative study of how 919 thousand biomedical papers are cited by U.S. patents and by other papers over time. We observe a positive correlation between citations from patents and citations from papers, but there is little overlap between the two domains in either the most cited papers or the papers with the most delayed recognition. We also find that the two types of citations exhibit distinct temporal variations, with patent citations lagging behind paper citations by a median of 6 years for the majority of papers. Our work contributes to the understanding of the technological impact of papers.

Submitted 3 July, 2018; v1 submitted 11 April, 2018; originally announced April 2018.

Journal ref: Journal of Informetrics 12, 706-717 (2018)

arXiv:1709.07580 [pdf, other]

doi 10.1145/3134692

Service Providers of the Sharing Economy: Who Joins and Who Benefits?

Authors: Qing Ke

Abstract: Many "sharing economy" platforms, such as Uber and Airbnb, have become increasingly popular, providing consumers with more choices and suppliers with a chance to make a profit. However, they have also brought about emerging issues regarding regulation, tax obligations, and impact on the urban environment, and have generated heated debates among various interest groups. Empirical studies of these issues are limited, partly due to the unavailability of relevant data. Here we aim to understand the service providers of the sharing economy, investigating who joins and who benefits, using the Airbnb market in the United States as a case study. We link more than 211 thousand Airbnb listings owned by 188 thousand hosts with demographic, socio-economic status (SES), housing, and tourism characteristics. We show that income and education are consistently the two most influential factors linked to joining Airbnb, regardless of the form of participation or year. Areas with lower median household income, or a higher fraction of residents with Bachelor's or higher degrees, tend to have more hosts. However, when considering the performance of listings, as measured by the number of newly received reviews, we find that income has a positive effect for entire-home listings; listings located in areas with higher median household income tend to have more new reviews. Our findings demonstrate empirically that the disadvantage of SES-disadvantaged areas and the advantage of SES-advantaged areas may be present in the sharing economy.

Submitted 21 September, 2017; originally announced September 2017.

Comments: CSCW 2018 Online First

arXiv:1703.03492 [pdf, other]

doi 10.1109/CVPR.2017.486

A New Representation of Skeleton Sequences for 3D Action Recognition

Authors: Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid

Abstract: This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips, each consisting of several frames, for spatio-temporal feature learning using deep neural networks. Each clip is generated from one channel of the cylindrical coordinates of the skeleton sequence. Each frame of the generated clips represents the temporal information of the entire skeleton sequence and incorporates one particular spatial relationship between the joints. Together, the clips include multiple frames with different spatial relationships, which provide useful structural information about the human skeleton. We propose to use deep convolutional neural networks to learn long-term temporal information of the skeleton sequence from the frames of the generated clips, and then use a Multi-Task Learning Network (MTLN) to jointly process all frames of the generated clips in parallel to incorporate spatial structural information for action recognition. Experimental results clearly show the effectiveness of the proposed new representation and feature learning method for 3D action recognition.
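A hedged sketch of the clip construction described above. It converts joint positions to cylindrical coordinates (rho, theta, z), assuming z is the cylinder axis, and builds one clip per cylindrical channel. Each clip "frame" is a T x J matrix spanning the whole sequence, computed from joint positions relative to one reference joint; reading "one particular spatial relationship" as "positions relative to a chosen reference joint" is an assumption for illustration, as are the function names and the list-based layout. The CNN and MTLN stages are not shown.

```python
import math


def to_cylindrical(sequence):
    """Split a skeleton sequence into three T x J channels (rho, theta, z).

    sequence: list of T frames; each frame is a list of J (x, y, z) joint positions.
    Assumes the z axis is the cylinder axis.
    """
    rho, theta, height = [], [], []
    for frame in sequence:
        rho.append([math.hypot(x, y) for x, y, _ in frame])
        theta.append([math.atan2(y, x) for x, y, _ in frame])
        height.append([z for _, _, z in frame])
    return rho, theta, height


def make_clips(sequence, reference_joints):
    """Build three clips, one per cylindrical channel.

    Clip c holds one 'frame' per reference joint: channel c of the whole sequence
    after expressing every joint relative to that reference joint (an assumed
    choice of spatial relationship).
    """
    clips = ([], [], [])
    for k in reference_joints:
        relative = [
            [(x - f[k][0], y - f[k][1], z - f[k][2]) for x, y, z in f]
            for f in sequence
        ]
        for clip, channel in zip(clips, to_cylindrical(relative)):
            clip.append(channel)
    return clips
```

With two reference joints, each of the three clips contains two frames, and each frame covers all T time steps, so temporal information of the entire sequence is kept within every frame while the spatial relationship varies across frames.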

Submitted 4 June, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

Comments: CVPR 2017

arXiv:1701.01645 [pdf, other]

doi 10.1145/3091478.3091504

Sharing Means Renting?: An Entire-marketplace Analysis of Airbnb

Authors: Qing Ke

Abstract: Airbnb, an online marketplace for accommodations, has experienced staggering growth accompanied by intense debates and scattered regulations around the world. Current discourse, however, is largely focused on opinions rather than empirical evidence. Here, we aim to bridge this gap by presenting the first large-scale measurement study of Airbnb, using a crawled data set containing 2.3 million listings, 1.3 million hosts, and 19.3 million reviews. We measure several key characteristics at the heart of the ongoing debate and the sharing economy. Among other things, we find that Airbnb has reached a global yet heterogeneous coverage. The majority of its listings across many countries are entire homes, suggesting that Airbnb is actually more like a rental marketplace than a spare-room sharing platform. Analysis of star ratings reveals a bias toward positive ratings, amplified by a bias toward using positive words in reviews. The extent of this bias is greater than that of Yelp reviews, which were already shown to exhibit a positive bias. We investigate a key issue repeatedly discussed in the current debate: commercial hosts who own multiple listings on Airbnb. We find that they are prevalent, they are early movers in joining Airbnb, and their listings are disproportionately entire homes and located in the US. Our work advances the current understanding of how Airbnb is being used and may serve as an independent and empirical reference to inform the debate.

Submitted 12 May, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

Comments: WebSci '17
