Search | arXiv e-print repository

COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

Authors: Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun

Abstract: The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal models provide a novel technical approach to mathematical problem generation because of their wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framework of monotonous data structure with homogeneous training objectives limit the application of large multimodal models in mathematical problem generation. Addressing these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. Firstly, from the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. Secondly, a three-stage fine-tuning framework guided by the "Cone of Experience" is proposed. The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience to draw parallels with experiences in the career growth of teachers. Several fine-grained data construction and injection methods are designed in this framework. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the vacancy of Chinese multimodal data in this field. Combined with objective and subjective indicators, experiments on multiple datasets fully verify the effectiveness of the proposed framework and model.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.00941 [pdf, ps, other]

Full Iso-recursive Types

Authors: Litao Zhou, Qianyong Wan, Bruno C. d. S. Oliveira

Abstract: There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reasoning about the equivalence of the two formulations of recursive types. This paper proposes a generalization of iso-recursive types called full iso-recursive types. Full iso-recursive types allow encoding all programs with equi-recursive types without computational overhead. Instead of explicit term coercions, all type transformations are captured by computationally irrelevant casts, which can be erased at runtime without affecting the semantics of the program. Consequently, reasoning about the equivalence between the two approaches can be greatly simplified. We present a calculus called $λ^μ_{Fi}$, which extends the simply typed lambda calculus (STLC) with full iso-recursive types. The $λ^μ_{Fi}$ calculus is proved to be type sound, and shown to have the same expressive power as a calculus with equi-recursive types. We also extend our results to subtyping, and show that equi-recursive subtyping can be expressed in terms of iso-recursive subtyping with cast operators.

Submitted 7 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

Comments: This work has been conditionally accepted to OOPSLA 2024

arXiv:2406.13987 [pdf]

Image anomaly detection and prediction scheme based on SSA optimized ResNet50-BiGRU model

Authors: Qianhui Wan, Zecheng Zhang, Liheng Jiang, Zhaoqi Wang, Yan Zhou

Abstract: Image anomaly detection is a popular research direction, with many methods emerging in recent years due to rapid advancements in computing. The use of artificial intelligence for image anomaly detection has been widely studied. By analyzing images of athlete posture and movement, it is possible to predict injury status and suggest necessary adjustments. Most existing methods rely on convolutional networks to extract information from irrelevant pixel data, limiting model accuracy. This paper introduces a network combining Residual Network (ResNet) and Bidirectional Gated Recurrent Unit (BiGRU), which can predict potential injury types and provide early warnings by analyzing changes in muscle and bone poses from video images. To address the high complexity of this network, the Sparrow search algorithm was used for optimization. Experiments conducted on four datasets demonstrated that our model has the smallest error in image anomaly detection compared to other models, showing strong adaptability. This provides a new approach for anomaly detection and predictive analysis in images, contributing to the sustainable development of human health and performance.

Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.00017 [pdf, other]

PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

Authors: Shezheng Song, Shasha Li, Shan Zhao, Chengyu Wang, Xiaopeng Li, Jie Yu, Qian Wan, Jun Ma, Tianwei Yan, Wentao Ma, Xiaoguang Mao

Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspects through MATE (Multimodal Aspect Term Extraction) and then aligns these aspects with image patches for sentiment classification (MASC: Multimodal Aspect-Oriented Sentiment Classification). This method is better suited for multimodal scenarios where effective image use is crucial. We present three key observations: (a) MATE and MASC have different feature requirements, with MATE focusing on token-level features and MASC on sequence-level features; (b) the aspect identified by MATE is crucial for effective image utilization; and (c) images play a trivial role in previous MABSA methods due to high noise. Based on these observations, we propose a pipeline framework that first predicts the aspect and then uses translation-based alignment (TBA) to enhance multimodal semantic consistency for better image utilization. Our method achieves state-of-the-art (SOTA) performance on widely used MABSA datasets Twitter-15 and Twitter-17. This demonstrates the effectiveness of the pipeline approach and its potential to provide valuable insights for future MABSA research. For reproducibility, the code and checkpoint will be released.

Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced June 2024.

Comments: Code will be released upon publication

arXiv:2405.13325 [pdf, other]

DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction with Slot Querying

Authors: Guanghui Wang, Dexi Liu, Jian-Yun Nie, Qizhi Wan, Rong Hu, Xiping Liu, Wanlong Liu, Jiaming Liu

Abstract: Recent advancements in event argument extraction (EAE) involve incorporating useful auxiliary information into models during training and inference, such as retrieved instances and event templates. These methods face two challenges: (1) the retrieval results may be irrelevant and (2) templates are developed independently for each event without considering their possible relationship. In this work, we propose DEGAP to address these challenges through simple yet effective components: dual prefixes, i.e. learnable prompt vectors, where the instance-oriented prefix and template-oriented prefix are trained to learn information from different event instances and templates. Additionally, we propose an event-guided adaptive gating mechanism, which can adaptively leverage possible connections between different events and thus capture relevant information from the prefix. Finally, these event-guided prefixes provide relevant information as cues to the EAE model without retrieval. Extensive experiments demonstrate that our method achieves new state-of-the-art performance on four datasets (ACE05, RAMS, WIKIEVENTS, and MLEE). Further analysis shows the impact of different components.

Submitted 15 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.00257

YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery

Authors: Qian Wan, Xiang Xiang, Qinhao Zhou

Abstract: Because of its use in practice, open-world object detection (OWOD) has gotten a lot of attention recently. The challenge is how a model can detect novel classes and then incrementally learn them without forgetting previously known classes. Previous approaches hinge on strongly-supervised or weakly-supervised novel-class data for novel-class detection, which may not apply to real applications. We construct a new benchmark in which novel classes are only encountered at the inference stage. We also propose a new OWOD detector, YOLOOC, based on the YOLO architecture yet for the Open-Class setup. We introduce label smoothing to prevent the detector from over-confidently mapping novel classes to known classes and to discover novel classes. Extensive experiments conducted on our more realistic setup demonstrate the effectiveness of our method for discovering novel classes in our new benchmark.

Submitted 22 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Withdrawn because it was submitted without consent of the first author. In addition, this submission has some errors
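
The YOLOOC abstract above names label smoothing as the mechanism that keeps the detector from mapping novel classes to known ones with excessive confidence. The following is a minimal sketch of standard uniform label smoothing only, not the detector from the paper; the function name `smooth_labels` and the value `epsilon=0.1` are assumptions chosen for illustration.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing over K classes.

    Each target becomes (1 - epsilon) * y + epsilon / K, so the true class
    keeps most of the probability mass and every other class gets a small,
    non-zero share. `epsilon` is a hypothetical setting, not the paper's.
    """
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


# A hard target for class 1 out of 4 known classes.
hard_target = [0.0, 1.0, 0.0, 0.0]
soft_target = smooth_labels(hard_target, epsilon=0.1)
print(soft_target)
```

Because the smoothed target never reaches 1.0, a model trained against it is penalized for fully saturated predictions, which is the property the abstract relies on.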

arXiv:2403.00632 [pdf, other]

Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling

Authors: Qian Wan, Xin Feng, Yining Bei, Zhiqi Gao, Zhicong Lu

Abstract: Human emotions are essentially molded by lived experiences, from which we construct personalised meaning. The engagement in such meaning-making process has been practiced as an intervention in various psychotherapies to promote wellness. Nevertheless, supporting the recollection and recounting of lived experiences in everyday life remains underexplored in HCI. It also remains unknown how technologies such as generative AI models can facilitate the meaning-making process, and ultimately support affective mindfulness. In this paper we present Metamorpheus, an affective interface that engages users in a creative visual storytelling of emotional experiences during dreams. Metamorpheus arranges the storyline based on a dream's emotional arc, and provokes self-reflection through the creation of metaphorical images and text depictions. The system provides metaphor suggestions, and generates visual metaphors and text depictions using generative AI models, while users can apply generations to recolour and re-arrange the interface to be visually affective. Our experience-centred evaluation manifests that, by interacting with Metamorpheus, users can recall their dreams in vivid detail, through which they relive and reflect upon their experiences in a meaningful way.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted by CHI 2024

arXiv:2401.16087 [pdf, other]

High Resolution Image Quality Database

Authors: Huang Huang, Qiang Wan, Jari Korhonen

Abstract: With technology for digital photography and high resolution displays rapidly evolving and gaining popularity, there is a growing demand for blind image quality assessment (BIQA) models for high resolution images. Unfortunately, the publicly available large scale image quality databases used for training BIQA models contain mostly low or general resolution images. Since image resizing affects image quality, we assume that the accuracy of BIQA models trained on low resolution images would not be optimal for high resolution images. Therefore, we created a new high resolution image quality database (HRIQ), consisting of 1120 images with resolution of 2880x2160 pixels. We conducted a subjective study to collect the subjective quality ratings for HRIQ in a controlled laboratory setting, resulting in accurate MOS at high resolution. To demonstrate the importance of a high resolution image quality database for training BIQA models to predict mean opinion scores (MOS) of high resolution images accurately, we trained and tested several traditional and deep learning based BIQA methods on different resolution versions of our database. The database is publicly available at https://github.com/jarikorhonen/hriq.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.14733 [pdf, other]

Harnessing Diffusion Models for Visual Perception with Meta Prompts

Authors: Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

Abstract: The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual inputs, a feat made possible through its pre-training on large-scale image-text pairs. This leads to a natural inquiry: can diffusion models be utilized to tackle visual perception tasks? In this paper, we propose a simple yet effective scheme to harness a diffusion model for visual perception tasks. Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception. The effect of meta prompts is two-fold. First, as a direct replacement of the text embeddings in the T2I models, they can activate task-relevant features during feature extraction. Second, they are used to re-arrange the extracted features to ensure that the model focuses on the most pertinent features for the task at hand. Additionally, we design a recurrent refinement training strategy that fully leverages the property of diffusion models, thereby yielding stronger visual features. Extensive experiments across various benchmarks validate the effectiveness of our approach. Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes. Concurrently, the proposed method attains results comparable to the current state-of-the-art in semantic segmentation on ADE20K and pose estimation on COCO datasets, further exemplifying its robustness and versatility.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2307.11025 [pdf, other]

doi 10.1145/3637357

Investigating VTubing as a Reconstruction of Streamer Self-Presentation: Identity, Performance, and Gender

Authors: Qian Wan, Zhicong Lu

Abstract: VTubers, or Virtual YouTubers, are live streamers who create streaming content using animated 2D or 3D virtual avatars. In recent years, there has been a significant increase in the number of VTuber creators and viewers across the globe. This practise has drawn research attention into topics such as viewers' engagement behaviors and perceptions; however, although animated avatars offer more identity and performance flexibility than traditional live streaming where one uses their own body, little research has focused on how this flexibility influences how creators present themselves. This research thus seeks to fill this gap by presenting results from a qualitative study of 16 Chinese-speaking VTubers' streaming practices. The data revealed that the virtual avatars that were used while live streaming afforded creators opportunities to present themselves using inflated presentations and resulted in inclusive interactions with viewers. The results also unveiled the inflated, and often sexualized, gender expressions of VTubers while they were situated in misogynistic environments. The socio-technical facets of VTubing were found to potentially reduce sexual harassment and sexism, whilst also raising self-objectification concerns.

Submitted 29 February, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: To appear at ACM CSCW 2024 (Accepted to PACM HCI(CSCW))

ACM Class: H.5.m; K.4.0

Journal ref: Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 80 (April 2024), 22 pages

arXiv:2307.10811 [pdf, other]

doi 10.1145/3637361

"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models

Authors: Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu

Abstract: Prewriting is the process of discovering and developing ideas before a first draft, which requires divergent thinking and often implies unstructured strategies such as diagramming, outlining, free-writing, etc. Although large language models (LLMs) have been demonstrated to be useful for a variety of tasks including creative writing, little is known about how users would collaborate with LLMs to support prewriting. The preferred collaborative role and initiative of LLMs during such a creativity process is also unclear. To investigate human-LLM collaboration patterns and dynamics during prewriting, we conducted a three-session qualitative study with 15 participants in two creative tasks: story writing and slogan writing. The findings indicated that during collaborative prewriting, there appears to be a three-stage iterative Human-AI Co-creativity process that includes Ideation, Illumination, and Implementation stages. This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs. This research also reports on collaboration breakdowns that occur during this process, user perceptions of using existing LLMs during Human-AI Co-creativity, and discusses design implications to support this co-creativity process.

Submitted 29 February, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: To appear at ACM CSCW 2024; Accepted to PACM HCI (CSCW); 25 pages, 2 figures

ACM Class: H.5.m; K.4.0

Journal ref: Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 84 (2024)

arXiv:2306.17733 [pdf, other]

Token-Event-Role Structure-based Multi-Channel Document-Level Event Extraction

Authors: Qizhi Wan, Changxuan Wan, Keli Xiao, Hui Xiong, Dexi Liu, Xiping Liu

Abstract: Document-level event extraction is a long-standing challenging information retrieval problem involving a sequence of sub-tasks: entity extraction, event type judgment, and event type-specific multi-event extraction. However, addressing the problem as multiple learning tasks leads to increased model complexity. Also, existing methods insufficiently utilize the correlation of entities crossing different events, resulting in limited event extraction performance. This paper introduces a novel framework for document-level event extraction, incorporating a new data structure called token-event-role and a multi-channel argument role prediction module. The proposed data structure enables our model to uncover the primary role of tokens in multiple events, facilitating a more comprehensive understanding of event relationships. By leveraging the multi-channel prediction module, we transform entity and multi-event extraction into a single task of predicting token-event pairs, thereby reducing the overall parameter size and enhancing model efficiency. The results demonstrate that our approach outperforms the state-of-the-art method by 9.5 percentage points in terms of the F1 score, highlighting its superior performance in event extraction. Furthermore, an ablation study confirms the significant value of the proposed data structure in improving event extraction tasks, further validating its importance in enhancing the overall performance of the framework.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2303.10136 [pdf, other]

MassNet: A Deep Learning Approach for Body Weight Extraction from A Single Pressure Image

Authors: Ziyu Wu, Quan Wan, Mingjie Zhao, Yi Ke, Yiran Fang, Zhen Liang, Fangting Xie, Jingyuan Cheng

Abstract: Body weight, as an essential physiological trait, is of considerable significance in many applications like body management, rehabilitation, and drug dosing for patient-specific treatments. Previous works on the body weight estimation task are mainly vision-based, using 2D/3D, depth, or infrared images, facing problems in illumination, occlusions, and especially privacy issues. The pressure mapping mattress is a non-invasive and privacy-preserving tool to obtain the pressure distribution image over the bed surface, which strongly correlates with the body weight of the lying person. To extract the body weight from this image, we propose a deep learning-based model, including a dual-branch network to extract the deep features and pose features respectively. A contrastive learning module is also combined with the deep-feature branch to help mine the mutual factors across different postures of every single subject. The two groups of features are then concatenated for the body weight regression task. To test the model's performance over different hardware and posture settings, we create a pressure image dataset of 10 subjects and 23 postures, using a self-made pressure-sensing bedsheet. This dataset, which is made public together with this paper, is used for validation alongside an existing public dataset. The results show that our model outperforms the state-of-the-art algorithms on both datasets. Our research constitutes an important step toward fully automatic weight estimation in both clinical and at-home practice. Our dataset is available for research purposes at: https://github.com/USTCWzy/MassEstimation.

Submitted 17 March, 2023; originally announced March 2023.

Journal ref: PerCom 2023

arXiv:2301.13156 [pdf, other]

SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition

Authors: Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang

Abstract: Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, has recently been significantly revolutionized. However, the computational cost and memory requirement render these methods unsuitable for mobile devices. In this paper, we introduce a new method, squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K, Cityscapes, Pascal Context and COCO-Stuff datasets. Critically, we beat both the mobile-friendly rivals and Transformer-based counterparts with better performance and lower latency without bells and whistles. Furthermore, we incorporate a feature upsampling-based multi-resolution distillation technique, further reducing the inference latency of the proposed framework. Beyond semantic segmentation, we further apply the proposed SeaFormer architecture to image classification and object detection problems, demonstrating the potential of serving as a versatile mobile-friendly backbone. Our code and models are made publicly available at https://github.com/fudan-zvg/SeaFormer.

Submitted 17 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: V4 is the ICLR 2023 conference version, and V5 is the extended version

arXiv:2206.13231 [pdf, other]

QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

Authors: Jinmiao Huang, Waseem Gharbieh, Qianhui Wan, Han Suk Shim, Chul Lee

Abstract: Current keyword spotting systems are typically trained with a large amount of pre-defined keywords. Recognizing keywords in an open-vocabulary setting is essential for personalizing smart device interaction. Towards this goal, we propose a pure MLP-based neural network that is based on MLPMixer - an MLP model architecture that effectively replaces the attention mechanism in Vision Transformers. We investigate different ways of adapting the MLPMixer architecture to the QbyE open-vocabulary keyword spotting task. Comparisons with the state-of-the-art RNN and CNN models show that our method achieves better performance in challenging situations (10dB and 6dB environments) on both the publicly available Hey-Snips dataset and a larger scale internal dataset with 400 speakers. Our proposed model also has a smaller number of parameters and MACs compared to the baseline models.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted to INTERSPEECH 2022

arXiv:2204.12176 [pdf, other]

Cross Pairwise Ranking for Unbiased Item Recommendation

Authors: Qi Wan, Xiangnan He, Xiang Wang, Jiancan Wu, Wei Guo, Ruiming Tang

Abstract: Most recommender systems optimize the model on observed interaction data, which is affected by the previous exposure mechanism and exhibits many biases like popularity bias. The loss functions, such as the most commonly used pointwise Binary Cross-Entropy and pairwise Bayesian Personalized Ranking, are not designed to consider the biases in observed data. As a result, the model optimized on the loss would inherit the data biases, or even worse, amplify the biases. For example, a few popular items take up more and more exposure opportunities, severely hurting the recommendation quality on niche items -- known as the notorious Matthew effect. In this work, we develop a new learning paradigm named Cross Pairwise Ranking (CPR) that achieves unbiased recommendation without knowing the exposure mechanism. Distinct from inverse propensity scoring (IPS), we change the loss term of a sample -- we innovatively sample multiple observed interactions at once and form the loss as the combination of their predictions. We prove in theory that this offsets the influence of user/item propensity on the learning, removing the influence of data biases caused by the exposure mechanism. Compared with IPS, our proposed CPR has the advantage of ensuring unbiased learning for each training instance without the need to set propensity scores. Experimental results demonstrate the superiority of CPR over state-of-the-art debiasing solutions in both model generalization and training efficiency. The codes are available at https://github.com/Qcactus/CPR.

Submitted 26 April, 2022; originally announced April 2022.

Comments: WWW 2022

arXiv:2201.12673 [pdf, other]

doi 10.1109/JETCAS.2023.3330832

Building time-surfaces by exploiting the complex volatility of an ECRAM memristor

Authors: Marco Rasetto, Qingzhou Wan, Himanshu Akolkar, Feng Xiong, Bertram Shi, Ryad Benosman

Abstract: Memristors have emerged as a promising technology for efficient neuromorphic architectures owing to their ability to act as programmable synapses, combining processing and memory into a single device. Although they are most commonly used for static encoding of synaptic weights, recent work has begun to investigate the use of their dynamical properties, such as Short Term Plasticity (STP), to integrate events over time in event-based architectures. However, we are still far from completely understanding the range of possible behaviors and how they might be exploited in neuromorphic computation. This work focuses on a newly developed Li$_\textbf{x}$WO$_\textbf{3}$-based three-terminal memristor that exhibits tunable STP and a conductance response modeled by a double exponential decay. We derive a stochastic model of the device from experimental data and investigate how device stochasticity, STP, and the double exponential decay affect accuracy in a hierarchy of time-surfaces (HOTS) architecture. We found that the device's stochasticity does not affect accuracy, that STP can reduce the effect of salt and pepper noise in signals from event-based sensors, and that the double exponential decay improves accuracy by integrating temporal information over multiple time scales. Our approach can be generalized to study other memristive devices to build a better understanding of how control over temporal dynamics can enable neuromorphic engineers to fine-tune devices and architectures to fit their problems at hand.

Submitted 15 April, 2024; v1 submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.09658

Real-Time Computer-Generated EIA for Light Field Display by Pre-Calculating and Pre-Storing the Invariable Voxel-Pixel Mapping

Authors: Quanzhen Wan

Abstract: Generating the elemental image array (EIA) for light field display, especially integral imaging light field display, has relied on a virtual camera array, novel sampling algorithms, high-performance hardware or corresponding complex algorithms, which hinders its application. Without sacrificing accuracy and precision, we develop a novel set of algorithms to achieve video-level EIA generation. The invariable voxel-to-pixel relationship is pre-calculated and pre-stored as a lookup table or mapping. With this lookup table, the voxel array can be quickly mapped to an EIA without depending on any high-end hardware.

Submitted 27 April, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: We are reminded by our supervisors and peers that we have not taken many potential influential factors into consideration, which might lead to a rather different outcome. If the whole idea will be certified correctly in the future, we will resubmit our updated version at that time

arXiv:2201.08266

A Real-Time Rendering Method for Light Field Display

Authors: Quanzhen Wan

Abstract: A real-time elemental image array (EIA) generation method which neither sacrifices accuracy nor relies on high-performance hardware is developed, through raytracing and a pre-stored voxel-pixel lookup table (LUT). Benefiting from both offline and online workflows, experiments verify its effectiveness.

Submitted 27 April, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: We are reminded by our supervisors and peers that we have not taken many potential influential factors into consideration, which might lead to a rather different outcome. If the whole idea will be certified correctly in the future, we will resubmit our updated version at that time

arXiv:2112.05129 [pdf, other]

Assistive Tele-op: Leveraging Transformers to Collect Robotic Task Demonstrations

Authors: Henry M. Clever, Ankur Handa, Hammad Mazhar, Kevin Parker, Omer Shapira, Qian Wan, Yashraj Narang, Iretiayo Akinola, Maya Cakmak, Dieter Fox

Abstract: Sharing autonomy between robots and human operators could facilitate data collection of robotic task demonstrations to continuously improve learned models. Yet, the means to communicate intent and reason about the future are disparate between humans and robots. We present Assistive Tele-op, a virtual reality (VR) system for collecting robot task demonstrations that displays an autonomous trajectory forecast to communicate the robot's intent. As the robot moves, the user can switch between autonomous and manual control when desired. This allows users to collect task demonstrations with both a high success rate and with greater ease than manual teleoperation systems. Our system is powered by transformers, which can provide a window of potential states and actions far into the future -- with almost no added computation time. A key insight is that human intent can be injected at any location within the transformer sequence if the user decides that the model-predicted actions are inappropriate. At every time step, the user can (1) do nothing and allow autonomous operation to continue while observing the robot's future plan sequence, or (2) take over and momentarily prescribe a different set of actions to nudge the model back on track. We host the videos and other supplementary material at https://sites.google.com/view/assistive-teleop.

Submitted 9 December, 2021; originally announced December 2021.

Comments: 9 pages, 4 figures, 1 table. NeurIPS 2021 Workshop on Robot Learning: Self-Supervised and Lifelong Learning, Virtual, Virtual

arXiv:2111.14806 [pdf, other]

Coarse-To-Fine Incremental Few-Shot Learning

Authors: Xiang Xiang, Yuwen Tan, Qian Wan, Jing Ma

Abstract: Different from fine-tuning models pre-trained on a large-scale dataset of preset classes, class-incremental learning (CIL) aims to recognize novel classes over time without forgetting pre-trained classes. However, a given model will be challenged by test images with finer-grained classes, e.g., a basenji is at most recognized as a dog. Such images form a new training set (i.e., support set) so that the incremental model is hoped to recognize a basenji (i.e., query) as a basenji next time. This paper formulates such a hybrid natural problem of coarse-to-fine few-shot (C2FS) recognition as a CIL problem named C2FSCIL, and proposes a simple, effective, and theoretically-sound strategy Knowe: to learn, normalize, and freeze a classifier's weights from fine labels, once learning an embedding space contrastively from coarse labels. Besides, as CIL aims at a stability-plasticity balance, new overall performance metrics are proposed. In that sense, on CIFAR-100, BREEDS, and tieredImageNet, Knowe outperforms all recent relevant CIL/FSCIL methods that are tailored to the new problem setting for the first time.

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2110.03553 [pdf, other]

doi 10.1145/3466752.3480120

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving

Authors: Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu

Abstract: Bayesian Neural Networks (BNNs) that possess a property of uncertainty estimation have been increasingly adopted in a wide range of safety-critical AI applications which demand reliable and robust decision making, e.g., self-driving, rescue robots, medical image diagnosis. The training procedure of a probabilistic BNN model involves training an ensemble of sampled DNN models, which induces orders of magnitude larger volume of data movement than training a single DNN model. In this paper, we reveal that the root cause for BNN training inefficiency originates from the massive off-chip data transfer by Gaussian Random Variables (GRVs). To tackle this challenge, we propose a novel design that eliminates all the off-chip data transfer by GRVs through the reversed shifting of Linear Feedback Shift Registers (LFSRs) without incurring any training accuracy loss. To efficiently support our LFSR reversion strategy at the hardware level, we explore the design space of the current DNN accelerators and identify the optimal computation mapping scheme to best accommodate our strategy. By leveraging this finding, we design and prototype the first highly efficient BNN training accelerator, named Shift-BNN, that is low-cost and scalable. Extensive evaluation on five representative BNN models demonstrates that Shift-BNN achieves an average of 4.9x (up to 10.8x) boost in energy efficiency and 1.6x (up to 2.8x) speedup over the baseline DNN training accelerator.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 54th IEEE/ACM International Symposium on Microarchitecture

arXiv:2110.01736 [pdf, other]

doi 10.1016/j.neunet.2023.11.009

AdjointBackMapV2: Precise Reconstruction of Arbitrary CNN Unit's Activation via Adjoint Operators

Authors: Qing Wan, Siu Wun Cheung, Yoonsuck Choe

Abstract: Adjoint operators have been found to be effective in the exploration of CNN's inner workings [1]. However, the previous no-bias assumption restricted its generalization. We overcome the restriction via embedding input images into an extended normed space that includes bias in all CNN layers as part of the extended space and propose an adjoint-operator-based algorithm that maps high-level weights back to the extended input space for reconstructing an effective hypersurface. Such hypersurface can be computed for an arbitrary unit in the CNN, and we prove that this reconstructed hypersurface, when multiplied by the original input (through an inner product), will precisely replicate the output value of each unit. We show experimental results based on the CIFAR-10 and CIFAR-100 data sets where the proposed approach achieves near 0 activation value reconstruction error.

Submitted 9 November, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: This is a preprint prior to peer-review. For the revised/finalized version, please see https://doi.org/10.1016/j.neunet.2023.11.009

arXiv:2109.12275 [pdf, ps, other]

doi 10.1109/TSP.2022.3140926

A Variational Bayesian Inference-Inspired Unrolled Deep Network for MIMO Detection

Authors: Qian Wan, Jun Fang, Yinsen Huang, Huiping Duan, Hongbin Li

Abstract: The great success of deep learning (DL) has inspired researchers to develop more accurate and efficient symbol detectors for multi-input multi-output (MIMO) systems. Existing DL-based MIMO detectors, however, suffer from several drawbacks. To address these issues, in this paper, we develop a model-driven DL detector based on variational Bayesian inference. Specifically, the proposed unrolled DL architecture is inspired by an inverse-free variational Bayesian learning framework which circumvents matrix inversion via maximizing a relaxed evidence lower bound. Two networks are respectively developed for independent and identically distributed (i.i.d.) Gaussian channels and arbitrarily correlated channels. The proposed networks, referred to as VBINet, have only a few learnable parameters and thus can be efficiently trained with a moderate amount of training samples. The proposed VBINet-based detectors can work in both offline and online training modes. An important advantage of our proposed networks over state-of-the-art MIMO detection networks such as OAMPNet and MMNet is that the VBINet can automatically learn the noise variance from data, thus yielding a significant performance improvement over the OAMPNet and MMNet in the presence of noise variance uncertainty. Simulation results show that the proposed VBINet-based detectors achieve competitive performance for both i.i.d. Gaussian and realistic 3GPP MIMO channels.

Submitted 11 January, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

Comments: This paper has been accepted by IEEE Transactions on Signal Processing for future publication

arXiv:2109.10443 [pdf, other]

Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior

Authors: Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana, Buck Babich, Bryan Peele, Qian Wan, Iretiayo Akinola, Balakumar Sundaralingam, Dieter Fox, Byron Boots, Nathan D. Ratliff

Abstract: Classical mechanical systems are central to controller design in energy shaping methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior while maintaining stability. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design.

Submitted 18 January, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2105.14519 [pdf]

RFCBF: enhance the performance and stability of Fast Correlation-Based Filter

Authors: Xiongshi Deng, Min Li, Lei Wang, Qikang Wan

Abstract: Feature selection is a preprocessing step which plays a crucial role in the domain of machine learning and data mining. Feature selection methods have been shown to be effective in removing redundant and irrelevant features, improving the learning algorithm's prediction performance. Among the various methods of feature selection based on redundancy, the fast correlation-based filter (FCBF) is one of the most effective. In this paper, we propose a novel extension of FCBF, called RFCBF, which incorporates a resampling technique to improve classification accuracy. We performed comprehensive experiments to compare the RFCBF with other state-of-the-art feature selection methods using the KNN classifier on 12 publicly available data sets. The experimental results show that the RFCBF algorithm yields significantly better results than previous state-of-the-art methods in terms of classification accuracy and runtime.

Submitted 30 May, 2021; originally announced May 2021.

arXiv:2012.09020 [pdf]

AdjointBackMap: Reconstructing Effective Decision Hypersurfaces from CNN Layers Using Adjoint Operators

Authors: Qing Wan, Yoonsuck Choe

Abstract: There are several effective methods in explaining the inner workings of convolutional neural networks (CNNs). However, in general, finding the inverse of the function performed by CNNs as a whole is an ill-posed problem. In this paper, we propose a method based on adjoint operators to reconstruct, given an arbitrary unit in the CNN (except for the first convolutional layer), its effective hypersurface in the input space that replicates that unit's decision surface conditioned on a particular input image. Our results show that the hypersurface reconstructed this way, when multiplied by the original input image, would give nearly the exact output value of that unit. We find that the CNN unit's decision surface is largely conditioned on the input, and this may explain why adversarial inputs can effectively deceive CNNs.

Submitted 29 March, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: 23 pages, 16 figures, 145MB. It may take some time to load

arXiv:2012.08501 [pdf, other]

NAPA: Neural Art Human Pose Amplifier

Authors: Qingfu Wan, Oliver Lu

Abstract: This is the project report for CSCI-GA.2271-001. We target human pose estimation in artistic images. For this goal, we design an end-to-end system that uses neural style transfer for pose regression. We collect a 277-style set for arbitrary style transfer and build an artistic 281-image test set. We directly run pose regression on the test set and show promising results. For pose regression, we propose a 2d-induced bone map from which pose is lifted. To help such a lifting, we additionally annotate the pseudo 3d labels of the full in-the-wild MPII dataset. Further, we append another style transfer as self supervision to improve 2d. We perform extensive ablation studies to analyze the introduced features. We also compare end-to-end with per-style training and allude to the tradeoff between style transfer and pose regression. Lastly, we generalize our model to the real-world human dataset and show its potentiality as a generic pose model. We explain the theoretical foundation in Appendix. We release code at https://github.com/strawberryfg/NAPA-NST-HPE, data, and video.

Submitted 15 December, 2020; originally announced December 2020.

Comments: Tech Report; Graduate Course Project Report; Code, datasets and video released

arXiv:2010.14750 [pdf, other]

Geometric Fabrics for the Acceleration-based Design of Robotic Motion

Authors: Mandy Xie, Karl Van Wyk, Anqi Li, Muhammad Asif Rana, Qian Wan, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: This paper describes the pragmatic design and construction of geometric fabrics for shaping a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called optimization fabrics. Fabrics generalize recent work on Riemannian Motion Policies (RMPs); they add provable stability guarantees and improve design consistency while promoting the intuitive acceleration-based principles of modular design that make RMPs successful. We describe a suite of mathematical modeling tools that practitioners can employ in practice and demonstrate both how to mitigate system complexity by constructing behaviors layer-wise and how to employ these tools to design robust, strongly-generalizing, policies that solve practical problems one would expect to find in industry applications. Our system exhibits intelligent global navigation behaviors expressed entirely as provably stable fabrics with zero planning or state machine governance.

Submitted 25 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2003.13764 [pdf, other]

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Authors: Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, et al. (10 additional authors not shown)

Abstract: We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is highly dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: Data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.

Submitted 10 September, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: European Conference on Computer Vision (ECCV), 2020

arXiv:2001.08665 [pdf, ps, other]

Action Recognition and State Change Prediction in a Recipe Understanding Task Using a Lightweight Neural Network Model

Authors: Qing Wan, Yoonsuck Choe

Abstract: Consider a natural language sentence describing a specific step in a food recipe. In such instructions, recognizing actions (such as press, bake, etc.) and the resulting changes in the state of the ingredients (shape molded, custard cooked, temperature hot, etc.) is a challenging task. One way to cope with this challenge is to explicitly model a simulator module that applies actions to entities and predicts the resulting outcome (Bosselut et al. 2018). However, such a model can be unnecessarily complex. In this paper, we propose a simplified neural network model that separates action recognition and state change prediction, while coupling the two through a novel loss function. This allows learning to indirectly influence each other. Our model, although simpler, achieves higher state change prediction performance (67% average accuracy for ours vs. 55% in (Bosselut et al. 2018)) and takes fewer samples to train (10K ours vs. 65K+ by (Bosselut et al. 2018)).

Submitted 23 January, 2020; originally announced January 2020.

Comments: AAAI-2020 Student Abstract and Poster Program (Accept)

arXiv:1910.03135 [pdf, other]

DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System

Authors: Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, Dieter Fox

Abstract: Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system, DexPilot, was developed that allows for complete control over the full 23 DoA robotic system by merely observing the bare human hand. DexPilot enables operators to carry out a variety of complex manipulation tasks that go beyond simple pick-and-place operations. This allows for collection of high dimensional, multi-modality, state-action data that can be leveraged in the future to learn sensorimotor policies for challenging manipulation tasks. The system performance was measured through speed and reliability metrics across two human demonstrators on a variety of tasks. The videos of the experiments can be found at https://sites.google.com/view/dex-pilot.

Submitted 14 October, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: 17 pages, first version of DexPilot

arXiv:1905.08231 [pdf, other]

Patch-based 3D Human Pose Refinement

Authors: Qingfu Wan, Weichao Qiu, Alan L. Yuille

Abstract: State-of-the-art 3D human pose estimation approaches typically estimate pose from the entire RGB image in a single forward run. In this paper, we develop a post-processing step to refine 3D human pose estimation from body part patches. Using local patches as input has two advantages. First, the fine details around body parts are zoomed in to high resolution for more precise 3D pose prediction. Second, it enables the part appearance to be shared between poses to benefit rare poses. In order to acquire an informative representation of patches, we explore different input modalities and validate the superiority of fusing predicted segmentation with RGB. We show that our method consistently boosts the accuracy of state-of-the-art 3D human pose methods.

Submitted 20 May, 2019; originally announced May 2019.

Comments: Accepted by CVPR 2019 Augmented Human: Human-centric Understanding and 2D/3D Synthesis, and the third Look Into Person (LIP) Challenge Workshop

arXiv:1712.03917 [pdf, other]

Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals

Authors: Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim

Abstract: In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the view point range of [70, 120] degrees, but it is far from being solved for extreme view points; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) Discriminative methods still generalize poorly to unseen hand shapes; (4) While joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.

Submitted 29 March, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

arXiv:1711.10796 [pdf, other]

DeepSkeleton: Skeleton Map for 3D Human Pose Regression

Authors: Qingfu Wan, Wei Zhang, Xiangyang Xue

Abstract: Despite recent success on 2D human pose estimation, 3D human pose estimation still remains an open problem. A key challenge is the ill-posed nature of depth ambiguity. This paper presents a novel intermediate feature representation named skeleton map for regression. It distills structural context from irrelevant properties of the RGB image, e.g. illumination and texture. It is simple, clean and can be easily generated via a deconvolution network. For the first time, we show that training a regression network from the skeleton map alone is capable of meeting the performance of state-of-the-art 3D human pose estimation works. We further exploit the power of multiple 3D hypothesis generation to obtain a reasonable 3D pose consistent with 2D pose detection. The effectiveness of our approach is validated on the challenging in-the-wild dataset MPII and the indoor dataset Human3.6M.

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1610.02807 [pdf, ps, other]

Robust Bayesian Compressed sensing

Authors: Qian Wan, Huiping Duan, Jun Fang, Hongbin Li

Abstract: We consider the problem of robust compressed sensing whose objective is to recover a high-dimensional sparse signal from compressed measurements corrupted by outliers. A new sparse Bayesian learning method is developed for robust compressed sensing. The basic idea of the proposed method is to identify and remove the outliers from sparse signal recovery. To automatically identify the outliers, we employ a set of binary indicator hyperparameters to indicate which observations are outliers. These indicator hyperparameters are treated as random variables and assigned a beta process prior such that their values are confined to be binary. In addition, a Gaussian-inverse Gamma prior is imposed on the sparse signal to promote sparsity. Based on this hierarchical prior model, we develop a variational Bayesian method to estimate the indicator hyperparameters as well as the sparse signal. Simulation results show that the proposed method achieves a substantial performance improvement over existing robust compressed sensing techniques.

Submitted 21 October, 2016; v1 submitted 10 October, 2016; originally announced October 2016.

arXiv:1609.02554 [pdf]

A light-stimulated neuromorphic device based on graphene hybrid phototransistor

Authors: Shuchao Qin, Fengqiu Wang, Yujie Liu, Qing Wan, Xinran Wang, Yongbing Xu, Yi Shi, Xiaomu Wang, Rong Zhang

Abstract: Neuromorphic chip refers to an unconventional computing architecture that is modelled on biological brains. It is ideally suited for processing sensory data for intelligence computing, decision-making or context cognition. Despite rapid development, conventional artificial synapses exhibit poor connection flexibility and require separate data acquisition circuitry, resulting in limited functionalities and significant hardware redundancy. Here we report a novel light-stimulated artificial synapse based on a graphene-nanotube hybrid phototransistor that can directly convert optical stimuli into a "neural image" for further neuronal analysis. Our optically-driven synapses involve multiple steps of plasticity mechanisms and importantly exhibit flexible tuning of both short- and long-term plasticity. Furthermore, our neuromorphic phototransistor can take multiple pre-synaptic light stimuli via wavelength-division multiplexing and allows advanced optical processing through charge-trap-mediated optical coupling. The capability of complex neuromorphic functionalities in a simple silicon-compatible device paves the way for novel neuromorphic computing architectures involving photonics.

Submitted 7 September, 2016; originally announced September 2016.

Comments: 20 pages, 4 figures

arXiv:1606.06854 [pdf, other]

Model-based Deep Hand Pose Estimation

Authors: Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei

Abstract: Previous learning based hand pose estimation methods do not fully exploit the prior information in hand model geometry. Instead, they usually rely on a separate model fitting step to generate valid hand poses. Such post-processing is inconvenient and sub-optimal. In this work, we propose a model based deep learning approach that adopts a forward kinematics based layer to ensure the geometric validity of estimated poses. For the first time, we show that embedding such a non-linear generative process in deep learning is feasible for hand pose estimation. Our approach is verified on challenging public datasets and achieves state-of-the-art performance.

Submitted 22 June, 2016; originally announced June 2016.

arXiv:1510.06115 [pdf]

Proton Conducting Graphene Oxide Coupled Neuron Transistors for Brain-Inspired Cognitive Systems

Authors: Changjin Wan, Liqiang Zhu, Yanghui Liu, Ping Feng, Zhaoping Liu, Hailiang Cao, Peng Xiao, Yi Shi, Qing Wan

Abstract: Neuron is the most important building block in our brain, and information processing in individual neuron involves the transformation of input synaptic spike trains into an appropriate output spike train. Hardware implementation of neuron by individual ionic/electronic hybrid device is of great significance for enhancing our understanding of the brain and solving sensory processing and complex recognition tasks. Here, we provide a proof-of-principle artificial neuron based on a proton conducting graphene oxide (GO) coupled oxide-based electric-double-layer (EDL) transistor with multiple driving inputs and one modulatory input terminal. Paired-pulse facilitation, dendritic integration and orientation tuning were successfully emulated. Additionally, neuronal gain control (arithmetic) in the scheme of rate coding is also experimentally demonstrated. Our results provide a new-concept approach for building brain-inspired cognitive systems.

Submitted 20 October, 2015; originally announced October 2015.

Comments: arXiv admin note: text overlap with arXiv:1506.04658

arXiv:1501.00158 [pdf]

Automatic Modulation Recognition of PSK Signals with Sub-Nyquist Sampling Based on High Order Statistics

Authors: Zhengli Xing, Jie Zhou, Jiangfeng Ye, Jun Yan, Jifeng Zou, Lin Zou, Qun Wan

Abstract: The sampling rate required in the Nth Power Nonlinear Transformation (NPT) method is typically much greater than the Nyquist rate, which places a heavy burden on the Analog to Digital Converter (ADC). Taking advantage of the sparsity of the spectrum of PSK signals under NPT, we develop the NPT method for PSK signals with sub-Nyquist rate samples. By combining the NPT method with Compressive Sensing (CS) theory, this paper presents frequency spectrum reconstruction of the Nth power nonlinear transformation of PSK signals, which can be further used for automatic modulation recognition (AMR) and rough estimation of the unknown carrier frequency and symbol rate.

Submitted 31 December, 2014; originally announced January 2015.

Comments: 7 pages, 8 figures, submitted to IEEE International Symposium on Signal Processing and Information Technology

arXiv:1501.00154 [pdf]

Automatic Modulation Recognition of PSK Signals Using Nonuniform Compressive Samples Based on High Order Statistics

Authors: Zhengli Xing, Jie Zhou, Jiangfeng Ye, Jun Yan, Lin Zou, Qun Wan

Abstract: Phase modulation is a commonly used modulation mode in digital communication, which usually brings phase sparsity to digital signals. It is natural to connect the sparsity with the newly emerged theory of compressed sensing (CS), which enables sub-Nyquist sampling of high-bandwidth to sparse signals. For the present, applications of CS theory in the communication field mainly focus on spectrum sensing, sparse channel estimation, etc. Few current studies take the phase sparse character into consideration. In this paper, we establish a novel model of phase modulation signals based on phase sparsity, and introduce CS theory to the phase domain. According to CS theory, the sampling rate required here scales with the symbol rate rather than the bandwidth, and the symbol rate is usually much lower than the Nyquist rate. We provide analytical support for the model, and simulations verify its validity.

Submitted 31 December, 2014; originally announced January 2015.

Comments: 4 pages, 6 figures, submitted to the International Conference on Communications Problem-Solving (ICCP) 2014

arXiv:1501.00151 [pdf]

A Novel Compressed Sensing Based Model for Reconstructing Sparse Signals Using Phase Sparse Character

Authors: Zhengli Xing, Jie Zhou, Jiangfeng Ye, Jun Yan, Lin Zou, Qun Wan

Abstract: Phase modulation is a commonly used modulation mode in digital communication, which usually brings phase sparsity to digital signals. It is natural to connect the sparsity with the newly emerged theory of compressed sensing (CS), which enables sub-Nyquist sampling of high-bandwidth to sparse signals. For the present, applications of CS theory in the communication field mainly focus on spectrum sensing, sparse channel estimation, etc. Few current studies take the phase sparse character into consideration. In this paper, we establish a novel model of phase modulation signals based on phase sparsity, and introduce CS theory to the phase domain. According to CS theory, the sampling rate required here scales with the symbol rate rather than the bandwidth, and the symbol rate is usually much lower than the Nyquist rate. We provide analytical support for the model, and simulations verify its validity.

Submitted 31 December, 2014; originally announced January 2015.

Comments: 8 pages, 39 figures, subjected to "Elektronika ir Elektrotechnika"

arXiv:1304.7072 [pdf]

Learning and Spatiotemporally Correlated Functions Mimicked in Oxide-Based Artificial Synaptic Transistors

Authors: Chang Jin Wan, Li Qiang Zhu, Yi Shi, Qing Wan

Abstract: Learning and logic are fundamental brain functions that enable the individual to adapt to the environment, and such functions are established in the human brain by modulating ionic fluxes in synapses. Nanoscale ionic/electronic devices with inherent synaptic functions are considered to be essential building blocks for artificial neural networks. Here, multi-terminal IZO-based artificial synaptic transistors gated by fast proton-conducting phosphosilicate electrolytes are fabricated on glass substrates. Protons in the SiO2 electrolyte and the IZO channel conductance are regarded as the neurotransmitter and synaptic weight, respectively. Spike-timing dependent plasticity, short-term memory and long-term memory were successfully mimicked in such protonic/electronic hybrid artificial synapses. Most importantly, spatiotemporally correlated logic functions are also mimicked in a simple artificial neural network without any intentional hard-wire connections, owing to the natural proton-related coupling effect. The oxide-based protonic/electronic hybrid artificial synaptic transistors reported here are potential building blocks for artificial neural networks.

Submitted 26 April, 2013; originally announced April 2013.

arXiv:1209.4405 [pdf, ps, other]

Strongly Convex Programming for Principal Component Pursuit

Authors: Qingshan You, Qun Wan, Yipeng Liu

Abstract: In this paper, we address strongly convex programming for principal component pursuit with reduced linear measurements, which decomposes a superposition of a low-rank matrix and a sparse matrix from a small set of linear measurements. We first provide sufficient conditions under which the strongly convex models lead to the exact low-rank and sparse matrix recovery; second, we also give suggestions on how to choose suitable parameters in practical algorithms.

Submitted 19 September, 2012; originally announced September 2012.

Comments: 10 pages

arXiv:1206.2322 [pdf, other]

A Fast HRRP Synthesis Algorithm with Sensing Dictionary in GTD Model

Authors: Rong Fan, Qun Wan, Xiao Zhang, Hui Chen, Yipeng Liu

Abstract: To achieve a high range resolution profile (HRRP), the geometric theory of diffraction (GTD) parametric model is widely used in stepped-frequency radar systems. In this paper, a fast synthetic range profile algorithm, called orthogonal matching pursuit with sensing dictionary (OMP-SD), is proposed. It formulates traditional HRRP synthesis as a sparse approximation problem over a redundant dictionary. As it employs the a priori information that targets are sparsely distributed in the range space, the synthetic range profile (SRP) can be accomplished even in the presence of data loss. Besides, introducing the sensing dictionary (SD) reduces the computational complexity and mitigates the model mismatch at the same time. The computational complexity decreases from O(MNDK) flops for OMP to O(M(N + D)K) flops for OMP-SD. Simulation experiments illustrate its advantages in both additive white Gaussian noise (AWGN) and noiseless situations.

Submitted 11 June, 2012; originally announced June 2012.

Comments: 16 pages, 8 figures, 2 tables

arXiv:1206.2197 [pdf, other]

Complex Orthogonal Matching Pursuit and Its Exact Recovery Conditions

Authors: Rong Fan, Qun Wan, Yipeng Liu, Hui Chen, Xiao Zhang

Abstract: In this paper, we present new results on using orthogonal matching pursuit (OMP) to solve the sparse approximation problem over redundant dictionaries for complex cases (i.e., complex measurement vector, complex dictionary and complex additive white Gaussian noise (CAWGN)). A sufficient condition under which OMP can recover the optimal representation of an exactly sparse signal in the complex cases is proposed both in noiseless and bound Gaussian noise settings. Similar to the exact recovery condition (ERC) results in real cases, we extend them to the complex case and derive the corresponding ERC in the paper. It leverages this theory to show that OMP succeeds for k-sparse signals from a class of complex dictionaries. Besides, an application with the geometrical theory of diffraction (GTD) model is presented for complex cases. Finally, simulation experiments illustrate the validity of the theoretical analysis.

Submitted 11 June, 2012; originally announced June 2012.

Comments: 18 pages, 5 figures

arXiv:1107.1642 [pdf]

Indirect Channel Sensing for Cognitive Amplify-and-Forward Relay Networks

Authors: Yipeng Liu, Qun Wan

Abstract: In a cognitive radio network, the primary channel information is beneficial. But it cannot be obtained by direct channel estimation in the cognitive system as in previous methods. The only possible way is for the primary receiver to broadcast the primary channel information to the cognitive users, but this would require modification of the primary receiver and additional precious spectrum resources. Cooperative communication is also a promising technique. This paper introduces an indirect channel sensing method for the primary channel in a cognitive amplify-and-forward (AF) relay network. As the signal retransmitted from the primary AF relay node includes channel effects, the cognitive radio can receive the retransmitted signal from the AF node and then extract the channel information from it. Afterwards, least squares channel estimation and sparse channel estimation can be used to address the dense and sparse multipath channels, respectively. Numerical experiments demonstrate that the proposed indirect channel sensing method has an acceptable performance.

Submitted 8 July, 2011; originally announced July 2011.

Comments: 5 pages, 5 figures

arXiv:1106.3711 [pdf, ps, other]

doi 10.1109/LAWP.2012.2223451

Sidelobe Suppression for Capon Beamforming with Mainlobe to Sidelobe Power Ratio Maximization

Authors: Yipeng Liu, Qun Wan

Abstract: High sidelobe level is a major disadvantage of the Capon beamforming. To suppress the sidelobe, this paper introduces a mainlobe to sidelobe power ratio constraint to the Capon beamforming. It minimizes the sidelobe power while keeping the mainlobe power constant. Simulations show that the obtained beamformer outperforms the Capon beamformer.

Submitted 19 June, 2011; originally announced June 2011.

Comments: 8 pages, 2 figures

arXiv:1106.3629 [pdf, ps, other]

Total Variation Minimization Based Compressive Wideband Spectrum Sensing for Cognitive Radios

Authors: Yipeng Liu, Qun Wan

Abstract: Wideband spectrum sensing is a critical component of a functioning cognitive radio system. Its major challenge is the excessively high sampling rate requirement. Compressive sensing (CS) promises to be able to deal with it. Nearly all current CS based compressive wideband spectrum sensing methods exploit only the frequency sparsity. Motivated by the goal of fast and robust detection of wideband spectrum changes, total variation minimization is incorporated to exploit the temporal and frequency structure information to enhance the sparsity level. As a sparser vector is obtained, the spectrum sensing period would be shortened and the sensing accuracy would be enhanced. Both theoretical evaluation and numerical experiments demonstrate the performance improvement.

Submitted 18 June, 2011; originally announced June 2011.

Comments: 20 pages, 5 figures

arXiv:1008.5254

Sparse Channel Estimation for Amplify-and-Forward Two-way Relay Network with Compressed Sensing

Authors: Guan Gui, Qun Wan, Wei Peng, Fumiyuki Adachi

Abstract: Amplify-and-forward two-way relay network (AFTWRN) was introduced to realize high-data rate transmission over the wireless frequency-selective channel. However, AFTWRN requires the knowledge of channel state information (CSI) not only for coherent data detection but also for the self-data removal. This is partially accomplished by training sequence-based linear channel estimation. However, conventional linear estimation techniques neglect the anticipated sparsity of the multipath channel and thus lead to low spectral efficiency, a resource that is scarce in the field of wireless communication. Unlike the previous methods, we propose a sparse channel estimation method which can exploit the sparse structure and hence provide significant improvements in MSE performance when compared with traditional LS-based linear channel probing strategies in AFTWRN. Simulation results confirm the proposed methods.

Submitted 23 June, 2013; v1 submitted 31 August, 2010; originally announced August 2010.

Comments: the paper has been withdrawn

Showing 1–50 of 66 results for author: Wan, Q