
Learning Granular Media Avalanche Behavior for Indirectly Manipulating Obstacles on a Granular Slope

Authors: Haodi Hu, Feifei Qian, Daniel Seita

Abstract: Legged robot locomotion on sand slopes is challenging due to the complex dynamics of granular media and how the lack of solid surfaces can hinder locomotion. A promising strategy, inspired by ghost crabs and other organisms in nature, is to strategically interact with rocks, debris, and other obstacles to facilitate movement. To provide legged robots with this ability, we present a novel approach that leverages avalanche dynamics to indirectly manipulate objects on a granular slope. We use a Vision Transformer (ViT) to process image representations of granular dynamics and robot excavation actions. The ViT predicts object movement, which we use to determine which leg excavation action to execute. We collect training data from 100 real physical trials and, at test time, deploy our trained model in novel settings. Experimental results suggest that our model can accurately predict object movements and achieve a success rate $\geq 80\%$ in a variety of manipulation tasks with up to four obstacles, and can also generalize to objects with different physics properties. To our knowledge, this is the first paper to leverage granular media avalanche dynamics to indirectly manipulate objects on granular slopes. Supplementary material is available at https://sites.google.com/view/grain-corl2024/home.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Submitted to CoRL 2024

arXiv:2406.12359 [pdf, other]

Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents

Authors: Menglong Zhang, Fuyuan Qian, Quanying Liu

Abstract: Fast adaptation to new tasks is extremely important for embodied agents in the real world. Meta-reinforcement learning (meta-RL) has emerged as an effective method to enable fast adaptation in unknown environments. Compared to on-policy meta-RL algorithms, off-policy algorithms rely heavily on efficient data sampling strategies to extract and represent the historical trajectories. However, little is known about how different data sampling methods impact the ability of meta-RL agents to represent unknown environments. Here, we investigate the impact of data sampling strategies on the exploration and adaptability of meta-RL agents. Specifically, we conducted experiments with two types of off-policy meta-RL algorithms based on Thompson sampling and Bayes-optimality theories in continuous control tasks within the MuJoCo environment and sparse reward navigation tasks. Our analysis revealed that the long-memory and short-memory sequence sampling strategies affect the representation and adaptive capabilities of meta-RL agents. We found that the algorithm based on Bayes-optimality theory exhibited more robust and better adaptability than the algorithm based on Thompson sampling, highlighting the importance of appropriate data sampling strategies for the agent's representation of an unknown environment, especially in the case of sparse rewards.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2403.16130 [pdf, other]

AKBR: Learning Adaptive Kernel-based Representations for Graph Classification

Authors: Feifei Qian, Lixin Cui, Ming Li, Yue Wang, Hangyuan Du, Lixiang Xu, Lu Bai, Philip S. Yu, Edwin R. Hancock

Abstract: In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels that are defined by merely counting any pair of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach aims to define an end-to-end representation learning model to construct an adaptive kernel matrix for graphs. To this end, we commence by leveraging a novel feature-channel attention mechanism to capture the interdependencies between different substructure invariants of original graphs. The proposed AKBR model can thus effectively identify the structural importance of different substructures, and compute the R-convolution kernel between pairwise graphs associated with the more significant substructures specified by their structural attentions. Since each row of the resulting kernel matrix can be theoretically seen as the embedding vector of a sample graph, the proposed AKBR model is able to directly employ the resulting kernel matrix as the graph feature matrix and input it into the classifier for classification (i.e., the SoftMax layer), naturally providing an end-to-end learning architecture between the kernel computation and the classifier. Experimental results show that the proposed AKBR model outperforms existing state-of-the-art graph kernels and deep learning methods on standard graph benchmarks.

Submitted 13 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2402.15105 [pdf, other]

A First Look at GPT Apps: Landscape and Vulnerability

Authors: Zejun Zhang, Li Zhang, Xin Yuan, Anlan Zhang, Mengwei Xu, Feng Qian

Abstract: Following OpenAI's introduction of GPTs, a surge in GPT apps has led to the launch of dedicated LLM app stores. Nevertheless, given its debut, there is a lack of sufficient understanding of this new ecosystem. To fill this gap, this paper presents a first comprehensive longitudinal (5-month) study of the evolution, landscape, and vulnerability of the emerging LLM app ecosystem, focusing on two GPT app stores: GPTStore.AI and the official OpenAI GPT Store. Specifically, we develop two automated tools and a TriLevel configuration extraction strategy to efficiently gather metadata (i.e., names, creators, descriptions, etc.) and user feedback for all GPT apps across these two stores, as well as configurations (i.e., system prompts, knowledge files, and APIs) for the top 10,000 popular apps. Our extensive analysis reveals: (1) the user enthusiasm for GPT apps consistently rises, whereas creator interest plateaus within three months of GPTs' launch; (2) nearly 90% of system prompts can be easily accessed due to widespread failure to secure GPT app configurations, leading to considerable plagiarism and duplication among apps. Our findings highlight the necessity of enhancing the LLM app ecosystem by the app stores, creators, and users.

Submitted 23 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.12280 [pdf, other]

Adaptive Skeleton Graph Decoding

Authors: Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Abstract: Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches propose parallel decoding strategies, such as Skeleton-of-Thought (SoT), to improve performance by breaking prompts down into sub-problems that can be decoded in parallel; however, they often suffer from reduced response quality. Our key insight is that we can request additional information, specifically dependencies and difficulty, when generating the sub-problems to improve both response quality and performance. In this paper, we propose Skeleton Graph Decoding (SGD), which uses dependencies exposed between sub-problems to support information forwarding between dependent sub-problems for improved quality while exposing parallelization opportunities for decoding independent sub-problems. Additionally, we leverage difficulty estimates for each sub-problem to select an appropriately-sized model, improving performance without significantly reducing quality. Compared to standard autoregressive generation and SoT, SGD achieves a 1.69x speedup while improving quality by up to 51%.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.03435 [pdf, other]

Deciphering the Enigma of Satellite Computing with COTS Devices: Measurement and Analysis

Authors: Ruolin Xing, Mengwei Xu, Ao Zhou, Qing Li, Yiran Zhang, Feng Qian, Shangguang Wang

Abstract: In the wake of the rapid deployment of large-scale low-Earth orbit satellite constellations, exploiting the full computing potential of Commercial Off-The-Shelf (COTS) devices in these environments has become a pressing issue. However, understanding this problem is far from straightforward due to the inherent differences between the terrestrial infrastructure and the satellite platform in space. In this paper, we take an important step towards closing this knowledge gap by presenting the first measurement study on the thermal control, power management, and performance of COTS computing devices on satellites. Our measurements reveal that the satellite platform and COTS computing devices significantly interplay in terms of the temperature and energy, forming the main constraints on satellite computing. Further, we analyze the critical factors that shape the characteristics of onboard COTS computing devices. We provide guidelines for future research on optimizing the use of such devices for computing purposes. Finally, we have released the datasets to facilitate further study in satellite computing.

Submitted 18 March, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.09716 [pdf, other]

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

Authors: Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at https://github.com/Maryeon/whiten_mtd.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2310.14783 [pdf, other]

Interpretable Deep Reinforcement Learning for Optimizing Heterogeneous Energy Storage Systems

Authors: Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, Feng Qian

Abstract: Energy storage systems (ESS) are a pivotal component in the energy market, serving as both energy suppliers and consumers. ESS operators can reap benefits from energy arbitrage by optimizing operations of storage equipment. To further enhance ESS flexibility within the energy market and improve renewable energy utilization, a heterogeneous photovoltaic-ESS (PV-ESS) is proposed, which leverages the unique characteristics of battery energy storage (BES) and hydrogen energy storage (HES). For scheduling tasks of the heterogeneous PV-ESS, cost description plays a crucial role in guiding operators' strategies to maximize benefits. We develop a comprehensive cost function that takes into account degradation, capital, and operation/maintenance costs to reflect real-world scenarios. Moreover, while numerous methods excel in optimizing ESS energy arbitrage, they often rely on black-box models with opaque decision-making processes, limiting practical applicability. To overcome this limitation and enable transparent scheduling strategies, a prototype-based policy network with inherent interpretability is introduced. This network employs human-designed prototypes to guide decision-making by comparing similarities between prototypical situations and encountered situations, which allows for naturally explained scheduling strategies. Comparative results across four distinct cases underscore the effectiveness and practicality of our proposed pre-hoc interpretable optimization method when contrasted with black-box models.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.11000 [pdf, other]

Mid-Band 5G: A Measurement Study in Europe and US

Authors: Rostand A. K. Fezeu, Jason Carpenter, Claudio Fiandrino, Eman Ramadan, Wei Ye, Joerg Widmer, Feng Qian, Zhi-Li Zhang

Abstract: Fifth Generation (5G) mobile networks mark a significant shift from previous generations of networks. By introducing a flexible design, 5G networks support highly diverse application requirements. Currently, the landscape of previous measurement studies does not shed light on 5G network configuration and the inherent implications to application performance. In this paper, we precisely fill this gap and report our in-depth multi-country measurement study on 5G deployed at mid-bands. This is the common playground for U.S. and European carriers. Our findings reveal key aspects on how carriers configure their network, including spectrum utilization, frame configuration, resource allocation and their implication on the application performance.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 18 pages, 36 figures

arXiv:2310.09423 [pdf, other]

QUIC is not Quick Enough over Fast Internet

Authors: Xumiao Zhang, Shuowei Jin, Yi He, Ahmad Hassan, Z. Morley Mao, Feng Qian, Zhi-Li Zhang

Abstract: QUIC is expected to be a game-changer in improving web application performance. In this paper, we conduct a systematic examination of QUIC's performance over high-speed networks. We find that over fast Internet, the UDP+QUIC+HTTP/3 stack suffers a data rate reduction of up to 45.2% compared to the TCP+TLS+HTTP/2 counterpart. Moreover, the performance gap between QUIC and HTTP/2 grows as the underlying bandwidth increases. We observe this issue on lightweight data transfer clients and major web browsers (Chrome, Edge, Firefox, Opera), on different hosts (desktop, mobile), and over diverse networks (wired broadband, cellular). It affects not only file transfers, but also various applications such as video streaming (up to 9.8% video bitrate reduction) and web browsing. Through rigorous packet trace analysis and kernel- and user-space profiling, we identify the root cause to be high receiver-side processing overhead, in particular, excessive data packets and QUIC's user-space ACKs. We make concrete recommendations for mitigating the observed performance issues.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 10 pages, 16 figures

arXiv:2309.06877 [pdf, other]

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

Authors: Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang

Abstract: The self-media era provides us with tremendous high-quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm heavily relies on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple sub-features, explicitly disentangling the feature into exclusive lower-dimensional components. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community.

Submitted 13 September, 2023; originally announced September 2023.

Comments: This paper is accepted by ACM MM 2023

arXiv:2309.02929 [pdf]

Reinforcement Learning Based Gasoline Blending Optimization: Achieving More Efficient Nonlinear Online Blending of Fuels

Authors: Muyi Huang, Renchu He, Xin Dai, Xin Peng, Wenli Du, Feng Qian

Abstract: The online optimization of gasoline blending benefits refinery economies. However, the nonlinear blending mechanism, the oil property fluctuations, and the blending model mismatch bring difficulties to the optimization. To solve the above issues, this paper proposes a novel online optimization method based on a deep reinforcement learning (DRL) algorithm. The Markov decision process (MDP) expression is given considering a practical gasoline blending system. Then, the environment simulator of the gasoline blending process is established based on the MDP expression and the one-year measurement data of a real-world refinery. The soft actor-critic (SAC) DRL algorithm is applied to improve the DRL agent policy by using the data obtained from the interaction between the DRL agent and the environment simulator. Compared with a traditional method, the proposed method has better economic performance. Meanwhile, it is more robust under property fluctuations and component oil switching. Furthermore, the proposed method maintains performance by automatically adapting to system drift.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 30 pages, 13 figures

arXiv:2308.02702 [pdf]

doi 10.1038/d41586-023-00710-0

Swift progress for robots over complex terrain

Authors: Chen Li, Feifei Qian

Abstract: A four-legged robot has learned to run on sand at a faster pace than humans jog on solid ground. With low energy use and few failures, this rapid robot shows the value of combining data-driven learning with accurate yet simple models.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 4 pages, 1 figure

Journal ref: 2023 Nature News & Reviews, 616 (7956), 252-253

arXiv:2307.06501 [pdf, other]

Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

Authors: Wenzhou Lv, Tianyu Wu, Luolin Xiong, Liang Wu, Jian Zhou, Yang Tang, Feng Qian

Abstract: Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.

Submitted 13 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 12 pages

arXiv:2302.10756 [pdf, other]

doi 10.1109/TGRS.2023.3277973

Unsupervised Seismic Footprint Removal With Physical Prior Augmented Deep Autoencoder

Authors: Feng Qian, Yuehua Yue, Yu He, Hongtao Yu, Yingjie Zhou, Jinliang Tang, Guangmin Hu

Abstract: Seismic acquisition footprints appear as stably faint and dim structures and emerge fully spatially coherent, causing inevitable damage to useful signals during the suppression process. Various footprint removal methods, including filtering and sparse representation (SR), have been reported to attain promising results for surmounting this challenge. However, these methods, e.g., SR, rely solely on the handcrafted image priors of useful signals, which is sometimes an unreasonable demand if complex geological structures are contained in the given seismic data. As an alternative, this article proposes a footprint removal network (dubbed FR-Net) for the unsupervised suppression of acquired footprints without any assumptions regarding valuable signals. The key to the FR-Net is to design a unidirectional total variation (UTV) model for footprint acquisition according to the intrinsically directional property of noise. By strongly regularizing a deep convolutional autoencoder (DCAE) using the UTV model, our FR-Net transforms the DCAE from an entirely data-driven model to a prior-augmented approach, inheriting the superiority of the DCAE and our footprint model. Subsequently, the complete separation of the footprint noise and useful signals is projected in an unsupervised manner, specifically by optimizing the FR-Net via the backpropagation (BP) algorithm. We provide qualitative and quantitative evaluations conducted on three synthetic and field datasets, demonstrating that our FR-Net surpasses the previous state-of-the-art (SOTA) methods.

Submitted 8 February, 2023; originally announced February 2023.

Report number: 2302.10756

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2023

arXiv:2302.09228 [pdf, other]

Web Photo Source Identification based on Neural Enhanced Camera Fingerprint

Authors: Feng Qian, Sifeng He, Honghao Huang, Huanyu Ma, Xiaobo Zhang, Lei Yang

Abstract: With the growing popularity of smartphone photography in recent years, web photos play an increasingly important role in all walks of life. Source camera identification of web photos aims to establish a reliable linkage from the captured images to their source cameras, and has a broad range of applications, such as image copyright protection, user authentication, investigated evidence verification, etc. This paper presents an innovative and practical source identification framework that employs neural-network enhanced sensor pattern noise to trace back web photos efficiently while ensuring security. Our proposed framework consists of three main stages: initial device fingerprint registration, fingerprint extraction and cryptographic connection establishment while taking photos, and connection verification between photos and source devices. By incorporating metric learning and frequency consistency into the deep network design, our proposed fingerprint extraction algorithm achieves state-of-the-art performance on modern smartphone photos for reliable source identification. Meanwhile, we also propose several optimization sub-modules to prevent fingerprint leakage and improve accuracy and efficiency. Finally, for practical system design, two cryptographic schemes are introduced to reliably identify the correlation between the registered fingerprint and the verified photo fingerprint, i.e., fuzzy extractor and zero-knowledge proof (ZKP). The code for the fingerprint extraction network and the benchmark dataset of modern smartphone camera photos are publicly available at https://github.com/PhotoNecf/PhotoNecf.

Submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted by WWW2023 (https://www2023.thewebconf.org/). Codes are all publicly available at https://github.com/PhotoNecf/PhotoNecf

arXiv:2211.13090 [pdf, other]

TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision

Authors: Sifeng He, Yue He, Minlong Lu, Chen Jiang, Xudong Yang, Feng Qian, Xiaobo Zhang, Lei Yang, Jiandong Zhang

Abstract: Video copy localization aims to precisely localize all the copied segments within a pair of untrimmed videos in video retrieval applications. Previous methods typically start from frame-to-frame similarity matrix generated by cosine similarity between frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on similarity matrix under temporal constraints. In this paper, we propose TransVCL: an attention-enhanced video copy localization network, which is optimized directly from initial frame-level features and trained end-to-end with three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for similarity matrix generation, and a temporal alignment module for copied segments localization. In contrast to previous methods demanding the handcrafted similarity matrix, TransVCL incorporates long-range temporal information between feature sequence pair using self- and cross- attention layers. With the joint design and optimization of three components, the similarity matrix can be learned to present more discriminative copied patterns, leading to significant improvements over previous methods on segment-level labeled datasets (VCSL and VCDB). Besides the state-of-the-art performance in fully supervised setting, the attention architecture facilitates TransVCL to further exploit unlabeled or simply video-level labeled data. Additional experiments of supplementing video-level labeled datasets including SVD and FIVR reveal the high flexibility of TransVCL from full supervision to semi-supervision (with or without video-level annotation). Code is publicly available at https://github.com/transvcl/TransVCL.

Submitted 23 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence(AAAI2023)

arXiv:2211.10885 [pdf, other]

Contrastive Regularization for Multimodal Emotion Recognition Using Audio and Text

Authors: Fan Qian, Jiqing Han

Abstract: Speech emotion recognition is a challenge and an important step towards more natural human-computer interaction (HCI). The popular approach is multimodal emotion recognition based on model-level fusion, which means that the multimodal signals can be encoded to acquire embeddings, and then the embeddings are concatenated together for the final classification. However, due to the influence of noise or other factors, each modality does not always tend to the same emotional category, which affects the generalization of a model. In this paper, we propose a novel regularization method via contrastive learning for multimodal emotion recognition using audio and text. By introducing a discriminator to distinguish the difference between the same and different emotional pairs, we explicitly restrict the latent code of each modality to contain the same emotional information, so as to reduce the noise interference and get more discriminative representation. Experiments are performed on the standard IEMOCAP dataset for 4-class emotion recognition. The results show a significant improvement of 1.44\% and 1.53\% in terms of weighted accuracy (WA) and unweighted accuracy (UA) compared to the baseline system.

Submitted 20 November, 2022; originally announced November 2022.

Comments: Completed in October 2020 and submitted to ICASSP2021

arXiv:2208.02792 [pdf]

A Cooperative Perception Environment for Traffic Operations and Control

Authors: Hanlin Chen, Brian Liu, Xumiao Zhang, Feng Qian, Z. Morley Mao, Yiheng Feng

Abstract: Existing data collection methods for traffic operations and control usually rely on infrastructure-based loop detectors or probe vehicle trajectories. Connected and automated vehicles (CAVs) not only can report data about themselves but also can provide the status of all detected surrounding vehicles. Integration of perception data from multiple CAVs as well as infrastructure sensors (e.g., LiDAR) can provide richer information even under a very low penetration rate. This paper aims to develop a cooperative data collection system, which integrates LiDAR point cloud data from both infrastructure and CAVs to create a cooperative perception environment for various transportation applications. The state-of-the-art 3D detection models are applied to detect vehicles in the merged point cloud. We test the proposed cooperative perception environment with the max pressure adaptive signal control model in a co-simulation platform with CARLA and SUMO. Results show that very low penetration rates of CAVs plus an infrastructure sensor are sufficient to achieve comparable performance with 30% or higher penetration rates of connected vehicles (CV). We also show the equivalent CV penetration rate (E-CVPR) under different CAV penetration rates to demonstrate the data collection efficiency of the cooperative perception environment.

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2204.13704 [pdf, other]

Hyperbolic Hierarchical Knowledge Graph Embeddings for Link Prediction in Low Dimensions

Authors: Wenjie Zheng, Wenxue Wang, Shu Zhao, Fulan Qian

Abstract: Knowledge graph embeddings (KGE) have been validated as powerful methods for inferring missing links in knowledge graphs (KGs); they typically map entities into Euclidean space and treat relations as transformations of entities. Recently, some Euclidean KGE methods have been enhanced to model semantic hierarchies commonly found in KGs, improving the performance of link prediction. To embed hierarchical data, hyperbolic space has emerged as a promising alternative to traditional Euclidean space, offering high fidelity and lower memory consumption. Unlike Euclidean space, hyperbolic space provides countless curvatures to choose from. However, it is difficult for existing hyperbolic KGE methods to obtain the optimal curvature settings manually, thereby limiting their ability to effectively model semantic hierarchies. To address this limitation, we propose a novel KGE model called $\textbf{Hyp}$erbolic $\textbf{H}$ierarchical $\textbf{KGE}$ (HypHKGE). This model introduces attention-based learnable curvatures for hyperbolic space, which helps preserve rich semantic hierarchies. Furthermore, to utilize the preserved hierarchies for inferring missing links, we define hyperbolic hierarchical transformations based on the theory of hyperbolic geometry, including both inter-level and intra-level modeling. Experiments demonstrate the effectiveness of the proposed HypHKGE model on the three benchmark datasets (WN18RR, FB15K-237, and YAGO3-10). The source code will be publicly released at https://github.com/wjzheng96/HypHKGE.

Submitted 23 February, 2024; v1 submitted 27 April, 2022; originally announced April 2022.

arXiv:2203.02654 [pdf, other]

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection

Authors: Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei Zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, Jingxiong Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets restricted by either video-level annotation or small-scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video duration. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy overlapping segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future works. The VCSL dataset, metric and benchmark codes are all publicly available at https://github.com/alipay/VCSL.

Submitted 16 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022. Codes are all publicly available at https://github.com/alipay/VCSL

arXiv:2109.03395 [pdf, other]

From Cloud to Edge: A First Look at Public Edge Platforms

Authors: Mengwei Xu, Zhe Fu, Xiao Ma, Li Zhang, Yanan Li, Feng Qian, Shangguang Wang, Ke Li, Jingyu Yang, Xuanzhe Liu

Abstract: Public edge platforms have drawn increasing attention from both academia and industry. In this study, we perform a first-of-its-kind measurement study on a leading public edge platform that has been densely deployed in China. Based on this measurement, we quantitatively answer two critical yet unexplored questions. First, from end users' perspective, what is the performance of commodity edge platforms compared to cloud, in terms of the end-to-end network delay, throughput, and the application QoE? Second, from the edge service provider's perspective, how are the edge workloads different from cloud, in terms of their VM subscription, monetary cost, and resource usage? Our study quantitatively reveals the status quo of today's public edge platforms, and provides crucial insights towards developing and operating future edge services.

Submitted 8 November, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

arXiv:2003.12948 [pdf, other]

When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey

Authors: Chongzhen Zhang, Jianrui Wang, Gary G. Yen, Chaoqiang Zhao, Qiyu Sun, Yang Tang, Feng Qian, Jürgen Kurths

Abstract: With widespread applications of artificial intelligence (AI), the capabilities of the perception, understanding, decision-making and control for autonomous systems have improved significantly in the past years. When autonomous systems consider the performance of accuracy and transferability, several AI methods, like adversarial learning, reinforcement learning (RL) and meta-learning, show their powerful performance. Here, we review the learning-based approaches in autonomous systems from the perspectives of accuracy and transferability. Accuracy means that a well-trained model shows good results during the testing phase, in which the testing set shares a same task or a data distribution with the training set. Transferability means that when a well-trained model is transferred to other testing domains, the accuracy is still good. Firstly, we introduce some basic concepts of transfer learning and then present some preliminaries of adversarial learning, RL and meta-learning. Secondly, we focus on reviewing the accuracy or transferability or both of them to show the advantages of adversarial learning, like generative adversarial networks (GANs), in typical computer vision tasks in autonomous systems, including image style transfer, image superresolution, image deblurring/dehazing/rain removal, semantic segmentation, depth estimation, pedestrian detection and person re-identification (re-ID). Then, we further review the performance of RL and meta-learning from the aspects of accuracy or transferability or both of them in autonomous systems, involving pedestrian tracking, robot navigation and robotic manipulation. Finally, we discuss several challenges and future topics for using adversarial learning, RL and meta-learning in autonomous systems.

Submitted 24 May, 2020; v1 submitted 29 March, 2020; originally announced March 2020.

arXiv:2003.06620 [pdf, other]

doi 10.1007/s11431-020-1582-8

Monocular Depth Estimation Based On Deep Learning: An Overview

Authors: Chaoqiang Zhao, Qiyu Sun, Chongzhen Zhang, Yang Tang, Feng Qian

Abstract: Depth information is important for autonomous systems to perceive environments and estimate their own state. Traditional depth estimation methods, like structure from motion and stereo vision matching, are built on feature correspondences of multiple viewpoints. Meanwhile, the predicted depth maps are sparse. Inferring depth information from a single image (monocular depth estimation) is an ill-posed problem. With the rapid development of deep neural networks, monocular depth estimation based on deep learning has been widely studied recently and achieved promising performance in accuracy. Meanwhile, dense depth maps are estimated from single images by deep neural networks in an end-to-end manner. In order to improve the accuracy of depth estimation, different kinds of network frameworks, loss functions and training strategies are proposed subsequently. Therefore, we survey the current monocular depth estimation methods based on deep learning in this review. Initially, we summarize several widely used datasets and evaluation indicators in deep learning-based depth estimation. Furthermore, we review some representative existing methods according to different training manners: supervised, unsupervised and semi-supervised. Finally, we discuss the challenges and provide some ideas for future research in monocular depth estimation.

Submitted 3 July, 2020; v1 submitted 14 March, 2020; originally announced March 2020.

Comments: 14 pages, 4 figures

arXiv:2001.02319 [pdf, other]

doi 10.1109/TNNLS.2022.3167688

Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey

Authors: Yang Tang, Chaoqiang Zhao, Jianrui Wang, Chongzhen Zhang, Qiyu Sun, Weixing Zheng, Wenli Du, Feng Qian, Juergen Kurths

Abstract: Autonomous systems possess the features of inferring their own state, understanding their surroundings, and performing autonomous navigation. With the applications of learning systems, like deep learning and reinforcement learning, the visual-based self-state estimation, environment perception and navigation capabilities of autonomous systems have been efficiently addressed, and many new learning-based algorithms have surfaced with respect to autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches in ego-motion perception, environment perception and navigation in autonomous systems, which is different from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity to integrate deep learning techniques. Second, we review the visual-based environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on the visual navigation based on learning systems, mainly including reinforcement learning and deep reinforcement learning. Finally, we examine several challenges and promising directions discussed and concluded in related research of learning systems in the era of computer science and robotics.

Submitted 30 April, 2022; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: This paper has been accepted by IEEE TNNLS

arXiv:1909.07532 [pdf, other]

doi 10.1145/3366423.3380169

A First Look at Commercial 5G Performance on Smartphones

Authors: Arvind Narayanan, Eman Ramadan, Jason Carpenter, Qingxu Liu, Yu Liu, Feng Qian, Zhi-Li Zhang

Abstract: We conduct to our knowledge a first measurement study of commercial 5G performance on smartphones by closely examining 5G networks of three carriers (two mmWave carriers, one mid-band carrier) in three U.S. cities. We conduct extensive field tests on 5G performance in diverse urban environments. We systematically analyze the handoff mechanisms in 5G and their impact on network performance. We explore the feasibility of using location and possibly other environmental information to predict the network performance. We also study the app performance (web browsing and HTTP download) over 5G. Our study consumes more than 15 TB of cellular data. Conducted when 5G just made its debut, it provides a "baseline" for studying how 5G performance evolves, and identifies key research directions on improving 5G users' experience in a cross-layer manner. We have released the data collected from our study (referred to as 5Gophers) at https://fivegophers.umn.edu/www20.

Submitted 28 April, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: Published at The Web Conference 2020 (WWW 2020). Please include WWW in any citations

Journal ref: Proceedings of The Web Conference 2020 (WWW'20)

arXiv:1901.06437 [pdf, other]

Combating Fake News: A Survey on Identification and Mitigation Techniques

Authors: Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, Yan Liu

Abstract: The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news, and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users' engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.

Submitted 18 January, 2019; originally announced January 2019.

Journal ref: ACM Transactions on Intelligent Systems and Technology, 2019

arXiv:1812.04823 [pdf, other]

An Active-Passive Measurement Study of TCP Performance over LTE on High-speed Rails

Authors: Jing Wang, Yufan Zheng, Yunzhe Ni, Chenren Xu, Feng Qian, Wangyang Li, Wantong Jiang, Yihua Cheng, Zhuo Cheng, Yuanjie Li, Xiufeng Xie, Yi Sun, Zhongfeng Wang

Abstract: High-speed rail (HSR) systems potentially provide a more efficient way of door-to-door transportation than airplanes. However, they also pose unprecedented challenges in delivering seamless Internet service for on-board passengers. In this paper, we conduct a large-scale active-passive measurement study of TCP performance over LTE on HSR. Our measurement targets the HSR routes in China operating at above 300 km/h. We performed extensive data collection through both controlled settings and passive monitoring, obtaining 1732.9 GB of data collected over 135719 km of trips. Leveraging such a unique dataset, we measure important performance metrics such as TCP goodput, latency, loss rate, as well as key characteristics of TCP flows, application breakdown, and users' behaviors. We further quantitatively study the impact of frequent cellular handover on HSR networking performance, and conduct in-depth examination of the performance of two widely deployed transport-layer protocols: TCP CUBIC and TCP BBR. Our findings reveal the performance of today's commercial HSR networks "in the wild", as well as identify several performance inefficiencies, which motivate us to design a simple yet effective congestion control algorithm based on BBR to further boost the throughput by up to 36.5%. They together highlight the need to develop dedicated protocol mechanisms that are friendly to extreme mobility.

Submitted 12 December, 2018; originally announced December 2018.

Comments: This is a pre-print version to appear in the 25th Annual International Conference on Mobile Computing and Networking (MobiCom'19)

arXiv:1812.00816 [pdf, other]

A Robust Algorithm for Tile-based 360-degree Video Streaming with Uncertain FoV Estimation

Authors: Arnob Ghosh, Vaneet Aggarwal, Feng Qian

Abstract: We propose a robust scheme for streaming 360-degree immersive videos to maximize the quality of experience (QoE). Our streaming approach introduces a holistic analytical framework built upon the formal method of stochastic optimization. We propose a robust algorithm which provides a streaming rate such that the video quality degrades below that rate with very low probability even in the presence of uncertain head movement and bandwidth. It assumes the knowledge of the viewing probability of different portions (tiles) of a panoramic scene. Such probabilities can be easily derived from crowdsourced measurements performed by 360 video content providers. We then propose efficient methods to solve the problem at runtime while achieving a bounded optimality gap (in terms of the QoE). We implemented our proposed approaches using emulation. Using real users' head movement traces and real cellular bandwidth traces, we show that our algorithms significantly outperform the baseline algorithms by at least $30\%$ in the QoE metric. Our algorithm gives a streaming rate which is $50\%$ higher compared to the baseline algorithms when the prediction error is high.

Submitted 30 November, 2018; originally announced December 2018.

Comments: arXiv admin note: text overlap with arXiv:1704.08215

arXiv:1805.01525 [pdf, other]

Understanding and Mitigating the Security Risks of Voice-Controlled Third-Party Skills on Amazon Alexa and Google Home

Authors: Nan Zhang, Xianghang Mi, Xuan Feng, XiaoFeng Wang, Yuan Tian, Feng Qian

Abstract: Virtual personal assistants (VPA) (e.g., Amazon Alexa and Google Assistant) today mostly rely on the voice channel to communicate with their users, which however is known to be vulnerable, lacking proper authentication. The rapid growth of VPA skill markets opens a new attack avenue, potentially allowing a remote adversary to publish attack skills to attack a large number of VPA users through popular IoT devices such as Amazon Echo and Google Home. In this paper, we report a study that concludes such remote, large-scale attacks are indeed realistic. More specifically, we implemented two new attacks: voice squatting in which the adversary exploits the way a skill is invoked (e.g., "open capital one"), using a malicious skill with similarly pronounced name (e.g., "capital won") or paraphrased name (e.g., "capital one please") to hijack the voice command meant for a different skill, and voice masquerading in which a malicious skill impersonates the VPA service or a legitimate skill to steal the user's data or eavesdrop on her conversations. These attacks aim at the way VPAs work or the user's misconceptions about their functionalities, and are found to pose a realistic threat by our experiments (including user studies and real-world deployments) on Amazon Echo and Google Home. The significance of our findings has already been acknowledged by Amazon and Google, and further evidenced by the risky skills discovered on Alexa and Google markets by the new detection systems we built. We further developed techniques for automatic detection of these attacks, which already capture real-world skills likely to pose such threats.

Submitted 29 June, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

arXiv:1805.00041 [pdf, other]

doi 10.1109/TNET.2018.2844123

LBP: Robust Rate Adaptation Algorithm for SVC Video Streaming

Authors: Anis Elgabli, Vaneet Aggarwal, Shuai Hao, Feng Qian, Subhabrata Sen

Abstract: Video streaming today accounts for up to 55\% of mobile traffic. In this paper, we explore streaming videos encoded using Scalable Video Coding scheme (SVC) over highly variable bandwidth conditions such as cellular networks. SVC's unique encoding scheme allows the quality of a video chunk to change incrementally, making it more flexible and adaptive to challenging network conditions compared to other encoding schemes. Our contribution is threefold. First, we formulate the quality decisions of video chunks constrained by the available bandwidth, the playback buffer, and the chunk deadlines as an optimization problem. The objective is to optimize a novel QoE metric that models a combination of the three objectives of minimizing the stall/skip duration of the video, maximizing the playback quality of every chunk, and minimizing the number of quality switches. Second, we develop Layered Bin Packing (LBP) Adaptation Algorithm, a novel algorithm that solves the proposed optimization problem. Moreover, we show that LBP achieves the optimal solution of the proposed optimization problem with linear complexity in the number of video chunks. Third, we propose an online algorithm (online LBP) where several challenges are addressed including handling bandwidth prediction errors, and short prediction duration. Extensive simulations with real bandwidth traces of public datasets reveal the robustness of our scheme and demonstrate its significant performance improvement as compared to the state-of-the-art SVC streaming algorithms. The proposed algorithm is also implemented on a TCP/IP emulation test bed with real LTE bandwidth traces, and the emulation confirms the simulation results and validates that the algorithm can be implemented and deployed on today's mobile devices.

Submitted 13 June, 2018; v1 submitted 30 April, 2018; originally announced May 2018.

Comments: 22 pages, IEEE/ACM Transactions on Networking, 2018. Fixed repetition of references in the last version (minor change)

arXiv:1712.04919 [pdf, ps, other]

Multidimensional Data Tensor Sensing for RF Tomographic Imaging

Authors: Tao Deng, Xiao-Yang Liu, Feng Qian, Anwar Walid

Abstract: Radio-frequency (RF) tomographic imaging is a promising technique for inferring multi-dimensional physical space by processing RF signals traversed across a region of interest. However, conventional RF tomography schemes are generally based on vector compressed sensing, which ignores the geometric structures of the target spaces and leads to low recovery precision. The recently proposed transform-based tensor model is more appropriate for sensory data processing, as it helps exploit the geometric structures of the three-dimensional target and improve the recovery precision. In this paper, we propose a novel tensor sensing approach that achieves highly accurate estimation for real-world three-dimensional spaces. First, we use the transform-based tensor model to formulate a tensor sensing problem, and propose a fast alternating minimization algorithm called Alt-Min. Second, we derive an algorithm which is optimized to reduce memory and computation requirements. Finally, we present an evaluation of our Alt-Min approach using IKEA 3D data and demonstrate significant improvement in recovery error and convergence speed compared to prior tensor-based compressed sensing.

Submitted 16 December, 2017; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: 6 pages, 4 figures

arXiv:1712.03073 [pdf, other]

DeepWear: Adaptive Local Offloading for On-Wearable Deep Learning

Authors: Mengwei Xu, Feng Qian, Mengze Zhu, Feifan Huang, Saumay Pushp, Xuanzhe Liu

Abstract: Due to their on-body and ubiquitous nature, wearables can generate a wide range of unique sensor data, creating countless opportunities for deep learning tasks. We propose DeepWear, a deep learning (DL) framework for wearable devices to improve the performance and reduce the energy footprint. DeepWear strategically offloads DL tasks from a wearable device to its paired handheld device through the local network. Compared to remote-cloud-based offloading, DeepWear requires no Internet connectivity, consumes less energy, and is robust to privacy breaches. DeepWear provides various novel techniques such as context-aware offloading, strategic model partition, and pipelining support to efficiently utilize the processing capacity from nearby paired handhelds. Deployed as a user-space library, DeepWear offers developer-friendly APIs that are as simple as those in traditional DL libraries such as TensorFlow. We have implemented DeepWear on the Android OS and evaluated it on COTS smartphones and smartwatches with real DL models. DeepWear brings up to 5.08X and 23.0X execution speedup, as well as 53.5% and 85.5% energy saving compared to wearable-only and handheld-only strategies, respectively.

Submitted 12 January, 2021; v1 submitted 1 December, 2017; originally announced December 2017.

arXiv:1712.00136 [pdf, other]

doi 10.1103/PhysRevE.99.022606

The dynamics of scattering in undulatory active collisions

Authors: Jennifer M. Rieser, Perrin E. Schiebel, Arman Pazouki, Feifei Qian, Zachary Goddard, Andrew Zangwill, Dan Negrut, Daniel I. Goldman

Abstract: Natural and artificial self-propelled systems must manage environmental interactions during movement. Such interactions, which we refer to as active collisions, are fundamentally different from momentum-conserving interactions studied in classical physics, largely because the internal driving of the locomotor can lead to persistent contact with heterogeneities. Here, we experimentally and numerically study the effects of active collisions on a laterally-undulating sensory-deprived robophysical model, whose dynamics are applicable to self-propelled systems across length scales and environments. The robot moves via spatial undulation of body segments, with a nearly-linear center-of-geometry trajectory. Interactions with a single rigid post scatter the robot, and these deflections are proportional to the head-post contact duration. The distribution of scattering angles is smooth and strongly-peaked directly behind the post. Interactions with a single row of evenly-spaced posts (with inter-post spacing $d$) produce distributions reminiscent of far-field diffraction patterns: as $d$ decreases, distinct secondary peaks emerge as large deflections become more likely. Surprisingly, we find that the presence of multiple posts does not change the nature of individual collisions; instead, multi-modal scattering patterns arise from multiple posts altering the likelihood of individual collisions to occur. As $d$ decreases, collisions near the leading edges of the posts become more probable, and we find that these interactions are associated with larger deflections. Our results, which highlight the surprising dynamics that can occur during active collisions of self-propelled systems, can inform control principles for locomotors in complex terrain and facilitate design of task-capable active matter.

Submitted 11 June, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

Comments: 32 pages, 20 main figures, 8 supplemental figures

Journal ref: Phys. Rev. E 99, 022606 (2019)

arXiv:1711.02666 [pdf, other]

Tensor-Generative Adversarial Network with Two-dimensional Sparse Coding: Application to Real-time Indoor Localization

Authors: Chenxiao Zhu, Lingqing Xu, Xiao-Yang Liu, Feng Qian

Abstract: Localization technology is important for the development of indoor location-based services (LBS). The Global Positioning System (GPS) becomes unusable in indoor environments due to the non-line-of-sight issue, so it is urgent to develop a real-time high-accuracy localization approach for smartphones. However, accurate localization is challenging due to issues such as real-time response requirements, limited fingerprint samples and mobile device storage. To address these problems, we propose a novel deep learning architecture: Tensor-Generative Adversarial Network (TGAN). We first introduce a transform-based 3D tensor to model fingerprint samples. Unlike passive methods that construct a fingerprint database as a prior, our model applies an artificial neural network with deep learning to train network classifiers and then outputs estimates. Then we propose a novel tensor-based super-resolution scheme using the generative adversarial network (GAN) that adopts sparse coding as the generator network and a residual learning network as the discriminator. Further, we analyze the performance of tensor-GAN and implement a trace-based localization experiment, which achieves better performance. Compared to existing methods for smartphone indoor positioning, which are energy-consuming and place high demands on devices, TGAN offers an improved solution in terms of localization accuracy, response time and implementation complexity.

Submitted 7 November, 2017; originally announced November 2017.

Comments: 6 pages, 9 figures

arXiv:1704.08215 [pdf, other]

A Rate Adaptation Algorithm for Tile-based 360-degree Video Streaming

Authors: Arnob Ghosh, Vaneet Aggarwal, Feng Qian

Abstract: In 360-degree immersive video, a user only views a part of the entire raw video frame based on her viewing direction. However, today's 360-degree video players always fetch the entire panoramic view regardless of users' head movement, leading to significant bandwidth waste that can be potentially avoided. In this paper, we propose a novel adaptive streaming scheme for 360-degree videos. The basic idea is to fetch the invisible portion of a video at the lowest quality based on users' head movement prediction and to adaptively decide the video playback quality for the visible portion based on bandwidth prediction. Doing both in a robust manner requires overcoming a series of challenges, such as jointly considering the spatial and temporal domains, tolerating prediction errors, and achieving low complexity. To overcome these challenges, we first define quality of experience (QoE) metrics for adaptive 360-degree video streaming. We then formulate an optimization problem and solve it at a low complexity. The algorithm strategically leverages both future bandwidth and the distribution of users' head positions to determine the quality level of each tile (i.e., a sub-area of a raw frame). We further provide theoretical proof showing that our algorithm achieves optimality under practical assumptions. Numerical results show that our proposed algorithms significantly boost the user QoE by at least 20\% compared to baseline algorithms.

Submitted 26 April, 2017; originally announced April 2017.

arXiv:1704.04429 [pdf, other]

3D seismic data denoising using two-dimensional sparse coding scheme

Authors: Ming-Jun Su, Jingbo Chang, Feng Qian, Guangmin Hu, Xiao-Yang Liu

Abstract: Seismic data denoising is vital to geophysical applications and the transform-based function method is one of the most widely used techniques. However, it is challenging to design a suitable sparse representation to express a transform-based function group due to the complexity of seismic data. In this paper, we apply a seismic data denoising method based on learning-type overcomplete dictionaries which uses two-dimensional sparse coding (2DSC). First, we model the input seismic data and dictionaries as third-order tensors and introduce tensor-linear combinations for data approximation. Second, we apply a learning-type overcomplete dictionary, i.e., optimal sparse data representation is achieved through learning and training. Third, we exploit the alternating minimization algorithm to solve the optimization problem of seismic denoising. Finally, we evaluate its denoising performance on synthetic seismic data and land data survey. Experimental results show that the two-dimensional sparse coding scheme reduces computational costs and enhances the signal-to-noise ratio.

Submitted 8 April, 2017; originally announced April 2017.

arXiv:1704.02446 [pdf, other]

doi 10.1190/geo2017-0524.1

Seismic facies recognition based on prestack data using deep convolutional autoencoder

Authors: Feng Qian, Miao Yin, Ming-Jun Su, Yaojun Wang, Guangmin Hu

Abstract: Prestack seismic data carries much useful information that can help us find more complex atypical reservoirs. Therefore, we are increasingly inclined to use prestack seismic data for seismic facies recognition. However, due to the inclusion of excessive redundancy, effective feature extraction from prestack seismic data becomes critical. In this paper, we consider seismic facies recognition based on prestack data as an image clustering problem in computer vision (CV) by thinking of each prestack seismic gather as a picture. We propose a convolutional autoencoder (CAE) network for deep feature learning from prestack seismic data, which is more effective than principal component analysis (PCA) in redundancy removal and valid information extraction. Then, using conventional classification or clustering techniques (e.g. K-means or self-organizing maps) on the extracted features, we can achieve seismic facies recognition. We applied our method to the prestack data from a physical model and the LZB region. The result shows that our approach is superior to conventional methods.

Submitted 8 April, 2017; originally announced April 2017.

Journal ref: GEOPHYSICS, 2018, 83(3): A39-A43

arXiv:1704.02445 [pdf, other]

Exact 3D seismic data reconstruction using Tubal-Alt-Min algorithm

Authors: Feng Qian, Quan Chen, Ming-Jun Su, Guang-Min Hu, Xiao-Yang Liu

Abstract: Missing data is a common issue in seismic data, and many methods have been proposed to solve it. In this paper, we present the low-tubal-rank tensor model and a novel tensor completion algorithm to recover 3D seismic data. This is a fast iterative algorithm, called Tubal-Alt-Min, which completes our 3D seismic data by exploiting the low-tubal-rank property expressed as the product of two much smaller tensors. Tubal-Alt-Min alternates between estimating those two tensors using least squares minimization. We evaluate its reconstruction performance both on synthetic seismic data and land data survey. The experimental results show that compared with the tensor nuclear norm minimization algorithm, Tubal-Alt-Min improves the reconstruction error by orders of magnitude.

Submitted 8 April, 2017; originally announced April 2017.

arXiv:1704.00405 [pdf, other]

Syntax Aware LSTM Model for Chinese Semantic Role Labeling

Authors: Feng Qian, Lei Sha, Baobao Chang, Lu-chen Liu, Ming Zhang

Abstract: For the semantic role labeling (SRL) task, when it comes to utilizing parsing information, both traditional methods and recent recurrent neural network (RNN) based methods rely on feature engineering. In this paper, we propose Syntax Aware Long Short-Term Memory (SA-LSTM). The structure of SA-LSTM is modified according to dependency parsing information in order to model parsing information directly through architecture engineering instead of feature engineering. We experimentally demonstrate that SA-LSTM gains more improvement from the model architecture. Furthermore, SA-LSTM outperforms the state-of-the-art on CPB 1.0 significantly according to Student's t-test ($p<0.05$).

Submitted 19 April, 2017; v1 submitted 2 April, 2017; originally announced April 2017.

arXiv:1703.09809 [pdf, other]

Understanding IoT Security Through the Data Crystal Ball: Where We Are Now and Where We Are Going to Be

Authors: Nan Zhang, Soteris Demetriou, Xianghang Mi, Wenrui Diao, Kan Yuan, Peiyuan Zong, Feng Qian, XiaoFeng Wang, Kai Chen, Yuan Tian, Carl A. Gunter, Kehuan Zhang, Patrick Tague, Yue-Hsun Lin

Abstract: Inspired by the boom of the consumer IoT market, many device manufacturers, start-up companies and technology giants have jumped into the space. Unfortunately, the exciting utility and rapid marketization of IoT come at the expense of privacy and security. Industry reports and academic work have revealed many attacks on IoT systems, resulting in privacy leakage, property loss and large-scale availability problems. To mitigate such threats, a few solutions have been proposed. However, it is still unclear what impact they can have on the IoT ecosystem. In this work, we aim to perform a comprehensive study on reported attacks and defenses in the realm of IoT, aiming to find out what we know, where the current studies fall short and how to move forward. To this end, we first build a toolkit that searches through a massive amount of online data using semantic analysis to identify over 3000 IoT-related articles. Further, by clustering such collected data using machine learning technologies, we are able to compare academic views with the findings from industry and other sources, in an attempt to understand the gaps between them, the trend of the IoT security risks and new problems that need further attention. We systemize this process by proposing a taxonomy for the IoT ecosystem and organizing IoT security into five problem areas. We use this taxonomy as a beacon to assess each IoT work across a number of properties we define. Our assessment reveals that relevant security and privacy problems are far from solved. We discuss how each proposed solution can be applied to a problem area and highlight their strengths, assumptions and constraints. We stress the need for a security framework for IoT vendors and discuss the trend of shifting security liability to external or centralized entities. We also identify open research problems and provide suggestions towards a secure IoT ecosystem.

Submitted 28 March, 2017; originally announced March 2017.

arXiv:1702.00159 [pdf, ps, other]

Robust Order Scheduling in the Fashion Industry: A Multi-Objective Optimization Approach

Authors: Wei Du, Yang Tang, Sunney Yung Sun Leung, Le Tong, Athanasios V. Vasilakos, Feng Qian

Abstract: In the fashion industry, order scheduling focuses on the assignment of production orders to appropriate production lines. In reality, before a new order can be put into production, a series of activities known as pre-production events need to be completed. In addition, in the real production process, owing to various uncertainties, the daily production quantity of each order is not always as expected. In this research, by considering the pre-production events and the uncertainties in the daily production quantity, robust order scheduling problems in the fashion industry are investigated with the aid of a multi-objective evolutionary algorithm (MOEA) called nondominated sorting adaptive differential evolution (NSJADE). The experimental results illustrate that it is of paramount importance to consider pre-production events in order scheduling problems in the fashion industry. We also show that the uncertainties in the daily production quantity heavily affect the order scheduling.

Submitted 1 February, 2017; originally announced February 2017.

arXiv:1602.04712 [pdf, other]

doi 10.1088/0034-4885/79/11/110001

A review on locomotion robophysics: the study of movement at the intersection of robotics, soft matter and dynamical systems

Authors: Jeffrey Aguilar, Tingnan Zhang, Feifei Qian, Mark Kingsbury, Benjamin McInroe, Nicole Mazouchova, Chen Li, Ryan Maladen, Chaohui Gong, Matt Travers, Ross L. Hatton, Howie Choset, Paul B. Umbanhowar, Daniel I. Goldman

Abstract: In this review we argue for the creation of a physics of moving systems -- a locomotion "robophysics" -- which we define as the pursuit of the discovery of principles of self-generated motion. Robophysics can provide an important intellectual complement to the discipline of robotics, largely the domain of researchers from engineering and computer science. The essential idea is that we must complement study of complex robots in complex situations with systematic study of simplified robophysical devices in controlled laboratory settings and simplified theoretical models. We must thus use the methods of physics to examine successful and failed locomotion in simplified (abstracted) devices using parameter space exploration, systematic control, and techniques from dynamical systems. Using examples from our and others' research, we will discuss how such robophysical studies have begun to aid engineers in the creation of devices that begin to achieve life-like locomotor abilities on and within complex environments, have inspired interesting physics questions in low dimensional dynamical systems, geometric mechanics and soft matter physics, and have been useful to develop models for biological locomotion in complex terrain. The rapidly decreasing cost of constructing sophisticated robot models with easy access to significant computational power bodes well for scientists and engineers to engage in a discipline which can readily integrate experiment, theory and computation.

Submitted 12 February, 2016; originally announced February 2016.

Comments: 61 pages, 18 figures, IOPScience journal: Reports on Progress in Physics

Showing 1–43 of 43 results for author: Qian, F