Search | arXiv e-print repository

Design and Implementation of ARA Wireless Living Lab for Rural Broadband and Applications

Authors: Taimoor Ul Islam, Joshua Ofori Boateng, Md Nadim, Guoying Zu, Mukaram Shahid, Xun Li, Tianyi Zhang, Salil Reddy, Wei Xu, Ataberk Atalar, Vincent Lee, Yung-Fu Chen, Evan Gosling, Elisabeth Permatasari, Christ Somiah, Zhibo Meng, Sarath Babu, Mohammed Soliman, Ali Hussain, Daji Qiao, Mai Zheng, Ozdal Boyraz, Yong Guan, Anish Arora, Mohamed Selim, et al. (6 additional authors not shown)

Abstract: To address the rural broadband challenge and to leverage the unique opportunities that rural regions provide for piloting advanced wireless applications, we design and implement the ARA wireless living lab for research and innovation in rural wireless systems and their applications in precision agriculture, community services, and so on. ARA focuses on the unique community, application, and economic context of rural regions, and it features the first-of-its-kind, real-world deployment of long-distance, high-capacity wireless x-haul and access platforms across a rural area of diameter over 30 km. With both software-defined radios and programmable COTS systems and through effective orchestration of these wireless resources with fiber as well as compute resources embedded end-to-end across user equipment, base stations, edge, and cloud, ARA offers programmability, performance, robustness, and heterogeneity at the same time, thus enabling rural-focused co-evolution of wireless and applications while helping advance the frontiers of wireless systems in domains such as O-RAN, NextG, and agriculture applications. Here we present the design principles and implementation strategies of ARA, characterize its performance and heterogeneity, and highlight example wireless and application experiments uniquely enabled by ARA.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 17 pages, 18 figures

arXiv:2407.12282 [pdf, other]

Chip Placement with Diffusion

Authors: Vint Lee, Chun Deng, Leena Elzeiny, Pieter Abbeel, John Wawrzynek

Abstract: Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2-dimensional chip. The physical layout obtained during placement determines key performance metrics of the chip, such as power consumption, area, and performance. Existing learning-based methods typically fall short because of their reliance on reinforcement learning, which is slow and limits the flexibility of the agent by casting placement as a sequential process. Instead, we use a powerful diffusion model to place all components simultaneously. To enable such models to train at scale, we propose a novel architecture for the denoising model, as well as an algorithm to generate large synthetic datasets for pre-training. We empirically show that our model can tackle the placement task, and achieve competitive performance on placement benchmarks compared to state-of-the-art methods.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2403.13822 [pdf, other]

An Effective Learning Management System for Revealing Student Performance Attributes

Authors: Xinyu Zhang, Vincent CS Lee, Duo Xu, Jun Chen, Mohammad S. Obaidat

Abstract: A learning management system streamlines the management of the teaching process in a centralized place, recording, tracking, and reporting the delivery of educational courses and student performance. Educational knowledge discovery from such an e-learning system plays a crucial role in rule regulation, policy establishment, and system development. However, existing LMSs do not have embedded mining modules to directly extract knowledge. As educational modes become more complex, educational data mining efficiency from those heterogeneous student learning behaviours is gradually degraded. Therefore, an LMS incorporated with an advanced educational mining module is proposed in this study, as a means to mine efficiently from student performance records to provide valuable insights for educators in helping plan effective learning pedagogies, improve curriculum design, and guarantee quality of teaching. Through two illustrative case studies, experimental results demonstrate increased mining efficiency of the proposed mining module without information loss compared to classic educational mining algorithms. The mined knowledge reveals a set of attributes that significantly impact student academic performance, and further classification evaluation validates the identified attributes. The design and application of such an effective LMS can enable educators to learn from past student performance experiences, empowering them to guide and intervene with students in time, and eventually improve their academic success.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.03722 [pdf, other]

From Data to Insights: A Comprehensive Survey on Advanced Applications in Thyroid Cancer Research

Authors: Xinyu Zhang, Vincent CS Lee, Feng Liu

Abstract: Thyroid cancer, the most prevalent endocrine cancer, has gained significant global attention due to its impact on public health. Extensive research efforts have been dedicated to leveraging artificial intelligence (AI) methods for the early detection of this disease, aiming to reduce its morbidity rates. However, a comprehensive understanding of the structured organization of research applications in this particular field remains elusive. To address this knowledge gap, we conducted a systematic review and developed a comprehensive taxonomy of machine learning-based applications in thyroid cancer pathogenesis, diagnosis, and prognosis. Our primary objective was to facilitate the research community's ability to stay abreast of technological advancements and potentially lead the emerging trends in this field. This survey presents a coherent literature review framework for interpreting the advanced techniques used in thyroid cancer research. A total of 758 related studies were identified and scrutinized. To the best of our knowledge, this is the first review that provides an in-depth analysis of the various aspects of AI applications employed in the context of thyroid cancer. Furthermore, we highlight key challenges encountered in this domain and propose future research opportunities for those interested in studying the latest trends or exploring less-investigated aspects of thyroid cancer research. By presenting this comprehensive review and taxonomy, we contribute to the existing knowledge in the field, while providing valuable insights for researchers, clinicians, and stakeholders in advancing the understanding and management of this disease.

Submitted 8 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.13592 by other authors

arXiv:2311.01450 [pdf, other]

DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

Authors: Vint Lee, Pieter Abbeel, Youngwoon Lee

Abstract: Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks both in sample efficiency and final performance without losing performance on common benchmarks, such as Deepmind Control Suite and Atari benchmarks.

Submitted 17 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: For code and website, see https://vint-1.github.io/dreamsmooth/
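
The temporally-smoothed reward target mentioned in the DreamSmooth abstract can be illustrated with a minimal sketch. The abstract does not say which smoothing function is used, so the Gaussian kernel, the `sigma` and `radius` parameters, and the function name `smooth_rewards` are all assumptions made here for illustration; this is not the authors' implementation (see the project page linked above for that).

```python
import math


def smooth_rewards(rewards, sigma=1.0, radius=3):
    """Spread each per-timestep reward over neighbouring timesteps.

    Assumption: a truncated Gaussian kernel; the abstract only says the
    reward is "temporally smoothed". Mass that would fall outside the
    episode is folded onto the boundary timestep, so the episode's total
    reward is unchanged.
    """
    offsets = list(range(-radius, radius + 1))
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    norm = sum(weights)
    weights = [w / norm for w in weights]  # kernel weights sum to 1

    last = len(rewards) - 1
    smoothed = [0.0] * len(rewards)
    for t, r in enumerate(rewards):
        for d, w in zip(offsets, weights):
            target = min(max(t + d, 0), last)  # clamp to the episode
            smoothed[target] += w * r
    return smoothed


# A sparse reward at a single timestep becomes a soft bump around it,
# which is an easier regression target for a learned reward model.
sparse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
targets = smooth_rewards(sparse, sigma=1.0, radius=2)
```

Folding out-of-range mass onto the boundary is one design choice among several; it keeps the sum of rewards per episode the same as in the raw sequence.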

arXiv:2310.14545 [pdf]

Harnessing ChatGPT for thematic analysis: Are we ready?

Authors: V Vien Lee, Stephanie C. C. van der Lubbe, Lay Hoon Goh, Jose M. Valderas

Abstract: ChatGPT is an advanced natural language processing tool with growing applications across various disciplines in medical research. Thematic analysis, a qualitative research method to identify and interpret patterns in data, is one application that stands to benefit from this technology. This viewpoint explores the utilization of ChatGPT in three core phases of thematic analysis within a medical context: 1) direct coding of transcripts, 2) generating themes from a predefined list of codes, and 3) preprocessing quotes for manuscript inclusion. Additionally, we explore the potential of ChatGPT to generate interview transcripts, which may be used for training purposes. We assess the strengths and limitations of using ChatGPT in these roles, highlighting areas where human intervention remains necessary. Overall, we argue that ChatGPT can function as a valuable tool during analysis, enhancing the efficiency of the thematic analysis and offering additional insights into the qualitative data.

Submitted 23 October, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

Comments: 23 pages, 7 figures, 3 tables, 1 textbox

arXiv:2306.07608 [pdf, other]

Finding the Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs

Authors: Yizhen Zheng, He Zhang, Vincent CS Lee, Yu Zheng, Xiao Wang, Shirui Pan

Abstract: Real-world graphs generally have only one kind of tendency in their connections. These connections are either homophily-prone or heterophily-prone. While graphs with homophily-prone edges tend to connect nodes with the same class (i.e., intra-class nodes), heterophily-prone edges tend to build relationships between nodes with different classes (i.e., inter-class nodes). Existing GNNs only take the original graph during training. The problem with this approach is that it forgets to take into consideration the "missing-half" structural information, that is, heterophily-prone topology for homophily-prone graphs and homophily-prone topology for heterophily-prone graphs. In our paper, we introduce Graph cOmplementAry Learning, namely GOAL, which consists of two components: graph complementation and complemented graph convolution. The first component finds the missing-half structural information for a given graph to complement it. The complemented graph has two sets of graphs including both homophily- and heterophily-prone topology. In the latter component, to handle complemented graphs, we design a new graph convolution from the perspective of optimisation. The experiment results show that GOAL consistently outperforms all baselines in eight real-world datasets.

Submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2023

arXiv:2305.18457 [pdf, other]

doi 10.1145/3580305.3599410

Learning Strong Graph Neural Networks with Weak Information

Authors: Yixin Liu, Kaize Ding, Jianling Wang, Vincent Lee, Huan Liu, Shirui Pan

Abstract: Graph Neural Networks (GNNs) have exhibited impressive performance in many graph learning tasks. Nevertheless, the performance of GNNs can deteriorate when the input graph data suffer from weak information, i.e., incomplete structure, incomplete features, and insufficient labels. Most prior studies, which attempt to learn from the graph data with a specific type of weak information, are far from effective in dealing with the scenario where diverse data deficiencies exist and mutually affect each other. To fill the gap, in this paper, we aim to develop an effective and principled approach to the problem of graph learning with weak information (GLWI). Based on the findings from our empirical analysis, we derive two design focal points for solving the problem of GLWI, i.e., enabling long-range propagation in GNNs and allowing information propagation to those stray nodes isolated from the largest connected component. Accordingly, we propose D$^2$PT, a dual-channel GNN framework that performs long-range information propagation not only on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities. We further develop a prototype contrastive alignment algorithm that aligns the class-level prototypes learned from two channels, such that the two different information propagation processes can mutually benefit from each other and the finally learned model can well handle the GLWI problem. Extensive experiments on eight real-world benchmark datasets demonstrate the effectiveness and efficiency of our proposed methods in various GLWI scenarios.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted by KDD 2023. 13 pages, 7 figures, 9 tables

arXiv:2302.13251 [pdf, other]

Unsupervised Domain Adaptation for Low-dose CT Reconstruction via Bayesian Uncertainty Alignment

Authors: Kecheng Chen, Jie Liu, Renjie Wan, Victor Ho-Fun Lee, Varut Vardhanabhuti, Hong Yan, Haoliang Li

Abstract: Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.

Submitted 2 June, 2024; v1 submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2211.14065 [pdf, other]

Beyond Smoothing: Unsupervised Graph Representation Learning with Edge Heterophily Discriminating

Authors: Yixin Liu, Yizhen Zheng, Daokun Zhang, Vincent CS Lee, Shirui Pan

Abstract: Unsupervised graph representation learning (UGRL) has drawn increasing research attention and achieved promising results in several graph analytic tasks. Relying on the homophily assumption, existing UGRL methods tend to smooth the learned node representations along all edges, ignoring the existence of heterophilic edges that connect nodes with distinct attributes. As a result, current methods are hard to generalize to heterophilic graphs where dissimilar nodes are widely connected, and also vulnerable to adversarial attacks. To address this issue, we propose a novel unsupervised Graph Representation learning method with Edge hEterophily discriminaTing (GREET) which learns representations by discriminating and leveraging homophilic edges and heterophilic edges. To distinguish two types of edges, we build an edge discriminator that infers edge homophily/heterophily from feature and structure information. We train the edge discriminator in an unsupervised way through minimizing the crafted pivot-anchored ranking loss, with randomly sampled node pairs acting as pivots. Node representations are learned through contrasting the dual-channel encodings obtained from the discriminated homophilic and heterophilic edges. With an effective interplaying scheme, edge discriminating and representation learning can mutually boost each other during the training phase. We conducted extensive experiments on 14 benchmark datasets and multiple learning scenarios to demonstrate the superiority of GREET.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 14 pages, 7 tables, 6 figures, accepted by AAAI 2023

arXiv:2210.08792 [pdf, other]

Unifying Graph Contrastive Learning with Flexible Contextual Scopes

Authors: Yizhen Zheng, Yu Zheng, Xiaofei Zhou, Chen Gong, Vincent CS Lee, Shirui Pan

Abstract: Graph contrastive learning (GCL) has recently emerged as an effective learning paradigm to alleviate the reliance on labelling information for graph representation learning. The core of GCL is to maximise the mutual information between the representation of a node and its contextual representation (i.e., the corresponding instance with similar semantic information) summarised from the contextual scope (e.g., the whole graph or 1-hop neighbourhood). This scheme distils valuable self-supervision signals for GCL training. However, existing GCL methods still suffer from limitations, such as the incapacity or inconvenience in choosing a suitable contextual scope for different datasets and building biased contrastiveness. To address aforementioned problems, we present a simple self-supervised learning method termed Unifying Graph Contrastive Learning with Flexible Contextual Scopes (UGCL for short). Our algorithm builds flexible contextual representations with tunable contextual scopes by controlling the power of an adjacency matrix. Additionally, our method ensures contrastiveness is built within connected components to reduce the bias of contextual representations. Based on representations from both local and contextual scopes, UGCL optimises a very simple contrastive loss function for graph representation learning. Essentially, the architecture of UGCL can be considered as a general framework to unify existing GCL methods. We have conducted intensive experiments and achieved new state-of-the-art performance in six out of eight benchmark datasets compared with self-supervised graph representation learning baselines. Our code has been open-sourced.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted in ICDM2022
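
The UGCL abstract describes tuning the contextual scope "by controlling the power of an adjacency matrix". A rough sketch of that idea follows: multiplying node features by a normalised adjacency matrix `power` times aggregates information from neighbours up to `power` hops away. The row normalisation, the self-loops in the example graph, and the names `row_normalize`, `matmul`, and `contextual_representation` are assumptions for illustration; the abstract does not specify these details.

```python
def row_normalize(adj):
    """Divide each row by its sum so every node averages over its neighbours."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def contextual_representation(adj, features, power):
    """Return (A_hat ** power) @ features, with A_hat the row-normalised adjacency.

    power = 0 keeps the node's own features (local scope); larger powers
    widen the scope to more distant neighbours.
    """
    a_hat = row_normalize(adj)
    h = features
    for _ in range(power):
        h = matmul(a_hat, h)
    return h


# Path graph 0 - 1 - 2 with self-loops (an assumption of this sketch).
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
features = [[1.0], [0.0], [0.0]]
one_hop = contextual_representation(adj, features, power=1)
two_hop = contextual_representation(adj, features, power=2)
```

In this example node 2 receives nothing from node 0 at `power=1` and a nonzero share at `power=2`, which is the sense in which the exponent acts as a scope knob.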

arXiv:2206.03638 [pdf, other]

Alternately Optimized Graph Neural Networks

Authors: Haoyu Han, Xiaorui Liu, Haitao Mao, MohamadAli Torkamani, Feng Shi, Victor Lee, Jiliang Tang

Abstract: Graph Neural Networks (GNNs) have greatly advanced the semi-supervised node classification task on graphs. The majority of existing GNNs are trained in an end-to-end manner that can be viewed as tackling a bi-level optimization problem. This process is often inefficient in computation and memory usage. In this work, we propose a new optimization framework for semi-supervised learning on graphs. The proposed framework can be conveniently solved by the alternating optimization algorithms, resulting in significantly improved efficiency. Extensive experiments demonstrate that the proposed method can achieve comparable or better performance with state-of-the-art baselines while it has significantly better computation and memory efficiency.

Submitted 19 July, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.01535 [pdf, other]

Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination

Authors: Yizhen Zheng, Shirui Pan, Vincent Cs Lee, Yu Zheng, Philip S. Yu

Abstract: Graph contrastive learning (GCL) alleviates the heavy reliance on label information for graph representation learning (GRL) via self-supervised learning schemes. The core idea is to learn by maximising mutual information for similar instances, which requires similarity computation between two node instances. However, GCL is inefficient in both time and memory consumption. In addition, GCL normally requires a large number of training epochs to be well-trained on large-scale datasets. Inspired by an observation of a technical defect (i.e., inappropriate usage of Sigmoid function) commonly used in two representative GCL works, DGI and MVGRL, we revisit GCL and introduce a new learning paradigm for self-supervised graph representation learning, namely, Group Discrimination (GD), and propose a novel GD-based method called Graph Group Discrimination (GGD). Instead of similarity computation, GGD directly discriminates two groups of node samples with a very simple binary cross-entropy loss. In addition, GGD requires much fewer training epochs to obtain competitive performance compared with GCL methods on large-scale datasets. These two advantages endow GGD with very efficient property. Extensive experiments show that GGD outperforms state-of-the-art self-supervised methods on eight datasets. In particular, GGD can be trained in 0.18 seconds (6.44 seconds including data preprocessing) on ogbn-arxiv, which is orders of magnitude (10,000+) faster than GCL baselines while consuming much less memory. Trained with 9 hours on ogbn-papers100M with billion edges, GGD outperforms its GCL counterparts in both accuracy and efficiency.

Submitted 16 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Accepted in NeurIPS 2022
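
The GGD abstract says the method "directly discriminates two groups of node samples with a very simple binary cross-entropy loss". The sketch below shows what such a loss could look like, assuming each node has already been reduced to a single scalar score and that the two groups are labelled 1 and 0. How the scores and the second group are produced is not stated in the abstract, so the inputs and the name `group_discrimination_loss` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def group_discrimination_loss(group_a_scores, group_b_scores, eps=1e-12):
    """Mean binary cross-entropy with label 1 for group A and 0 for group B.

    No pairwise similarity between nodes is computed: each node
    contributes one term, so the cost is linear in the number of nodes.
    """
    total = 0.0
    for s in group_a_scores:
        total += -math.log(sigmoid(s) + eps)        # target label 1
    for s in group_b_scores:
        total += -math.log(1.0 - sigmoid(s) + eps)  # target label 0
    return total / (len(group_a_scores) + len(group_b_scores))


# Hypothetical scalar scores for nodes in each group.
loss_separated = group_discrimination_loss([2.0, 3.0], [-2.0, -3.0])
loss_confused = group_discrimination_loss([-2.0, -3.0], [2.0, 3.0])
```

The loss is small when the two groups' scores are well separated in the expected direction and large when they are swapped.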

arXiv:2204.02521 [pdf]

doi 10.1108/IMDS-03-2023-0173

Optimal service resource management strategy for IoT-based health information system considering value co-creation of users

Authors: Ji Fang, Vincent CS Lee, Haiyan Wang

Abstract: This paper explores optimal service resource management strategy, a continuous challenge for health information service to enhance service performance, optimise service resource utilisation and deliver interactive health information service. An adaptive optimal service resource management strategy was developed considering a value co-creation model in health information service with a focus on collaborative and interactive with users. The deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. The simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.

Submitted 30 January, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Fang, J., Lee, V.C.S. and Wang, H. (2024), "Optimal service resource management strategy for IoT-based health information system considering value co-creation of users", Industrial Management & Data Systems, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/IMDS-03-2023-0173

arXiv:2204.00961 [pdf]

doi 10.1177/20552076241233247

Enhancing Digital Health Services: A Machine Learning Approach to Personalized Exercise Goal Setting

Authors: Ji Fang, Vincent CS Lee, Hao Ji, Haiyan Wang

Abstract: The utilization of digital health has increased recently, and these services provide extensive guidance to encourage users to exercise frequently by setting daily exercise goals to promote a healthy lifestyle. These comprehensive guides evolved from the consideration of various personalized behavioral factors. Nevertheless, existing approaches frequently neglect users' dynamic behavior and the changes in their health conditions. This study aims to fill this gap by developing a machine learning algorithm that dynamically updates auto-suggestion exercise goals using retrospective data and realistic behavior trajectory. We conducted a methodological study by designing a deep reinforcement learning algorithm to evaluate exercise performance, considering fitness-fatigue effects. The deep reinforcement learning algorithm combines deep learning techniques to analyse time series data and infer user exercise behavior. In addition, we use the asynchronous advantage actor-critic algorithm for reinforcement learning to determine the optimal exercise intensity through exploration and exploitation. The personalized exercise data and biometric data used in this study were collected from publicly available datasets, encompassing walking, sports logs, and running. In our study, we conducted statistical analyses and inferential tests to compare the effectiveness of the machine learning approach in exercise goal setting across different exercise goal setting strategies.

Submitted 4 March, 2024; v1 submitted 2 April, 2022; originally announced April 2022.

arXiv:2203.13308 [pdf, other]

Verifiable Access Control for Augmented Reality Localization and Mapping

Authors: Shaowei Zhu, Hyo Jin Kim, Maurizio Monge, G. Edward Suh, Armin Alaghi, Brandon Reagen, Vincent Lee

Abstract: Localization and mapping is a key technology for bridging the virtual and physical worlds in augmented reality (AR). Localization and mapping works by creating and querying maps made of anchor points that enable the overlay of these two worlds. As a result, information about the physical world is captured in the map and naturally gives rise to concerns around who can map physical spaces as well as who can access or modify the virtual ones. This paper discusses how we can provide access controls over virtual maps as a basic building block to enhance security and privacy of AR systems. In particular, we propose VACMaps: an access control system for localization and mapping using formal methods. VACMaps defines a domain-specific language that enables users to specify access control policies for virtual spaces. Access requests to virtual spaces are then evaluated against relevant policies in a way that preserves confidentiality and integrity of virtual spaces owned by the users. The precise semantics of the policies are defined by SMT formulas, which allow VACMaps to reason about properties of access policies automatically. An evaluation of VACMaps is provided using an AR testbed of a single-family home. We show that VACMaps is scalable in that it can run at practical speeds and that it can also reason about access control policies automatically to detect potential policy misconfigurations.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.03627 [pdf]

Multi-channel deep convolutional neural networks for multi-classifying thyroid disease

Authors: Xinyu Zhang, Vincent CS. Lee, Jia Rong, James C. Lee, Jiangning Song, Feng Liu

Abstract: Thyroid disease instances have been continuously increasing since the 1990s, and thyroid cancer has become the most rapidly rising disease among all the malignancies in recent years. Most existing studies focused on applying deep convolutional neural networks for detecting thyroid cancer. Despite their satisfactory performance on binary classification tasks, limited studies have explored multi-class classification of thyroid disease types; much less is known of the diagnosis of co-existence situation for different types of thyroid diseases. Therefore, this study proposed a novel multi-channel convolutional neural network (CNN) architecture to address the multi-class classification task of thyroid disease. The multi-channel CNN benefits from computed tomography to drive a comprehensive diagnostic decision for the overall thyroid gland, emphasizing the disease co-existence circumstance. Moreover, this study also examined alternative strategies to enhance the diagnostic accuracy of CNN models through concatenation of different scales of feature maps. Benchmarking experiments demonstrate the improved performance of the proposed multi-channel CNN architecture compared with the standard single-channel CNN architecture. More specifically, the multi-channel CNN achieved an accuracy of 0.909, precision of 0.944, recall of 0.896, specificity of 0.994, and F1 of 0.917, in contrast to the single-channel CNN, which obtained 0.902, 0.892, 0.909, 0.993, 0.898, respectively.
In addition, the proposed model was evaluated in different gender groups; it reached a diagnostic accuracy of 0.908 for the female group and 0.901 for the male group. Collectively, the results highlight that the proposed multi-channel CNN has excellent generalization and has the potential to be deployed to provide computational decision support in clinical settings.

Submitted 5 March, 2022; originally announced March 2022.

arXiv:2203.02547 [pdf, other]

Homomorphically Encrypted Computation using Stochastic Encodings

Authors: Hsuan Hsiao, Vincent Lee, Brandon Reagen, Armin Alaghi

Abstract: Homomorphic encryption (HE) is a privacy-preserving technique that enables computation directly over ciphertext. Unfortunately, a key challenge for HE is that implementations can be impractically slow and have limits on computation that can be efficiently implemented. For instance, in Boolean constructions of HE like TFHE, arithmetic operations need to be decomposed into constituent elementary logic gates to implement, so performance depends on logical circuit depth. Even for heavily quantized fixed-point arithmetic operations, these HE circuit implementations can be slow. This paper explores the merit of using stochastic computing (SC) encodings to reduce the logical depth required for HE computation to enable more efficient implementations. Contrary to computation in the plaintext space where many efficient hardware implementations are available, HE provides support for only a limited number of primitive operators and their performance may not directly correlate to their plaintext performance. Our results show that by layering SC encodings on top of TFHE, we observe similar challenges and limitations that SC faces in the plaintext space. Additional breakthroughs would require more support from the HE libraries to make SC with HE a viable solution.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2202.13433 [pdf, other]

Feasibility and Acceptability of Remote Neuromotor Rehabilitation Interactions Using Social Robot Augmented Telepresence: A Case Study

Authors: Michael J. Sobrepera, Vera G. Lee, Suveer Garg, Michelle J. Johnson, Ph. D

Abstract: There is a growing need to deliver rehabilitation care to patients remotely. Long-term demographic changes, geographic shortages of care providers, and now a global pandemic contribute to this need. Telepresence provides an option for delivering this care. However, telepresence using video and audio alone does not provide an interaction of the same quality as in-person. To bridge this gap, we propose the use of social robot augmented telepresence (SRAT). We have constructed a demonstration SRAT system for upper extremity rehab, in which a humanoid, with a head, body, face, and arms, is attached to a mobile telepresence system, to collaborate with the patient and clinicians as an independent social entity. The humanoid can play games with the patient and demonstrate activities. These activities could be used both to perform assessments in support of self-directed rehab and to perform exercises. In this paper, we present a case series with six subjects who completed interactions with the robot, three subjects who have previously suffered a stroke and three pediatric subjects who are typically developing. Subjects performed a Simon Says activity and a target touch activity in person, using classical telepresence (CT), and using SRAT. Subjects were able to effectively work with the social robot guiding interactions and 5 of 6 rated SRAT better than CT. This study demonstrates the feasibility of SRAT and some of its benefits.

Submitted 27 February, 2022; originally announced February 2022.

arXiv:2201.05232 [pdf, other]

FARSI: Facebook AR System Investigator for Agile Domain-Specific System-on-Chip Exploration

Authors: Behzad Boroujerdian, Ying Jing, Amit Kumar, Lavanya Subramanian, Luke Yen, Vincent Lee, Vivek Venkatesan, Amit Jindal, Robert Shearer, Vijay Janapa Reddi

Abstract: Domain-specific SoCs (DSSoCs) are attractive solutions for domains with stringent power/performance/area constraints; however, they suffer from two fundamental complexities. On the one hand, their many specialized hardware blocks result in complex systems and thus high development effort. On the other, their many system knobs expand the complexity of design space, making the search for the optimal design difficult. Thus to reach prevalence, taming such complexities is necessary. This work identifies necessary features of an early-stage design space exploration (DSE) framework that targets the complex design space of DSSoCs and further provides an instance of one called FARSI, (F)acebook (AR) (S)ystem (I)nvestigator. Concretely, FARSI provides an agile system-level simulator with speed up and accuracy of 8,400X and 98.5% compared to Synopsys Platform Architect. FARSI also provides an efficient exploration heuristic and achieves up to 16X improvement in convergence time compared to naive simulated annealing (SA). This is done by augmenting SA with architectural reasoning such as locality exploitation and bottleneck relaxation. Furthermore, we embed various co-design capabilities and show that on average, they have a 32% impact on the convergence rate. Finally, we demonstrate that using simple development-cost-aware policies can lower the system complexity, both in terms of the component count and variation by as much as 150% and 118% (e.g., for Network-on-a-Chip subsystem).

Submitted 17 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

arXiv:2112.12785 [pdf, other]

NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning

Authors: Tony Ng, Hyo Jin Kim, Vincent Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk, Chris Sweeney

Abstract: In the light of recent analyses on privacy-concerning scene revelation from visual descriptors, we develop descriptors that conceal the input image content. In particular, we propose an adversarial learning framework for training visual descriptors that prevent image reconstruction, while maintaining the matching accuracy. We let a feature encoding network and image reconstruction network compete with each other, such that the feature encoder tries to impede the image reconstruction with its generated descriptors, while the reconstructor tries to recover the input image from the descriptors. The experimental results demonstrate that the visual descriptors obtained with our method significantly deteriorate the image reconstruction quality with minimal impact on correspondence matching and camera localization performance.

Submitted 29 March, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: Accepted at CVPR 2022. Supplementary material included after references. 15 pages, 14 figures, 6 tables

arXiv:2108.09606 [pdf, other]

Online Ride-Hitching in UAV Travelling

Authors: Songhua Li, Minming Li, Lingjie Duan, Victor C. S. Lee

Abstract: The unmanned aerial vehicle (UAV) has emerged as a promising solution to provide delivery and other mobile services to customers rapidly, yet it drains its stored energy quickly when travelling on the way and (even if solar-powered) it takes time for charging power on the way before reaching the destination. To address this issue, existing works focus more on UAV's path planning with designated system vehicles providing charging service. However, in some emergency cases and rural areas where system vehicles are not available, public trucks can provide more feasible and cost-saving services and hence a silver lining. In this paper, we explore how a single UAV can save flying distance by exploiting public trucks, to minimize the travel time of the UAV. We give the first theoretical work studying online algorithms for the problem, which guarantees a worst-case performance. We first consider the offline problem knowing future truck trip information far ahead of time. By delicately transforming the problem into a graph satisfying both time and power constraints, we present a shortest-path algorithm that outputs the optimal solution of the problem. Then, we proceed to the online setting where trucks appear in real-time and only inform the UAV of their trip information a certain time $Δt$ beforehand. As a benchmark, we propose a well-constructed lower bound that an online algorithm could achieve.
We propose an online algorithm MyopicHitching that greedily takes truck trips and an improved algorithm $Δt$-Adaptive that further tolerates a waiting time in taking a ride. Our theoretical analysis shows that $Δt$-Adaptive is asymptotically optimal in the sense that its ratio approaches the proposed lower bounds as $Δt$ increases.

Submitted 21 August, 2021; originally announced August 2021.

Comments: A preliminary version of this paper is to appear at COCOON 2021

arXiv:2108.04097 [pdf, other]

Deep Learning for Embodied Vision Navigation: A Survey

Authors: Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

Abstract: The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observation. This problem has attracted rising attention in recent years due to its wide application in autonomous driving, vacuum cleaners, and rescue robots. A navigation agent is supposed to have various intelligent skills, such as visual perceiving, mapping, planning, exploring, and reasoning. Building such an agent that observes, thinks, and acts is a key to real intelligence. The remarkable learning ability of deep learning methods has empowered agents to accomplish embodied visual navigation tasks. Despite this, embodied visual navigation is still in its infancy since a lot of advanced skills are required, including perceiving partially observed visual input, exploring unseen areas, memorizing and modeling seen scenarios, understanding cross-modal instructions, and adapting to a new environment. Recently, embodied visual navigation has attracted rising attention of the community, and numerous works have been proposed to learn these skills. This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey. We summarize the benchmarks and metrics, review different methods, analyze the challenges, and highlight the state-of-the-art methods. Finally, we discuss unresolved challenges in the field of embodied visual navigation and give promising directions in pursuing future research.

Submitted 11 October, 2021; v1 submitted 7 July, 2021; originally announced August 2021.

Comments: 20 pages

arXiv:2106.09876 [pdf, other]

doi 10.1109/TKDE.2021.3124061

Anomaly Detection in Dynamic Graphs via Transformer

Authors: Yixin Liu, Shirui Pan, Yu Guang Wang, Fei Xiong, Liang Wang, Qingfeng Chen, Vincent CS Lee

Abstract: Detecting anomalies for dynamic graphs has drawn increasing attention due to their wide applications in social networks, e-commerce, and cybersecurity. Recent deep learning-based approaches have shown promising results over shallow methods. However, they fail to address two core challenges of anomaly detection in dynamic graphs: the lack of informative encoding for unattributed nodes and the difficulty of learning discriminative knowledge from coupled spatial-temporal dynamic graphs. To overcome these challenges, in this paper, we present a novel Transformer-based Anomaly Detection framework for DYnamic graphs (TADDY). Our framework constructs a comprehensive node encoding strategy to better represent each node's structural and temporal roles in an evolving graph stream. Meanwhile, TADDY captures informative representation from dynamic graphs with coupled spatial-temporal patterns via a dynamic graph transformer model. The extensive experimental results demonstrate that our proposed TADDY framework outperforms the state-of-the-art methods by a large margin on six real-world datasets.

Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 13 pages, 5 figures

arXiv:2105.03812 [pdf, other]

Analysis and Mitigations of Reverse Engineering Attacks on Local Feature Descriptors

Authors: Deeksha Dangwal, Vincent T. Lee, Hyo Jin Kim, Tianwei Shen, Meghan Cowan, Rajvi Shah, Caroline Trippel, Brandon Reagen, Timothy Sherwood, Vasileios Balntas, Armin Alaghi, Eddy Ilg

Abstract: As autonomous driving and augmented reality evolve, a practical concern is data privacy. In particular, these applications rely on localization based on user images. The widely adopted technology uses local feature descriptors, which are derived from the images and it was long thought that they could not be reverted back. However, recent work has demonstrated that under certain conditions reverse engineering attacks are possible and allow an adversary to reconstruct RGB images. This poses a potential risk to user privacy. We take this a step further and model potential adversaries using a privacy threat model. Subsequently, we show under controlled conditions a reverse engineering attack on sparse feature maps and analyze the vulnerability of popular descriptors including FREAK, SIFT and SOSNet. Finally, we evaluate potential mitigation techniques that select a subset of descriptors to carefully balance privacy reconstruction risk while preserving image matching accuracy; our results show that similar accuracy can be obtained when revealing less information.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 13 pages

arXiv:2105.00378 [pdf, other]

doi 10.1145/3458903.345891

SoK: Opportunities for Software-Hardware-Security Codesign for Next Generation Secure Computing

Authors: Deeksha Dangwal, Meghan Cowan, Armin Alaghi, Vincent T. Lee, Brandon Reagen, Caroline Trippel

Abstract: Users are demanding increased data security. As a result, security is rapidly becoming a first-order design constraint in next generation computing systems. Researchers and practitioners are exploring various security technologies to meet user demand such as trusted execution environments (e.g., Intel SGX, ARM TrustZone), homomorphic encryption, and differential privacy. Each technique provides some degree of security, but differs with respect to threat coverage, performance overheads, as well as implementation and deployment challenges. In this paper, we present a systemization of knowledge (SoK) on these design considerations and trade-offs using several prominent security technologies. Our study exposes the need for software-hardware-security codesign to realize efficient and effective solutions of securing user data. In particular, we explore how design considerations across applications, hardware, and security mechanisms must be combined to overcome fundamental limitations in current technologies so that we can minimize performance overhead while achieving sufficient threat model coverage. Finally, we propose a set of guidelines to facilitate putting these secure computing technologies into practice.

Submitted 1 May, 2021; originally announced May 2021.

Comments: 9 pages

arXiv:2101.07841 [pdf, other]

Porcupine: A Synthesizing Compiler for Vectorized Homomorphic Encryption

Authors: Meghan Cowan, Deeksha Dangwal, Armin Alaghi, Caroline Trippel, Vincent T. Lee, Brandon Reagen

Abstract: Homomorphic encryption (HE) is a privacy-preserving technique that enables computation directly on encrypted data. Despite its promise, HE has seen limited use due to performance overheads and compilation challenges. Recent work has made significant advances to address the performance overheads but automatic compilation of efficient HE kernels remains relatively unexplored. This paper presents Porcupine, an optimizing compiler, and HE DSL named Quill to automatically generate HE code using program synthesis. HE poses three major compilation challenges: it only supports a limited set of SIMD-like operators, it uses long-vector operands, and decryption can fail if ciphertext noise growth is not managed properly. Quill captures the underlying HE operator behavior that enables Porcupine to reason about the complex trade-offs imposed by the challenges and generate optimized, verified HE kernels. To improve synthesis time, we propose a series of optimizations including a sketch design tailored to HE and instruction restriction to narrow the program search space. We evaluate Porcupine using a set of kernels and show speedups of up to 51% (11% geometric mean) compared to heuristic-driven hand-optimized kernels. Analysis of Porcupine's synthesized code reveals that optimal solutions are not always intuitive, underscoring the utility of automated reasoning in this domain.

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2011.10938 [pdf, other]

Online Maximum $k$-Interval Coverage Problem

Authors: Songhua Li, Minming Li, Lingjie Duan, Victor C. S. Lee

Abstract: We study the online maximum coverage problem on a line, in which, given an online sequence of sub-intervals (which may intersect among each other) of a target large interval and an integer $k$, we aim to select at most $k$ of the sub-intervals such that the total covered length of the target interval is maximized. The decision to accept or reject each sub-interval is made immediately and irrevocably (no preemption) right at the release timestamp of the sub-interval. We comprehensively study different settings of this problem regarding both the length of a released sub-interval and the total number of released sub-intervals. We first present lower bounds on the competitive ratio for the settings concerned in this paper, respectively. For the offline problem where the sequence of all the released sub-intervals is known in advance to the decision-maker, we propose a dynamic-programming-based optimal approach as the benchmark. For the online problem, we first propose a single-threshold-based deterministic algorithm SOA by adding a sub-interval if the added length exceeds a certain threshold, achieving competitive ratios close to the lower bounds, respectively. Then, we extend to a double-thresholds-based algorithm DOA, by using the first threshold for exploration and the second threshold (larger than the first one) for exploitation. With the two thresholds solved by our proposed program, we show that DOA improves SOA in the worst-case performance.
Moreover, we prove that a deterministic algorithm that accepts sub-intervals by multiple non-increasing thresholds cannot outperform even SOA.
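
The single-threshold rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's SOA: the function names and the `threshold` parameter are hypothetical, and the abstract does not state how the threshold is chosen, so the caller supplies it here. The sketch only shows the accept-or-reject pattern: each arriving sub-interval is accepted immediately and irrevocably if it adds more than `threshold` of newly covered length, until `k` sub-intervals have been taken.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]


def covered_length(intervals: List[Interval]) -> float:
    """Return the total length of the union of the given intervals."""
    total = 0.0
    current_end = float("-inf")
    for start, end in sorted(intervals):
        if end <= current_end:
            continue  # fully inside what is already covered
        total += end - max(start, current_end)
        current_end = end
    return total


def single_threshold_select(
    stream: Iterable[Interval], k: int, threshold: float
) -> List[Interval]:
    """Accept an arriving interval iff its added coverage exceeds `threshold`.

    `threshold` is a hypothetical, caller-supplied value; the decision for
    each interval is made once, on arrival, and never revisited.
    """
    accepted: List[Interval] = []
    for interval in stream:
        if len(accepted) >= k:
            break  # budget of k sub-intervals is used up
        gain = covered_length(accepted + [interval]) - covered_length(accepted)
        if gain > threshold:
            accepted.append(interval)
    return accepted
```

For instance, with the stream `[(0, 4), (3, 5), (4, 9), (0, 1)]`, `k=2`, and `threshold=2.0`, the second interval is rejected because it adds only 1 unit of new coverage, and the third is accepted.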

Submitted 22 November, 2020; originally announced November 2020.

Comments: An extended abstract of this full version is to appear in COCOA 2020

arXiv:2006.00505 [pdf, other]

Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference

Authors: Brandon Reagen, Wooseok Choi, Yeongil Ko, Vincent Lee, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks

Abstract: As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally, big-data deep learning was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on the client's encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow in current systems. This paper introduces Cheetah, a set of algorithmic and hardware optimizations for HE DNN inference to achieve plaintext DNN inference speeds. Cheetah proposes HE-parameter tuning optimization and operator scheduling optimizations, which together deliver 79x speedup over the state-of-the-art. However, this still falls short of plaintext inference speeds by almost four orders of magnitude. To bridge the remaining performance gap, Cheetah further proposes an accelerator architecture that, when combined with the algorithmic optimizations, approaches plaintext DNN inference speeds. We evaluate several common neural network models (e.g., ResNet50, VGG16, and AlexNet) and show that plaintext-level HE inference for each is feasible with a custom accelerator consuming 30W and 545mm^2.

Submitted 8 October, 2020; v1 submitted 31 May, 2020; originally announced June 2020.

arXiv:1910.12415 [pdf, other]

doi 10.1016/j.eswa.2021.115675

Robotic Hierarchical Graph Neurons. A novel implementation of HGN for swarm robotic behaviour control

Authors: Phillip Smith, Aldeida Aleti, Vincent C. S. Lee, Robert Hunjet, Asad Khan

Abstract: This paper explores the use of a novel form of Hierarchical Graph Neurons (HGN) for in-operation behaviour selection in a swarm of robotic agents. This new HGN is called Robotic-HGN (R-HGN), as it matches robot environment observations to environment labels via fusion of match probabilities from both temporal and intra-swarm collections. This approach is novel for HGN as it addresses robotic observations being pseudo-continuous numbers, rather than categorical values. Additionally, the proposed approach is memory and computation-power conservative and thus is acceptable for use in mobile devices such as single-board computers, which are often used in mobile robotic agents. This R-HGN approach is validated against individual behaviour implementation and random behaviour selection. This contrast is made in two sets of simulated environments: environments designed to challenge the held behaviours of the R-HGN, and randomly generated environments which are more challenging for the robotic swarm than R-HGN training conditions. R-HGN has been found to enable appropriate behaviour selection in both these sets, allowing significant swarm performance in pre-trained and unexpected environment conditions.

Submitted 27 October, 2019; originally announced October 2019.

Journal ref: Expert Systems with Applications 2021

arXiv:1909.11822 [pdf, other]

DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems

Authors: Adam Rupe, Nalini Kumar, Vladislav Epifanov, Karthik Kashinath, Oleksandr Pavlyk, Frank Schlimbach, Mostofa Patwary, Sergey Maidanov, Victor Lee, Prabhat, James P. Crutchfield

Abstract: Extracting actionable insight from complex unlabeled scientific data is an open challenge and key to unlocking data-driven discovery in science. Complementary and alternative to supervised machine learning approaches, unsupervised physics-based methods based on behavior-driven theories hold great promise. Due to computational limitations, practical application on real-world domain science problems has lagged far behind theoretical development. We present our first step towards bridging this divide - DisCo - a high-performance distributed workflow for the behavior-driven local causal state theory. DisCo provides a scalable unsupervised physics-based representation learning method that decomposes spatiotemporal systems into their structurally relevant components, which are captured by the latent local causal state variables. Complex spatiotemporal systems are generally highly structured and organize around a lower-dimensional skeleton of coherent structures, and in several firsts we demonstrate the efficacy of DisCo in capturing such structures from observational and simulated scientific data. To the best of our knowledge, DisCo is also the first application software developed entirely in Python to scale to over 1000 machine nodes, providing good performance along with ensuring domain scientists' productivity. We developed scalable, performant methods optimized for Intel many-core processors that will be upstreamed to open-source Python library packages.
Our capstone experiment, using newly developed DisCo workflow and libraries, performs unsupervised spacetime segmentation analysis of CAM5.1 climate simulation data, processing an unprecedented 89.5 TB in 6.6 minutes end-to-end using 1024 Intel Haswell nodes on the Cori supercomputer obtaining 91% weak-scaling and 64% strong-scaling efficiency.

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.11251 [pdf]

Online Semi-Supervised Concept Drift Detection with Density Estimation

Authors: Chang How Tan, Vincent CS Lee, Mahsa Salehi

Abstract: Concept drift is formally defined as the change in the joint distribution of a set of input variables X and a target variable y. The two types of drift that are extensively studied are real drift and virtual drift, where the former is the change in posterior probabilities p(y|X) while the latter is the change in the distribution of X without affecting the posterior probabilities. Many approaches to concept drift detection either assume full availability of data labels y or handle only virtual drift. In a streaming environment, the assumption of full availability of data labels y is questionable. On the other hand, approaches that deal with virtual drift fail to address real drift. Rather than improving the state-of-the-art methods, this paper presents a semi-supervised framework to deal with the challenges above. The objective of the proposed framework is to learn from a streaming environment with limited data labels y and detect real drift concurrently. This paper proposes a novel concept drift detection method utilizing the densities of posterior probabilities in partially labeled streaming environments. Experimental results on both synthetic and real-world datasets show that our proposed semi-supervised framework enables the detection of concept drift in such environments while achieving comparable prediction performance to the state-of-the-art methods.

Submitted 10 November, 2019; v1 submitted 24 September, 2019; originally announced September 2019.
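
The two drift types named in the abstract above can be written out as a short worked factorization. This is only an illustrative restatement of the abstract's definitions; the time index t is notation assumed here and does not come from the listing.

```latex
% Joint distribution at time t, factored into posterior and input distribution
p_t(X, y) = p_t(y \mid X)\, p_t(X)

% Real drift between t and t+1: the posterior changes
p_t(y \mid X) \neq p_{t+1}(y \mid X)

% Virtual drift: the input distribution changes, the posterior does not
p_t(X) \neq p_{t+1}(X), \qquad p_t(y \mid X) = p_{t+1}(y \mid X)
```

Either change alters the joint distribution, which is why both count as concept drift under the definition given.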

arXiv:1909.07520 [pdf, other]

Towards Unsupervised Segmentation of Extreme Weather Events

Authors: Adam Rupe, Karthik Kashinath, Nalini Kumar, Victor Lee, Prabhat, James P. Crutchfield

Abstract: Extreme weather is one of the main mechanisms through which climate change will directly impact human society. Coping with such change as a global community requires markedly improved understanding of how global warming drives extreme weather events. While alternative climate scenarios can be simulated using sophisticated models, identifying extreme weather events in these simulations requires automation due to the vast amounts of complex high-dimensional data produced. Atmospheric dynamics, and hydrodynamic flows more generally, are highly structured and largely organize around a lower dimensional skeleton of coherent structures. Indeed, extreme weather events are a special case of more general hydrodynamic coherent structures. We present a scalable physics-based representation learning method that decomposes spatiotemporal systems into their structurally relevant components, which are captured by latent variables known as local causal states. For complex fluid flows we show our method is capable of capturing known coherent structures, and with promising segmentation results on CAM5.1 water vapor data we outline the path to extreme weather identification from unlabeled climate model simulation data.

Submitted 16 September, 2019; originally announced September 2019.

arXiv:1908.07951

Secure practical indoor optical wireless communications using quantum key distribution

Authors: Vincent Lee, Dominic OBrien

Abstract: Quantum Key Distribution (QKD) can guarantee security for practical indoor optical wireless environments. The key challenges are to mitigate artificial lighting and ambient light at the receiver. A new spectral region for QKD is proposed and an ideal QKD link model is simulated with experimental ambient light power measurements. Simulation, modelling, and analysis indicate that the carbon dioxide and water absorption band (1370 nm) is a new wavelength region for QKD operation in indoor optical wireless environments. A feasible QKD link requires a signal-to-noise ratio (SNR) of approximately 20 dB and a quantum bit error rate (QBER) of at most 11% when using the BB84 protocol. Links in the new spectral region with a FOV of several degrees are feasible, depending on available components.

Submitted 28 April, 2020; v1 submitted 18 August, 2019; originally announced August 2019.

Comments: Page 10, experimental results. Authors decision to revisit and resolve orders of magnitude discrepancy

arXiv:1907.03382 [pdf, other]

doi 10.1145/3295500.3356180

Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale

Authors: Atılım Güneş Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip Torr, Victor Lee, Kyle Cranmer, Prabhat, Frank Wood

Abstract: Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN--LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k: achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.

Submitted 27 August, 2019; v1 submitted 7 July, 2019; originally announced July 2019.

Comments: 14 pages, 8 figures

MSC Class: 68T37; 68T05; 62P35 ACM Class: G.3; I.2.6; J.2

Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 17--22, 2019

arXiv:1907.00378 [pdf, other]

doi 10.1609/aaai.v33i01.33014755

Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

Authors: Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee

Abstract: A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity; and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.

Submitted 30 June, 2019; originally announced July 2019.

Journal ref: Qin, Xiaoyu, et al. "Nearest-neighbour-induced isolation similarity and its impact on density-based clustering." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019

arXiv:1902.05971 [pdf, other]

Synthesizing Number Generators for Stochastic Computing using Mixed Integer Programming

Authors: Vincent T. Lee, Samuel Archibald Elliot, Armin Alaghi, Luis Ceze

Abstract: Stochastic computing (SC) is a high density, low-power computation technique which encodes values as unary bitstreams instead of binary-encoded (BE) values. Practical SC implementations require deterministic or pseudo-random number sequences which are optimally correlated to generate bitstreams and achieve accurate results. Unfortunately, the size of the search space makes manually designing optimally correlated number sequences a difficult task. To automate this design burden, we propose a synthesis formulation using mixed integer programming to automatically generate optimally correlated number sequences. In particular, our synthesis formulation improves the accuracy of arithmetic operations such as multiplication and squaring circuits by up to 2.5x and 20x respectively. We also show how our technique can be extended to scale to larger circuits.

Submitted 26 February, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

Comments: 6 pages, 5 figures, 3 tables

arXiv:1901.08248 [pdf, other]

TigerGraph: A Native MPP Graph Database

Authors: Alin Deutsch, Yu Xu, Mingxi Wu, Victor Lee

Abstract: We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. TigerGraph's high-level query language, GSQL, is designed for compatibility with SQL, while simultaneously allowing NoSQL programmers to continue thinking in Bulk-Synchronous Processing (BSP) terms and reap the benefits of high-level specification. GSQL is sufficiently high-level to allow declarative SQL-style programming, yet sufficiently expressive to concisely specify the sophisticated iterative algorithms required by modern graph analytics and traditionally coded in general-purpose programming languages like C++ and Java. We report very strong scale-up and scale-out performance over a benchmark we published on GitHub for full reproducibility.

Submitted 24 January, 2019; originally announced January 2019.

arXiv:1811.09538 [pdf, ps, other]

A Game Model of Search and Pursuit

Authors: Steve Alpern, Viciano Lee

Abstract: Shmuel Gal and Jerome Casas have recently introduced a game theoretic model that combines search and pursuit by a predator for a prey animal. The prey (hider) can hide in a finite number of locations. The predator (searcher) can inspect any k of these locations. If the prey is not in any of these, the prey wins. If the prey is found at an inspected location, a pursuit begins which is successful for the predator with a known capture probability which depends on the location. We modify the problem so that each location takes a certain time to inspect and the predator has total inspection time k. We also consider a repeated game model where the capture probabilities only become known to the players over time, as each successful escape from a location lowers its perceived value capture probability.

Submitted 9 December, 2019; v1 submitted 23 November, 2018; originally announced November 2018.

Comments: 17 pages, 0 figures, presented at the 18th International Symposium on Dynamic Games and Application July 9-12, 2018

arXiv:1810.04756 [pdf, other]

Stochastic Synthesis for Stochastic Computing

Authors: Vincent T. Lee, Armin Alaghi, Luis Ceze, Mark Oskin

Abstract: Stochastic computing (SC) is an emerging computing technique which offers higher computational density, and lower power over binary-encoded (BE) computation. Unlike BE computation, SC encodes values as probabilistic bitstreams which makes designing new circuits unintuitive. Existing techniques for synthesizing SC circuits are limited to specific classes of functions such as polynomial evaluation or constant scaling. In this paper, we propose using stochastic synthesis, which is originally a program synthesis technique, to automate the task of synthesizing new SC circuits. Our results show stochastic synthesis is more general than past techniques and can synthesize manually designed SC circuits as well as new ones such as an approximate square root unit.

Submitted 10 October, 2018; originally announced October 2018.

Comments: 7 pages, 4 figures, 3 tables

arXiv:1808.04728 [pdf, other]

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Authors: Amrita Mathuriya, Deborah Bard, Peter Mendygral, Lawrence Meadows, James Arnemann, Lei Shao, Siyu He, Tuomas Karna, Daina Moise, Simon J. Pennycook, Kristyn Maschoff, Jason Sewall, Nalini Kumar, Shirley Ho, Mike Ringenburg, Prabhat, Victor Lee

Abstract: Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(C) Xeon Phi(TM) processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully-synchronous training. These enhancements enable us to process large 3D dark matter distribution and predict the cosmological parameters $Ω_M$, $σ_8$ and $n_s$ with unprecedented accuracy.

Submitted 9 November, 2018; v1 submitted 14 August, 2018; originally announced August 2018.

Comments: 11 pages, 6 pages, presented at SuperComputing 2018

arXiv:1807.07706 [pdf, other]

Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

Authors: Atılım Güneş Baydin, Lukas Heinrich, Wahid Bhimji, Lei Shao, Saeid Naderiparizi, Andreas Munk, Jialin Liu, Bradley Gram-Hansen, Gilles Louppe, Lawrence Meadows, Philip Torr, Victor Lee, Prabhat, Kyle Cranmer, Frank Wood

Abstract: We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.

Submitted 17 February, 2020; v1 submitted 20 July, 2018; originally announced July 2018.

Comments: 20 pages, 9 figures

MSC Class: 68T37; 68T05; 62P35 ACM Class: G.3; I.2.6; J.2

Journal ref: In Advances in Neural Information Processing Systems 33 (NeurIPS), Vancouver, Canada, 2019

arXiv:1803.04862 [pdf, other]

Correlation Manipulating Circuits for Stochastic Computing

Authors: Vincent T. Lee, Armin Alaghi, Luis Ceze

Abstract: Stochastic computing (SC) is an emerging computing technique that promises high density, low power, and error tolerant solutions. In SC, values are encoded as unary bitstreams and SC arithmetic circuits operate on one or more bitstreams. In many cases, the input bitstreams must be correlated or uncorrelated for SC arithmetic to produce accurate results. As a result, a key challenge for designing SC accelerators is manipulating the impact of correlation across SC operations. This paper presents and evaluates a set of novel correlation manipulating circuits to manage correlation in SC computation: a synchronizer, desynchronizer, and decorrelator. We then use these circuits to propose improved SC maximum, minimum, and saturating adder designs. Compared to existing correlation manipulation techniques, our circuits are more accurate and up to 3x more energy efficient. In the context of an image processing pipeline, these circuits can reduce the total energy consumption by up to 24%.

Submitted 1 March, 2018; originally announced March 2018.

Comments: 6 pages, 5 figures, 4 tables, Design, Automation and Test in Europe Conference and Exhibition (2018)
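
The abstract's point that input bitstreams must be correlated or uncorrelated for accurate results can be illustrated with a minimal Python sketch. It is a textbook-style illustration of unary bitstream arithmetic, not the paper's circuits: the helper names (`to_bitstream`, `stream_value`, `and_gate`), the stream length, and the comparator-style encoding are all assumptions made here.

```python
import random

def to_bitstream(value, rands):
    """Encode a value in [0, 1]: bit i is 1 when rands[i] < value (assumed encoding)."""
    return [1 if r < value else 0 for r in rands]

def stream_value(bits):
    """Decode a bitstream as the fraction of 1s."""
    return sum(bits) / len(bits)

def and_gate(a, b):
    """Bitwise AND of two equal-length bitstreams."""
    return [x & y for x, y in zip(a, b)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n = 4096                        # hypothetical stream length
r1 = [rng.random() for _ in range(n)]
r2 = [rng.random() for _ in range(n)]

x, y = 0.5, 0.4

# Independent random sources: AND approximates the product x * y.
uncorrelated = stream_value(and_gate(to_bitstream(x, r1), to_bitstream(y, r2)))

# Shared random source (maximally correlated streams): AND yields min(x, y).
correlated = stream_value(and_gate(to_bitstream(x, r1), to_bitstream(y, r1)))
```

With independent sources the AND gate behaves as a multiplier, while the same gate fed from a shared source computes a minimum, which is why managing correlation between operations matters.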

arXiv:1712.09388 [pdf, other]

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Authors: Amrita Mathuriya, Thorsten Kurth, Vivek Rane, Mustafa Mustafa, Lei Shao, Debbie Bard, Prabhat, Victor W Lee

Abstract: We explore scaling of the standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on a HPC supercomputer at such large scale with synchronous SGD. We studied scaling of two convolution neural networks - ResNet-50, a state-of-the-art deep network for classification with roughly 25.5 million parameters, and HEP-CNN, a shallow topology with less than 1 million parameters for common scientific usages. For ResNet-50, we achieve >80% scaling efficiency on up to 128 workers, using 32 parameter servers (PS tasks) with a steep decline down to 23% for 512 workers using 64 PS tasks. Our analysis of the efficiency drop points to low network bandwidth utilization due to combined effect of three factors. (a) Heterogeneous distributed parallelization algorithm which uses PS tasks as centralized servers for gradient averaging is suboptimal for utilizing interconnect bandwidth. (b) Load imbalance among PS tasks hinders their efficient scaling. (c) Underlying communication primitive GRPC is currently inefficient on Cori high-speed interconnect. The HEP-CNN demands less interconnect bandwidth, and shows >80% weak scaling efficiency for up to 256 nodes with only 1 PS task. Our findings are applicable to other deep learning networks.
Big networks with millions of parameters stumble upon the issues discussed here. Shallower networks like HEP-CNN with relatively lower number of parameters can efficiently enjoy weak scaling even with a single parameter server.

Submitted 26 December, 2017; originally announced December 2017.

Comments: Published as a poster in NIPS 2017 Workshop: Deep Learning At Supercomputer Scale

arXiv:1706.02344 [pdf]

Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing

Authors: Vincent T. Lee, Armin Alaghi, John P. Hayes, Visvesh Sathe, Luis Ceze

Abstract: Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data movement. However, near-sensor computing presents its own set of challenges such as operating power constraints, energy budgets, and communication bandwidth capacities. In this paper, we propose a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications. In addition, our design uses a new stochastic adder and multiplier that are significantly more accurate than existing adders and multipliers. We also show that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams, allowing faster run times at minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary design can achieve 9.8x energy efficiency savings, and application-level accuracies within 0.05% compared to conventional all-binary designs.

Submitted 7 June, 2017; originally announced June 2017.

Comments: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 2017

arXiv:1611.07409 [pdf, ps, other]

A Metric for Performance Portability

Authors: S. J. Pennycook, J. D. Sewall, V. W. Lee

Abstract: The term "performance portability" has been informally used in computing to refer to a variety of notions which generally include: 1) the ability to run one application across multiple hardware platforms; and 2) achieving some notional level of performance on these platforms. However, there has been a noticeable lack of consensus on the precise meaning of the term, and authors' conclusions regarding their success (or failure) to achieve performance portability have thus been subjective. Comparing one approach to performance portability with another has generally been marked with vague claims and verbose, qualitative explanation of the comparison. This paper presents a concise definition for performance portability, along with a simple metric that accurately captures the performance and portability of an application across different platforms. The utility of this metric is then demonstrated with a retroactive application to previous work.

Submitted 22 November, 2016; originally announced November 2016.

Comments: 7 pages, in Proceedings of the 7th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

arXiv:1608.03175 [pdf, other]

Similarity Search on Automata Processors

Authors: Vincent T. Lee, Justin Kotalik, Carlo C. Del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin

Abstract: Similarity search is a critical primitive for a wide variety of applications including natural language processing, content-based search, machine learning, computer vision, databases, robotics, and recommendation systems. At its core, similarity search is implemented using the k-nearest neighbors (kNN) algorithm, where computation consists of highly parallel distance calculations and a global top-k sort. In contemporary von-Neumann architectures, kNN is bottlenecked by data movement which limits throughput and latency. In this paper, we present and evaluate a novel automata-based algorithm for kNN on the Micron Automata Processor (AP), which is a non-von Neumann near-data processing architecture. By employing near-data processing, the AP minimizes the data movement bottleneck and is able to achieve better performance. Unlike prior work in the automata processing space, our work combines temporal encodings with automata design to augment the space of applications for the AP. We evaluate our design's performance on the AP and compare to state-of-the-art CPU, GPU, and FPGA implementations; we show that the current generation of AP hardware can achieve over 50x speedup over CPUs while maintaining competitive energy efficiency gains. We also propose several automata optimization techniques and simple architectural extensions that highlight the potential of the AP hardware.

Submitted 7 June, 2017; v1 submitted 9 August, 2016; originally announced August 2016.

Comments: 12 pages, 11 figures, accepted to International Parallel and Distributed Processing Symposium (IPDPS) 2017
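
The kNN structure the abstract describes, independent distance calculations followed by a global top-k selection, can be sketched in a few lines of Python. This is a plain CPU illustration, not the automata-based algorithm from the paper: the squared Euclidean distance, the function names, and the sample points are assumptions chosen here for brevity.

```python
import heapq

def squared_distance(a, b):
    """Squared Euclidean distance (an assumed metric; the abstract names none)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(query, dataset, k):
    """Return the indices of the k dataset points closest to the query."""
    # Step 1: one distance per point; each calculation is independent of the
    # others, which is what makes this phase highly parallel.
    distances = ((squared_distance(query, p), i) for i, p in enumerate(dataset))
    # Step 2: global top-k selection over all computed distances.
    return [i for _, i in heapq.nsmallest(k, distances)]

# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
nearest = knn((0.1, 0.0), points, k=2)
```

Every point in the dataset is read once per query, so on large datasets the work is dominated by moving data rather than by arithmetic, which is the bottleneck the abstract attributes to von Neumann architectures.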

arXiv:1606.03742 [pdf, other]

Application-Driven Near-Data Processing for Similarity Search

Authors: Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin

Abstract: Similarity search is a key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases, computational biology, and computer graphics. At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance calculations and a global top-k sort. However, kNN is poorly supported by today's architectures because of its high memory bandwidth requirements. This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM). By instantiating compute units close to memory, SSAM benefits from the higher memory bandwidth and density exposed by emerging memory technologies. We evaluate the SSAM design down to layout on top of the Micron hybrid memory cube (HMC), and show that SSAM can achieve up to two orders of magnitude area-normalized throughput and energy efficiency improvement over multicore CPUs; we also show SSAM is faster and more energy efficient than competing GPUs and FPGAs. Finally, we show that SSAM is also useful for other data intensive tasks like kNN index construction, and can be generalized to semantically function as a high capacity content addressable memory.

Submitted 10 July, 2017; v1 submitted 12 June, 2016; originally announced June 2016.

Comments: 15 pages, 8 figures, 7 tables

arXiv:1102.3937 [pdf, ps, other]

Axiomatic Ranking of Network Role Similarity

Authors: Ruoming Jin, Victor E. Lee, Hui Hong

Abstract: A key task in social network and other complex network analysis is role analysis: describing and categorizing nodes according to how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithms known for graph automorphism are nonpolynomial. Moreover, since exact equivalence may be rare, a more meaningful task is to measure the role similarity between any two nodes. This task is closely related to the structural or link-based similarity problem that SimRank attempts to solve. However, SimRank and most of its offshoots are not sufficient because they do not fully recognize automorphically or structurally equivalent nodes. In this paper we tackle two problems. First, what are the necessary properties for a role similarity measure or metric? Second, how can we derive a role similarity measure satisfying these properties? For the first problem, we justify several axiomatic properties necessary for a role similarity measure or metric: range, maximal similarity, automorphic equivalence, transitive similarity, and the triangle inequality. For the second problem, we present RoleSim, a new similarity metric with a simple iterative computational method. We rigorously prove that RoleSim satisfies all the axiomatic properties.
We also introduce an iceberg RoleSim algorithm which is guaranteed to discover all pairs with RoleSim score no less than a user-defined threshold $θ$ without computing the RoleSim for every pair. We demonstrate the superior interpretative power of RoleSim on both synthetic and real datasets.

Submitted 9 June, 2011; v1 submitted 18 February, 2011; originally announced February 2011.

Comments: 17 pages, two-column. Version 2 of this technical report fixes minor errors in the Triangle Inequality proof, grammatical errors, and other typos. An edited and more polished version is to be published in KDD'11, August 2011.

ACM Class: H.2.8

arXiv:1009.6119 [pdf]

doi 10.1016/j.chb.2012.01.002

A Comprehensive Survey of Data Mining-based Fraud Detection Research

Authors: Clifton Phua, Vincent Lee, Kate Smith, Ross Gayler

Abstract: This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.

Submitted 30 September, 2010; originally announced September 2010.

Comments: 14 pages

Showing 1–50 of 50 results for author: Lee, V