
Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

Authors: Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So

Abstract: Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low-latency, low-power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolution neural networks (SCNN). The SCNN implemented on the accelerator can efficiently extract the embedding feature vector from each representation of event slices by only processing the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency while only consuming 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://github.com/CASR-HKU/ESDA/tree/eye_tracking.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024 workshop, AIS: Vision, Graphics, and AI for Streaming

arXiv:2404.11770 [pdf, other]

Event-Based Eye Tracking. AIS 2024 Challenge Survey

Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, et al. (14 additional authors not shown)

Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Qinyu Chen is the corresponding author

arXiv:2401.05626 [pdf, other]

doi 10.1145/3626202.3637558

A Composable Dynamic Sparse Dataflow Architecture for Efficient Event-based Vision Processing on FPGA

Authors: Yizhao Gao, Baoheng Zhang, Yuhao Ding, Hayden Kwok-Hay So

Abstract: Event-based vision represents a paradigm shift in how vision information is captured and processed. By only responding to dynamic intensity changes in the scene, event-based sensing produces far less data than conventional frame-based cameras, promising to springboard a new generation of high-speed, low-power machines for edge intelligence. However, processing such dynamically sparse input originating from event cameras efficiently in real time, particularly with complex deep neural networks (DNN), remains a formidable challenge. Existing solutions that employ GPUs and other frame-based DNN accelerators often struggle to efficiently process the dynamically sparse event data, missing the opportunities to improve processing efficiency with sparse data. To address this, we propose ESDA, a composable dynamic sparse dataflow architecture that allows customized DNN accelerators to be constructed rapidly on FPGAs for event-based vision tasks. ESDA is a modular system that is composed of a set of parametrizable modules for each network layer type. These modules share a uniform sparse token-feature interface and can be connected easily to compose an all-on-chip dataflow accelerator on FPGA for each network model. To fully exploit the intrinsic sparsity in event data, ESDA incorporates the use of submanifold sparse convolutions that largely enhance the activation sparsity throughout the layers while simplifying hardware implementation. Finally, a network architecture and hardware implementation co-optimizing framework that allows tradeoffs between accuracy and performance is also presented. Experimental results demonstrate that when compared with existing GPU and hardware-accelerated solutions, ESDA achieves substantial speedup and improvement in energy efficiency across different applications, and it allows much wider design space for real-world deployments.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Accepted to FPGA'24

arXiv:2312.09262 [pdf, other]

Random resistive memory-based deep extreme point learning machine for unified visual processing

Authors: Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data representation, unprecedented hardware energy efficiency and rapid model training. However, multi-sensory data are intrinsically heterogeneous, causing significant complexity in the system development for edge-side intelligent machines. In addition, the performance of conventional digital hardware is limited by the physically separated processing and memory units, known as the von Neumann bottleneck, and the physical limit of transistor scaling, which contributes to the slowdown of Moore's law. These limitations are further intensified by the tedious training of models with ever-increasing sizes. We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM), that offers efficient unified point set analysis. We show the system's versatility across various data modalities and two different learning tasks. Compared to a conventional digital hardware-based system, our co-design system achieves huge energy efficiency improvements and training cost reduction. Our random resistive memory-based deep extreme point learning machine may pave the way for energy-efficient and training-friendly edge AI across various data modalities and tasks.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2310.15574 [pdf, other]

3D Multi-Target Localization Via Intelligent Reflecting Surface: Protocol and Analysis

Authors: Meng Hua, Guangji Chen, Kaitao Meng, Shaodan Ma, Chau Yuen, Hing Cheung So

Abstract: With the emerging environment-aware applications, ubiquitous sensing is expected to play a key role in future networks. In this paper, we study a 3-dimensional (3D) multi-target localization system where multiple intelligent reflecting surfaces (IRSs) are applied to create virtual line-of-sight (LoS) links that bypass the base station (BS) and targets. To fully unveil the fundamental limit of IRS for sensing, we first study a single-target-single-IRS case and propose a novel two-stage localization protocol by controlling the on/off state of IRS. To be specific, in the IRS-off stage, we derive the Cramér-Rao bound (CRB) of the azimuth/elevation direction-of-arrival (DoA) of the BS-target link and design a DoA estimator based on the MUSIC algorithm. In the IRS-on stage, the CRB of the azimuth/elevation DoA of the IRS-target link is derived and a simple DoA estimator based on the on-grid IRS beam scanning method is proposed. Particularly, the impact of echo signals reflected by IRS from different paths on sensing performance is analyzed. Moreover, we prove that the single-beam of the IRS is not capable of sensing, but it can be achieved with multi-beam. Based on the two obtained DoAs, the 3D single-target location is constructed. We then extend to the multi-target-multi-IRS case and propose an IRS-adaptive sensing protocol by controlling the on/off state of multiple IRSs, and a multi-target localization algorithm is developed. Simulation results demonstrate the effectiveness of our scheme and show that sub-meter-level positioning accuracy can be achieved.

Submitted 28 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: This paper has been submitted to IEEE journal for possible publication

arXiv:2310.06233 [pdf, other]

Low-Rank Tensor Completion via Novel Sparsity-Inducing Regularizers

Authors: Zhi-Yong Wang, Hing Cheung So, Abdelhak M. Zoubir

Abstract: To alleviate the bias generated by the l1-norm in the low-rank tensor completion problem, nonconvex surrogates/regularizers have been suggested to replace the tensor nuclear norm, although both can achieve sparsity. However, the thresholding functions of these nonconvex regularizers may not have closed-form expressions and thus iterations are needed, which increases the computational loads. To solve this issue, we devise a framework to generate sparsity-inducing regularizers with closed-form thresholding functions. These regularizers are applied to low-tubal-rank tensor completion, and efficient algorithms based on the alternating direction method of multipliers are developed. Furthermore, convergence of our methods is analyzed and it is proved that the generated sequences are bounded and any limit point is a stationary point. Experimental results using synthetic and real-world datasets show that the proposed algorithms outperform the state-of-the-art methods in terms of restoration performance.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.04954 [pdf, other]

A framework to generate sparsity-inducing regularizers for enhanced low-rank matrix completion

Authors: Zhi-Yong Wang, Hing Cheung So

Abstract: Applying half-quadratic optimization to loss functions can yield the corresponding regularizers, while these regularizers are usually not sparsity-inducing regularizers (SIRs). To solve this problem, we devise a framework to generate an SIR with closed-form proximity operator. Besides, we specify our framework using several commonly-used loss functions, and produce the corresponding SIRs, which are then adopted as nonconvex rank surrogates for low-rank matrix completion. Furthermore, algorithms based on the alternating direction method of multipliers are developed. Extensive numerical results show the effectiveness of our methods in terms of recovery performance and runtime.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.04762 [pdf, other]

Robust Low-Rank Matrix Completion via a New Sparsity-Inducing Regularizer

Authors: Zhi-Yong Wang, Hing Cheung So, Abdelhak M. Zoubir

Abstract: This paper presents a novel loss function referred to as hybrid ordinary-Welsch (HOW) and a new sparsity-inducing regularizer associated with HOW. We theoretically show that the regularizer is quasiconvex and that the corresponding Moreau envelope is convex. Moreover, the closed-form solution to its Moreau envelope, namely, the proximity operator, is derived. Compared with nonconvex regularizers like the lp-norm with 0<p<1 that requires iterations to find the corresponding proximity operator, the developed regularizer has a closed-form proximity operator. We apply our regularizer to the robust matrix completion problem, and develop an efficient algorithm based on the alternating direction method of multipliers. The convergence of the suggested method is analyzed and we prove that any generated accumulation point is a stationary point. Finally, experimental results based on synthetic and real-world datasets demonstrate that our algorithm is superior to the state-of-the-art methods in terms of restoration performance.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.16987 [pdf, other]

SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features

Authors: Song Wang, Zhu Wang, Can Li, Xiaojuan Qi, Hayden Kwok-Hay So

Abstract: In comparison to conventional RGB cameras, the superior temporal resolution of event cameras allows them to capture rich information between frames, making them prime candidates for object tracking. Yet in practice, despite their theoretical advantages, the body of work on event-based multi-object tracking (MOT) remains in its infancy, especially in real-world settings where events from complex background and camera motion can easily obscure the true target motion. In this work, an event-based multi-object tracker, called SpikeMOT, is presented to address these challenges. SpikeMOT leverages spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects. The resulting spike train representations are used to track the object movement at high frequency, while a simultaneous object detector provides updated spatial information of these objects at an equivalent frame rate. To evaluate the effectiveness of SpikeMOT, we introduce DSEC-MOT, the first large-scale event-based MOT benchmark incorporating fine-grained annotations for objects experiencing severe occlusions, frequent trajectory intersections, and long-term re-identification in real-world contexts. Extensive experiments employing DSEC-MOT and another event-based dataset, named FE240hz, demonstrate SpikeMOT's capability to achieve high tracking accuracy amidst challenging real-world scenarios, advancing the state-of-the-art in event-based multi-object tracking.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.00960 [pdf, other]

Network Topology Inference with Sparsity and Laplacian Constraints

Authors: Jiaxi Ying, Xi Han, Rui Zhou, Xiwen Wang, Hing Cheung So

Abstract: We tackle the network topology inference problem by utilizing Laplacian constrained Gaussian graphical models, which recast the task as estimating a precision matrix in the form of a graph Laplacian. Recent research \cite{ying2020nonconvex} has uncovered the limitations of the widely used $\ell_1$-norm in learning sparse graphs under this model: empirically, the number of nonzero entries in the solution grows with the regularization parameter of the $\ell_1$-norm; theoretically, a large regularization parameter leads to a fully connected (densest) graph. To overcome these challenges, we propose a graph Laplacian estimation method incorporating the $\ell_0$-norm constraint. An efficient gradient projection algorithm is developed to solve the resulting optimization problem, characterized by sparsity and Laplacian constraints. Through numerical experiments with synthetic and financial time-series datasets, we demonstrate the effectiveness of the proposed method in network topology inference.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2307.09232 [pdf, ps, other]

Intelligent Reflecting Surface Assisted Localization: Performance Analysis and Algorithm Design

Authors: Meng Hua, Qingqing Wu, Wen Chen, Zesong Fei, Hing Cheung So, Chau Yuen

Abstract: The target sensing/localization performance is fundamentally limited by the line-of-sight link and severe signal attenuation over long distances. This paper considers a challenging scenario where the direct link between the base station (BS) and the target is blocked due to the surrounding blockages and leverages the intelligent reflecting surface (IRS) with some active sensors, termed as semi-passive IRS, for localization. To be specific, the active sensors receive echo signals reflected by the target and apply signal processing techniques to estimate the target location. We consider the joint time-of-arrival (ToA) and direction-of-arrival (DoA) estimation for localization and derive the corresponding Cramér-Rao bound (CRB), and then a simple ToA/DoA estimator without iteration is proposed. In particular, the relationships of the CRB for ToA/DoA with the number of frames for IRS beam adjustments, number of IRS reflecting elements, and number of sensors are theoretically analyzed and demystified. Simulation results show that the proposed semi-passive IRS architecture provides sub-meter level positioning accuracy even over a long localization range from the BS to the target and also demonstrate a significant localization accuracy improvement compared to the fully passive IRS architecture.

Submitted 25 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: The paper has been submitted to IEEE journal for possible publication

arXiv:2304.05440 [pdf, other]

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

Authors: Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein

Abstract: Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors offer programmability and minimal processing capabilities directly on the sensor. We exploit these capabilities by developing an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations. PixelRNN reduces the amount of data to be transmitted off the sensor by a factor of 64x compared to conventional systems while offering competitive accuracy for hand gesture recognition and lip reading tasks. We experimentally validate PixelRNN using a prototype implementation on the SCAMP-5 sensor-processor platform.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2302.12510 [pdf, other]

doi 10.1109/TCAD.2023.3342730

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Authors: Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong

Abstract: To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize the DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work targets an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-field to be adapted to the DNN weights/activations distribution. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade-off the inference accuracy and speedup. Experimental results demonstrate that the inference accuracy via DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1x speedup compared with the original model.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2301.03971 [pdf, other]

Unsupervised Mandarin-Cantonese Machine Translation

Authors: Megan Dare, Valentina Fajardo Diaz, Averie Ho Zoen So, Yifan Wang, Shibingfeng Zhang

Abstract: Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the language, due to the fact that Cantonese is primarily used for oral communication. The key contributions of our project include: 1. The creation of a new corpus containing approximately 1 million Cantonese sentences, and 2. A large-scale comparison across different model architectures, tokenization schemes, and embedding structures. Our best model trained with character-based tokenization and a Transformer architecture achieved a character-level BLEU of 25.1 when translating from Mandarin to Cantonese and of 24.4 when translating from Cantonese to Mandarin. In this paper we discuss our research process, experiments, and results.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2211.08824 [pdf, other]

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Authors: Yu-Hsiang Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li

Abstract: Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain an open challenge. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILETrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILETrack achieve an improved trade-off between the cost (e.g., running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art benchmarks, including the popular BYTETrack method. SMILETrack outperforms BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on MOT17 and MOT20 datasets. Code is available at https://github.com/pingyang1117/SMILEtrack_Official

Submitted 22 January, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Our paper was accepted by AAAI2024

arXiv:2204.11836 [pdf, other]

Automated detection of dark patterns in cookie banners: how to do it poorly and why it is hard to do it any other way

Authors: Than Htut Soe, Cristiana Teixeira Santos, Marija Slavkovik

Abstract: Cookie banners, the pop ups that appear to collect your consent for data collection, are a tempting ground for dark patterns. Dark patterns are design elements that are used to influence the user's choice towards an option that is not in their interest. The use of dark patterns renders consent elicitation meaningless and voids the attempts to improve a fair collection and use of data. Can machine learning be used to automatically detect the presence of dark patterns in cookie banners? In this work, a dataset of cookie banners of 300 news websites was used to train a prediction model that does exactly that. The machine learning pipeline we used includes feature engineering, parameter search, training a Gradient Boosted Tree classifier and evaluation. The accuracy of the trained model is promising, but allows a lot of room for improvement. We provide an in-depth analysis of the interdisciplinary challenges that automated dark pattern detection poses to artificial intelligence. The dataset and all the code created using machine learning is available at the url to repository removed for review.

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2112.05221 [pdf, other]

MantissaCam: Learning Snapshot High-dynamic-range Imaging with Perceptually-based In-pixel Irradiance Encoding

Authors: Haley M. So, Julien N. P. Martel, Piotr Dudek, Gordon Wetzstein

Abstract: The ability to image high-dynamic-range (HDR) scenes is crucial in many computer vision applications. The dynamic range of conventional sensors, however, is fundamentally limited by their well capacity, resulting in saturation of bright scene parts. To overcome this limitation, emerging sensors offer in-pixel processing capabilities to encode the incident irradiance. Among the most promising encoding schemes is modulo wrapping, which results in a computational photography problem where the HDR scene is computed by an irradiance unwrapping algorithm from the wrapped low-dynamic-range (LDR) sensor image. Here, we design a neural network-based algorithm that outperforms previous irradiance unwrapping methods and we design a perceptually inspired "mantissa" encoding scheme that more efficiently wraps an HDR scene into an LDR sensor. Combined with our reconstruction framework, MantissaCam achieves state-of-the-art results among modulo-type snapshot HDR imaging approaches. We demonstrate the efficacy of our method in simulation and show benefits of our algorithm on modulo images captured with a prototype implemented with a programmable sensor.

Submitted 20 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.
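Illustrative sketch for the MantissaCam entry above: the abstract describes modulo wrapping, where the sensor records irradiance modulo its well capacity and an unwrapping algorithm recovers the HDR values. The snippet below shows the idea on a single scanline. It is only a toy, not the paper's method: the function names, the `full_well` parameter, and the sample values are all hypothetical, and the naive unwrapping rule (assume neighbouring true values differ by less than half the well capacity) merely stands in for the neural network the authors describe.

```python
def wrap(irradiance, full_well):
    """Modulo-wrap each value, as if the pixel reset on reaching full_well."""
    return [v % full_well for v in irradiance]


def unwrap_1d(wrapped, full_well):
    """Naive 1D unwrapping (assumption: true neighbours differ by < full_well / 2).

    A large downward jump is read as a wrap-around (add one full_well);
    a large upward jump is read as the reverse (subtract one full_well).
    The first sample is assumed to be unwrapped already.
    """
    out = [wrapped[0]]
    offset = 0
    for prev, cur in zip(wrapped, wrapped[1:]):
        step = cur - prev
        if step > full_well / 2:
            offset -= full_well
        elif step < -full_well / 2:
            offset += full_well
        out.append(cur + offset)
    return out


# Hypothetical irradiance along one scanline, exceeding the well capacity of 100.
scene = [10, 40, 70, 100, 130, 160, 190, 220, 180, 140, 100, 60]
ldr = wrap(scene, 100)       # what the modulo sensor would record
hdr = unwrap_1d(ldr, 100)    # reconstruction under the smoothness assumption
```

The reconstruction is exact here only because the sample scene respects the smoothness assumption; real scenes with sharp edges break it, which is the gap a learned unwrapping method is meant to close.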

arXiv:2111.06532 [pdf, ps, other]

Nonlinear Tensor Ring Network

Authors: Xiao Peng Li, Qi Liu, Hing Cheung So

Abstract: The state-of-the-art deep neural networks (DNNs) have been widely applied for various real-world applications, and achieved significant performance for cognitive problems. However, the increment of DNNs' width and depth in architecture results in a huge amount of parameters to challenge the storage and memory cost, limiting the usage of DNNs on resource-constrained platforms, such as portable devices. By converting redundant models into compact ones, compression technique appears to be a practical solution to reducing the storage and memory consumption. In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully connected and convolutional layers are compressed via tensor ring decomposition. Furthermore, to mitigate the accuracy loss caused by compression, a nonlinear activation function is embedded into the tensor contraction and convolution operations inside the compressed layer. Experimental results demonstrate the effectiveness and superiority of the proposed NTRN for image classification using two basic neural networks, LeNet-5 and VGG-11 on three datasets, viz. MNIST, Fashion MNIST and Cifar-10.

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2109.07809 [pdf, ps, other]

AI video editing tools. What editors want and how far is AI from delivering?

Authors: Than Htut Soe

Abstract: Video editing can be a very tedious task, so unsurprisingly Artificial Intelligence has been increasingly used to streamline the workflow or automate away tedious tasks. However, it is very difficult to get an overview of what intelligent video editing tools exist in the research literature and what video editors need automated. So, we identified the field of intelligent video editing tools in research, and we survey the opinions of professional video editors. We have also summarized the current state of the art in artificial intelligence research with the intention of identifying the possibilities and current technical limits towards truly intelligent video editing tools. The findings contribute towards understanding of the field of intelligent video editing tools, highlight unaddressed automation needs identified by the survey and provide general suggestions for further research in intelligent video editing tools.

Submitted 16 September, 2021; originally announced September 2021.

ACM Class: H.5; I.2

arXiv:2105.04218 [pdf, other]

Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks

Authors: Jie Ran, Rui Lin, Hayden K. H. So, Graziano Chesi, Ngai Wong

Abstract: Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing that the kernels in a convolutional neural network (CNN) are 4-way tensors, we further exploit a new elasticity dimension along the input-output channels. Specifically, a novel nuclear-norm rank minimization factorization (NRMF) approach is proposed to dynamically and globally search for the reduced tensor ranks during training. Correlation between tensor ranks across multiple layers is revealed, and a graceful tradeoff between model size and accuracy is obtained. Experiments then show the superiority of NRMF over the previous non-elastic variational Bayesian matrix factorization (VBMF) scheme.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 8 pages, 5 figures

arXiv:2104.12766 [pdf, other]

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

Authors: Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K. H. So, Kurt Keutzer

Abstract: Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs. However, this process remains challenging due to the intractable search space of neural network architectures and hardware accelerator implementation. Differing from existing hardware-aware neural architecture search (NAS) algorithms that rely solely on the expensive learning-based approaches, our work incorporates integer programming into the search algorithm to prune the design space. Given a set of hardware resource constraints, our integer programming formulation directly outputs the optimal accelerator configuration for mapping a DNN subgraph that minimizes latency. We use an accuracy predictor for different DNN subgraphs with different quantization schemes and generate accuracy-latency pareto frontiers. With low computational cost, our algorithm can generate quantized networks that achieve state-of-the-art accuracy and hardware performance on Xilinx Zynq (ZU3EG) FPGA for image classification on ImageNet dataset. The solution searched by our algorithm achieves 72.5% top-1 accuracy on ImageNet at framerate 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.

Submitted 26 April, 2021; originally announced April 2021.

Journal ref: FCCM 2021

arXiv:2012.04240 [pdf, other]

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Authors: Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K. -H. So, Xuehai Qian, Yanzhi Wang, Xue Lin

Abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step to deploy DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes for different rows of the weight matrix. It is motivated by (1) the fact that the weight distributions in different rows are not the same; and (2) the potential of achieving better utilization of heterogeneous FPGA hardware resources. To achieve that, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2) suitable for Gaussian-like weight distribution, in which the multiplication arithmetic can be replaced with logic shifter and adder, thereby enabling highly efficient implementations with the FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for Uniform-like weight distribution and can be implemented efficiently by DSP. Then to fully explore the resources, we propose an FPGA-centric mixed scheme quantization (MSQ) with an ensemble of the proposed SP2 and the fixed-point schemes. Combining the two schemes can maintain, or even increase accuracy due to better matching with weight distributions.

Submitted 11 December, 2020; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Accepted by High-Performance Computer Architecture (HPCA'2021)

MSC Class: 68T07
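Illustrative sketch for the Mix and Match entry above: the abstract states that with sum-of-power-of-2 (SP2) weights, multiplication can be replaced by a shifter and an adder. The snippet below shows that identity on integers. It is a simplified illustration under stated assumptions, not the paper's quantizer: the abstract does not give the exact SP2 level set, so the function names, the restriction to non-negative integer exponents, and the brute-force nearest-level search are all hypothetical.

```python
def sp2_multiply(x, shift_a, shift_b):
    """Multiply integer x by (2**shift_a + 2**shift_b) with two shifts and one add."""
    return (x << shift_a) + (x << shift_b)


def nearest_sp2(w, max_shift=4):
    """Return (value, shift_a, shift_b) for the two-term power-of-2 sum closest to w.

    Hypothetical brute-force search over shift_a >= shift_b in [0, max_shift];
    an actual SP2 scheme may use fractional powers and a different level set.
    """
    candidates = [
        (2 ** a + 2 ** b, a, b)
        for a in range(max_shift + 1)
        for b in range(a + 1)
    ]
    return min(candidates, key=lambda c: abs(c[0] - w))


value, a, b = nearest_sp2(10)      # 10 = 2**3 + 2**1
product = sp2_multiply(7, a, b)    # same result as 7 * 10, without a multiplier
```

Because each product needs only shifts and one addition, such weights map onto LUT-based logic rather than DSP multipliers, which is the resource argument the abstract makes.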

arXiv:2009.13108 [pdf, other]

doi 10.1109/TPDS.2022.3149787

NITI: Training Integer Neural Networks Using Integer-only Arithmetic

Authors: Maolin Wang, Seyedramin Rasoulinezhad, Philip H. W. Leong, Hayden K. H. So

Abstract: While integer arithmetic has been widely adopted for improved performance in deep quantized neural network inference, training remains a task primarily executed using floating point arithmetic. This is because both high dynamic range and numerical accuracy are central to the success of most modern training algorithms. However, due to its potential for computational, storage and energy advantages in hardware accelerators, neural network training methods that can be implemented with low precision integer-only arithmetic remain an active research challenge. In this paper, we present NITI, an efficient deep neural network training framework that stores all parameters and intermediate values as integers, and computes exclusively with integer arithmetic. A pseudo stochastic rounding scheme that eliminates the need for external random number generation is proposed to facilitate conversion from wider intermediate results to low precision storage. Furthermore, a cross-entropy loss backpropagation scheme computed with integer-only arithmetic is proposed. A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations in modern GPUs to achieve end-to-end training is presented. When compared with an equivalent training setup implemented with floating point storage and arithmetic, NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation. On ImageNet, 16-bit integers are needed for weight accumulation with an 8-bit datapath. This achieves training results comparable to all-floating-point implementations.

Submitted 11 February, 2022; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2006.13985 [pdf, other]

Circumvention by design -- dark patterns in cookie consents for online news outlets

Authors: Than Htut Soe, Oda Elise Nordberg, Frode Guribye, Marija Slavkovik

Abstract: To ensure that users of online services understand what data are collected and how they are used in algorithmic decision-making, the European Union's General Data Protection Regulation (GDPR) specifies informed consent as a minimal requirement. For online news outlets consent is commonly elicited through interface design elements in the form of a pop-up. We have manually analyzed 300 data collection consent notices from news outlets that are built to ensure compliance with GDPR. The analysis uncovered a variety of strategies or dark patterns that circumvent the intent of GDPR by design. We further study the presence and variety of these dark patterns in these "cookie consents" and use our observations to specify the concept of dark pattern in the context of consent elicitation.

Submitted 24 June, 2020; originally announced June 2020.

Comments: Accepted for publication at NordiCHI 2020

arXiv:2005.06870 [pdf, other]

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

Authors: Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So

Abstract: We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These thresholds can have fine-grained layer-wise adjustments dynamically via backpropagation. We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss using the same number of training epochs as dense models. Dynamic Sparse Training achieves the state of the art performance compared with other sparse training algorithms on various network architectures. Additionally, we have several surprising observations that provide strong evidence for the effectiveness and efficiency of our algorithm. These observations reveal the underlying problems of traditional three-stage pruning algorithms and present the potential guidance provided by our algorithm to the design of more compact network architectures.

Submitted 14 May, 2020; originally announced May 2020.

Comments: ICLR 2020, camera ready version

arXiv:2005.05758 [pdf, other]

doi 10.1145/3392717.3392749

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Authors: Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden K. -H. So, Martin Herbordt, Ang Li, Yanzhi Wang

Abstract: Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from heavy computational workload as the model often comes with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, the non-structured pruning methods achieve a high pruning rate but introduce computation irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from low pruning rate due to restricted constraints on allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB pruned RNN model comes with both fine pruning granularity that facilitates a high pruning rate and regular structure that benefits the hardware parallelism. To address the challenges in parallelizing the CSB pruned model inference with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Gaining from the architecture-compilation co-design, the hardware not only supports various RNN cell types, but is also able to address the challenging workload imbalance issue and therefore significantly improves the hardware efficiency.

Submitted 11 May, 2020; originally announced May 2020.

ACM Class: C.1.4

arXiv:1910.07408 [pdf, other]

doi 10.1145/3357596

GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Authors: Nina Engelhardt, Hayden K. -H. So

Abstract: Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic. Previous research has shown that FPGAs can outcompete software-based graph processing in shared memory contexts, but it remains an open question if this advantage can be maintained in distributed systems. In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. Based on a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the whole multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user, while under the hood implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors influencing performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS (billions of traversed edges per second) on a 4-FPGA system, the designs generated by GraVF-M compare favorably to state-of-the-art frameworks from the literature and reach 94% of the projected performance limit of the system.

Submitted 14 October, 2019; originally announced October 2019.

Journal ref: ACM Trans. Reconfigurable Technol. Syst. 12, 4, Article 21 (November 2019)

arXiv:1910.01426 [pdf, other]

doi 10.1109/TPAMI.2019.2945027

High-dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction

Authors: Nan Meng, Hayden K. -H. So, Xing Sun, Edmund Y. Lam

Abstract: We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution. Many current approaches either require disparity clues or restore the spatial and angular details separately. Such methods have difficulties with non-Lambertian surfaces or occlusions. In contrast, we formulate light field super-resolution (LFSR) as tensor restoration and develop a learning framework based on a two-stage restoration with 4-dimensional (4D) convolution. This allows our model to learn the features capturing the geometry information encoded in multiple adjacent views. Such geometric features vary near the occlusion regions and indicate the foreground object border. To train a feasible network, we propose a novel normalization operation based on a group of views in the feature maps, design a stage-wise loss function, and develop the multi-range training strategy to further improve the performance. Evaluations are conducted on a number of light field datasets including real-world scenes, synthetic data, and microscope light fields. The proposed method achieves superior performance and less execution time compared with other state-of-the-art schemes.

Submitted 17 September, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

Comments: 14 pages. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)

arXiv:1908.07999 [pdf, other]

HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction

Authors: Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, Jaewoo Kang

Abstract: Many researchers both in academia and industry have long been interested in the stock market. Numerous approaches were developed to accurately predict future trends in stock prices. Recently, there has been a growing interest in utilizing graph-structured data in computer science research communities. Methods that use relational data for stock market prediction have been recently proposed, but they are still in their infancy. First, the quality of collected information from different types of relations can vary considerably. No existing work has focused on the effect of using different types of relations on stock market prediction or finding an effective way to selectively aggregate information on different relation types. Furthermore, existing works have focused on only individual stock prediction which is similar to the node classification task. To address this, we propose a hierarchical attention network for stock prediction (HATS) which uses relational data for stock market prediction. Our HATS method selectively aggregates information on different relation types and adds the information to the representations of each company. Specifically, node representations are initialized with features extracted from a feature extraction module. HATS is used as a relational modeling module with initialized node representations. Then, node representations with the added information are fed into a task-specific layer. Our method is used for predicting not only individual stock prices but also market index movements, which is similar to the graph classification task. The experimental results show that performance can change depending on the relational data used. HATS, which can automatically select information, outperformed all the existing methods.

Submitted 12 November, 2019; v1 submitted 7 August, 2019; originally announced August 2019.

arXiv:1906.00309 [pdf, ps, other]

Sparse Bayesian Learning Approach for Discrete Signal Reconstruction

Authors: Jisheng Dai, An Liu, Hing Cheung So

Abstract: This study addresses the problem of discrete signal reconstruction from the perspective of sparse Bayesian learning (SBL). Generally, it is intractable to perform the Bayesian inference with the ideal discretization prior under the SBL framework. To overcome this challenge, we introduce a novel discretization enforcing prior to exploit the knowledge of the discrete nature of the signal-of-interest. By integrating the discretization enforcing prior into the SBL framework and applying the variational Bayesian inference (VBI) methodology, we devise an alternating optimization algorithm to jointly characterize the finite-alphabet feature and reconstruct the unknown signal. When the measurement matrix is i.i.d. Gaussian per component, we further embed the generalized approximate message passing (GAMP) into the VBI-based method, so as to directly adopt the ideal prior and significantly reduce the computational burden. Simulation results demonstrate substantial performance improvement of the two proposed methods over existing schemes. Moreover, the GAMP-based variant outperforms the VBI-based method with i.i.d. Gaussian measurement matrices but it fails to work for non i.i.d. Gaussian matrices.

Submitted 18 April, 2023; v1 submitted 1 June, 2019; originally announced June 2019.

Comments: 35 pages, 8 figures

arXiv:1901.08746 [pdf]

doi 10.1093/bioinformatics/btz682

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Authors: Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang

Abstract: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.

Submitted 17 October, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: Bioinformatics

arXiv:1809.07950 [pdf]

doi 10.1186/s12859-019-2813-6

CollaboNet: collaboration of deep neural networks for biomedical named entity recognition

Authors: Wonjin Yoon, Chan Ho So, Jinhyuk Lee, Jaewoo Kang

Abstract: Background: Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and showed promising results. However, as deep learning approaches need an abundant amount of training data, a lack of data can hinder performance. BioNER datasets are scarce resources and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition. Results: To address the lack of data and the entity type misclassification problem, we propose CollaboNet which utilizes a combination of multiple NER models. In CollaboNet, models trained on a different dataset are connected to each other so that a target model obtains information from other collaborator models to reduce false positives. Every model is an expert on their target entity type and takes turns serving as a target and a collaborator model during training time. The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score. Conclusions: We demonstrated the benefits of combining multiple models for BioNER. Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.

Submitted 29 May, 2019; v1 submitted 21 September, 2018; originally announced September 2018.

Comments: From DTMBio workshop at CIKM 2018, Turin, Italy. 22-26 October 2018

ACM Class: I.2.7; J.3

Journal ref: BMC Bioinformatics 2019, 20(Suppl 10):249

arXiv:1805.11987 [pdf, ps, other]

l0-norm Based Centers Selection for Training Fault Tolerant RBF Networks and Selecting Centers

Authors: Hao Wang, Chi-Sing Leung, Hing Cheung So, Ruibin Feng, Zifa Han

Abstract: The aim of this paper is to train an RBF neural network and select centers under concurrent faults. It is well known that fault tolerance is a very attractive property for neural networks. And center selection is an important procedure during the training process of an RBF neural network. In this paper, we devise two novel algorithms to address these two issues simultaneously. Both of them are based on the ADMM framework. In the first method, the minimax concave penalty (MCP) function is introduced to select centers. In the second method, an l0-norm term is directly used, and the hard threshold (HT) is utilized to address the l0-norm term. Under several mild conditions, we can prove that both methods can globally converge to a unique limit point. Simulation results show that, under concurrent fault, the proposed algorithms are superior to many existing methods.

Submitted 31 October, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

arXiv:1706.03474 [pdf, ps, other]

Coordinate Descent Algorithms for Phase Retrieval

Authors: Wen-Jun Zeng, H. C. So

Abstract: Phase retrieval aims at recovering a complex-valued signal from magnitude-only measurements, which attracts much attention since it has numerous applications in many disciplines. However, phase recovery involves solving a system of quadratic equations, indicating that it is a challenging nonconvex optimization problem. To tackle phase retrieval in an effective and efficient manner, we apply coordinate descent (CD) such that a single unknown is solved at each iteration while all other variables are kept fixed. As a result, only minimization of a univariate quartic polynomial is needed which is easily achieved by finding the closed-form roots of a cubic equation. Three computationally simple algorithms referred to as cyclic, randomized and greedy CDs, based on different updating rules, are devised. It is proved that the three CDs globally converge to a stationary point of the nonconvex problem, and specifically, the randomized CD locally converges to the global minimum and attains exact recovery at a geometric rate with high probability if the sample size is large enough. The cyclic and randomized CDs are also modified via minimization of the $\ell_1$-regularized quartic polynomial for phase retrieval of sparse signals. Furthermore, a novel application of the three CDs, namely, blind equalization in digital communications, is proposed. It is demonstrated that the CD methodology is superior to the state-of-the-art techniques in terms of computational efficiency and/or recovery performance.

Submitted 12 June, 2017; originally announced June 2017.

arXiv:1704.08802 [other]

Proceedings of the 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017)

Authors: Hayden Kwok-Hay So, John Wawrzynek

Abstract: The 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) was held on 22 Feb, 2017 as a co-located workshop at the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2017). This year, the program committee selected 3 papers and 3 extended abstracts to be presented at the workshop, which are subsequently collected in this online volume.

Submitted 5 March, 2019; v1 submitted 27 April, 2017; originally announced April 2017.

Comments: 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) website: see http://olaf.eecs.berkeley.edu

ACM Class: C.0; C.1; B.5.2; B.6.3; B.7.2

arXiv:1702.06157 [pdf, ps, other]

Robust Phase Retrieval via ADMM with Outliers

Authors: Xue Jiang, H. C. So, X. Liu

Abstract: An outlier-resistance phase retrieval algorithm based on alternating direction method of multipliers (ADMM) is devised in this letter. Instead of the widely used least squares criterion that is only optimal for Gaussian noise environment, we adopt the least absolute deviation criterion to enhance the robustness against outliers. Considering both intensity- and amplitude-based observation models, the framework of ADMM is developed to solve the resulting non-differentiable optimization problems. It is demonstrated that the core subproblem of ADMM is the proximity operator of the L1-norm, which can be computed efficiently by soft-thresholding in each iteration. Simulation results are provided to validate the accuracy and efficiency of the proposed approach compared to the existing schemes.

Submitted 2 February, 2017; originally announced February 2017.

arXiv:1606.06483 [pdf]

A Soft Processor Overlay with Tightly-coupled FPGA Accelerator

Authors: Ho-Cheung Ng, Cheng Liu, Hayden Kwok-Hay So

Abstract: FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with a goal to improve designers' productivity through balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full application acceleration, it is often necessary to also include a highly efficient processor that integrates and collaborates with the accelerators while maintaining the benefits of being implemented within the same overlay framework. This paper presents an open-source soft processor that is designed to tightly-couple with FPGA accelerators as part of an overlay framework. RISC-V is chosen as the instruction set for its openness and portability, and the soft processor is designed as a 4-stage pipeline to balance resource consumption and performance when implemented on FPGAs. The processor is generically implemented so as to promote design portability and compatibility across different FPGA platforms. Experimental results show that integrated software-hardware applications using the proposed tightly-coupled architecture achieve comparable performance as hardware-only accelerators while the proposed architecture provides additional run-time flexibility. The processor has been synthesized to both low-end and high-performance FPGA families from different vendors, achieving the highest frequency of 268.67MHz and resource consumption comparable to existing RISC-V designs.

Submitted 21 June, 2016; originally announced June 2016.

Comments: Presented at 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) arXiv:1605.08149

Report number: OLAF/2016/07

arXiv:1605.08149 [other]

Proceedings of the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016)

Authors: Hayden Kwok-Hay So, John Wawrzynek

Abstract: The 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) was held on 21 Mar, 2016 as a co-located workshop at the 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2016). This year, the program committee selected 6 papers and 3 extended abstracts to be presented at the workshop, which are subsequently collected in this online volume.

Submitted 26 May, 2016; originally announced May 2016.

Comments: 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) website: see http://olaf.eecs.berkeley.edu

ACM Class: C.0; C.1; B.5.2; B.6.3; B.7.2

arXiv:1605.07358 [pdf, other]

Consistency Analysis for the Doubly Stochastic Dirichlet Process

Authors: Xing Sun, Nelson H. C. Yung, Edmund Y. Lam, Hayden K. -H. So

Abstract: This technical report proves components consistency for the Doubly Stochastic Dirichlet Process with exponential convergence of posterior probability. We also present the fundamental properties for DSDP as well as inference algorithms. Simulation toy experiment and real-world experiment results for single and multi-cluster also support the consistency proof. This report is also a support document for the paper "Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process".

Submitted 24 May, 2016; originally announced May 2016.

Comments: 13 pages, 4 figures

arXiv:1509.08451 [pdf, ps, other]

doi 10.1109/TSP.2016.2593688

Phase Retrieval Using Feasible Point Pursuit: Algorithms and Cramér-Rao Bound

Authors: Cheng Qian, Nicholas D. Sidiropoulos, Kejun Huang, Lei Huang, H. C. So

Abstract: Reconstructing a signal from squared linear (rank-one quadratic) measurements is a challenging problem with important applications in optics and imaging, where it is known as phase retrieval. This paper proposes two new phase retrieval algorithms based on non-convex quadratically constrained quadratic programming (QCQP) formulations, and a recently proposed approximation technique dubbed feasible point pursuit (FPP). The first is designed for uniformly distributed bounded measurement errors, such as those arising from high-rate quantization (B-FPP). The second is designed for Gaussian measurement errors, using a least squares criterion (LS-FPP). Their performance is measured against state-of-the-art algorithms and the Cramér-Rao bound (CRB), which is also derived here. Simulations show that LS-FPP outperforms the state-of-art and operates close to the CRB. Compact CRB expressions, properties, and insights are obtained by explicitly computing the CRB in various special cases -- including when the signal of interest admits a sparse parametrization, using harmonic retrieval as an example.

Submitted 28 March, 2016; v1 submitted 24 September, 2015; originally announced September 2015.

Comments: 13 pages, 13 figures

arXiv:1509.00042 [pdf]

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

Authors: Cheng Liu, Ho-Cheung Ng, Hayden Kwok-Hay So

Abstract: Offloading compute intensive nested loops to execute on FPGA accelerators have been demonstrated by numerous researchers as an effective performance enhancement technique across numerous application domains. To construct such accelerators with high design productivity, researchers have increasingly turned to the use of overlay architectures as an intermediate generation target built on top of off-the-shelf FPGAs. However, achieving the desired performance-overhead trade-off remains a major productivity challenge as complex application-specific customizations over a large design space covering multiple architectural parameters are needed. In this work, an automatic nested loop acceleration framework utilizing a regular soft coarse-grained reconfigurable array (SCGRA) overlay is presented. Given high-level resource constraints, the framework automatically customizes the overlay architectural design parameters, high-level compilation options as well as communication between the accelerator and the host processor for optimized performance specifically to the given application. In our experiments, at a cost of 10 to 20 minutes additional tools run time, the proposed customization process resulted in up to 5 times additional speedup over a baseline accelerator generated by the same framework without customization. Overall, when compared to the equivalent software running on the host ARM processor alone on the Zedboard, the resulting accelerators achieved up to 10 times speedup.

Submitted 27 August, 2015; originally announced September 2015.

Comments: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320)

Report number: FSP/2015/03

arXiv:1001.0080 [pdf, ps, other]

doi 10.1109/TWC.2011.110811.101739

Non-line-of-sight Node Localization based on Semi-Definite Programming in Wireless Sensor Networks

Authors: Hongyang Chen, Kenneth W. K. Lui, Zizhuo Wang, H. C. So, H. Vincent Poor

Abstract: An unknown-position sensor can be localized if there are three or more anchors making time-of-arrival (TOA) measurements of a signal from it. However, the location errors can be very large due to the fact that some of the measurements are from non-line-of-sight (NLOS) paths. In this paper, we propose a semi-definite programming (SDP) based node localization algorithm in NLOS environment for ultra-wideband (UWB) wireless sensor networks. The positions of sensors can be estimated using the distance estimates from location-aware anchors as well as other sensors. However, in the absence of LOS paths, e.g., in indoor networks, the NLOS range estimates can be significantly biased. As a result, the NLOS error can remarkably decrease the location accuracy. And it is not easy to efficiently distinguish LOS from NLOS measurements. In this paper, an algorithm is proposed that achieves high location accuracy without the need of identifying NLOS and LOS measurement.

Submitted 30 December, 2009; originally announced January 2010.

Comments: submitted to IEEE ICC'10
