
Showing 1–50 of 56 results for author: Shen, P

Searching in archive cs.
  1. arXiv:2406.13399  [pdf, other]

    cs.AI

    VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

    Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

    Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Service (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substanti…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: to be published in IEEE ICWS 2024

  2. arXiv:2405.15750  [pdf, other]

    cs.CL cs.AI cs.LG

    Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

    Authors: Abhinav Patil, Jaap Jumelet, Yu Ying Chiu, Andy Lapastora, Peter Shen, Lexie Wang, Clevis Willrich, Shane Steinert-Threlkeld

    Abstract: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpor…

    Submitted 6 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Forthcoming in Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. For code and trained models, see http://github.com/CLMBRs/corpus-filtering

  3. arXiv:2405.11844  [pdf]

    cs.AR cs.ET

    NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors

    Authors: Harideep Nair, William Leyman, Agastya Sampath, Quinn Jacobson, John Paul Shen

    Abstract: Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC's ability to store, pred…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted and Presented at Neuro-Inspired Computational Elements (NICE) Conference, La Jolla, CA. 2024

  4. arXiv:2404.15312  [pdf, other]

    eess.SP cs.CV

    Realtime Person Identification via Gait Analysis

    Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

    Abstract: Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works predominantly focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to dev…

    Submitted 2 April, 2024; originally announced April 2024.

  5. arXiv:2404.03648  [pdf, other]

    cs.CL

    AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

    Authors: Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation, but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web. In light of the challenge, we…

    Submitted 4 April, 2024; originally announced April 2024.

  6. arXiv:2402.19376  [pdf, other]

    cs.AR

    OzMAC: An Energy-Efficient Sparsity-Exploiting Multiply-Accumulate-Unit Design for DL Inference

    Authors: Harideep Nair, Prabhu Vellaisamy, Tsung-Han Lin, Perry Wang, Shawn Blanton, John Paul Shen

    Abstract: General Matrix Multiply (GEMM) hardware, employing large arrays of multiply-accumulate (MAC) units, performs the bulk of the computation in deep learning (DL). Recent trends have established 8-bit integer (INT8) as the most widely used precision for DL inference. This paper proposes a novel MAC design capable of dynamically exploiting bit sparsity (i.e., number of '0' bits within a binary value) in inp…

    Submitted 29 February, 2024; originally announced February 2024.

  7. arXiv:2402.08423  [pdf, other]

    cs.AI

    Vehicle Behavior Prediction by Episodic-Memory Implanted NDT

    Authors: Peining Shen, Jianwu Fang, Hongkai Yu, Jianru Xue

    Abstract: In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but the black-box nature makes it untrustworthy to apply them in practical use. In this work, we explore the interpretability of behavi…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICRA2024

  8. arXiv:2402.03822  [pdf, other]

    cs.AI cs.CL cs.LG

    RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

    Authors: Si Shen, Peijun Shen, Danhao Zhu

    Abstract: This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$, a new metric we introduce to assess equation complexity. Thro…

    Submitted 23 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  9. arXiv:2401.13877  [pdf]

    cs.CV cs.RO

    AscDAMs: Advanced SLAM-based channel detection and mapping system

    Authors: Tengfei Wang, Fucheng Lu, Jintao Qin, Taosheng Huang, Hui Kong, Ping Shen

    Abstract: Obtaining high-resolution, accurate channel topography and deposit conditions is the foremost challenge for the study of channelized debris flow. Currently, widely used mapping technologies including satellite imaging and drone photogrammetry struggle to precisely observe channel interior conditions of mountainous long-deep gullies, particularly those in the Wenchuan Earthquake region. SLAM is an emerg…

    Submitted 24 January, 2024; originally announced January 2024.

  10. arXiv:2401.05461  [pdf]

    cs.HC cs.AI cs.LG

    The two-way knowledge interaction interface between humans and neural networks

    Authors: Zhanliang He, Nuoye Xiong, Hongsheng Li, Peiyi Shen, Guangming Zhu, Liang Zhang

    Abstract: Although neural networks (NNs) have been widely applied in various fields and generally outperform humans, they still lack interpretability to a certain extent, and humans are unable to intuitively understand the decision logic of NNs. This also hinders the knowledge interaction between humans and NNs, preventing humans from getting involved to give direct guidance when an NN's decisions go wrong. While…

    Submitted 10 January, 2024; originally announced January 2024.

  11. arXiv:2312.10964  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative linguistic representation for spoken language identification

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance. With the success of recent large models, such as GPT and Whisper, the potential to leverage such pre-trained models for extracting linguistic features for LID tasks has become a promising area of research. In this paper, we explore the utilization of the d…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE ASRU2023

  12. arXiv:2312.10959  [pdf, other]

    cs.SD cs.CL eess.AS

    Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: Multi-talker overlapped speech recognition remains a significant challenge, requiring not only speech recognition but also speaker diarization tasks to be addressed. In this paper, to better address these tasks, we first introduce speaker labels into an autoregressive transformer-based speech recognition model to support multi-speaker overlapped speech recognition. Then, to improve speaker diariza…

    Submitted 18 December, 2023; originally announced December 2023.

  13. arXiv:2311.01003  [pdf, other]

    eess.SY cs.RO

    Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle

    Authors: Chen Qian, Rui Chen, Peiyao Shen, Yongchun Fang, Jifu Yan, Tiefeng Li

    Abstract: This paper presents both the trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed from a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and d…

    Submitted 2 November, 2023; originally announced November 2023.

  14. arXiv:2310.13471  [pdf, ps, other]

    eess.AS cs.SD

    Neural domain alignment for spoken language recognition based on optimal transport

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Domain shift poses a significant challenge in cross-domain spoken language recognition (SLR) by reducing its effectiveness. Unsupervised domain adaptation (UDA) algorithms have been explored to address domain shifts in SLR without relying on class labels in the target domain. One successful UDA approach focuses on learning domain-invariant representations to align feature distributions between dom…

    Submitted 20 October, 2023; originally announced October 2023.

  15. arXiv:2309.16093  [pdf, ps, other]

    eess.AS cs.SD

    Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Due to the modality discrepancy between textual and acoustic modeling, efficiently transferring linguistic knowledge from a pretrained language model (PLM) to acoustic encoding for automatic speech recognition (ASR) still remains a challenging task. In this study, we propose a cross-modality knowledge transfer (CMKT) learning framework in a temporal connectionist temporal classification (CTC) base…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  16. arXiv:2309.13650  [pdf, ps, other]

    eess.AS cs.SD

    Cross-modal Alignment with Optimal Transport for CTC-based ASR

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Temporal connectionist temporal classification (CTC)-based automatic speech recognition (ASR) is one of the most successful end-to-end (E2E) ASR frameworks. However, due to the token independence assumption in decoding, an external language model (LM) is required, which destroys its fast parallel decoding property. Several studies have been proposed to transfer linguistic knowledge from a pretraine…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ASRU 2023

  17. arXiv:2309.10832  [pdf, ps, other]

    cs.SD eess.AS

    Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

    Authors: Jiahui Pan, Pengjie Shen, Hui Zhang, Xueliang Zhang

    Abstract: Multi-channel speech enhancement extracts speech using multiple microphones that capture spatial cues. Effectively utilizing directional information is key for multi-channel enhancement. Deep learning shows great potential on multi-channel speech enhancement and often takes short-time Fourier Transform (STFT) as inputs directly. To fully leverage the spatial information, we introduce a method usin…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2309.10393

  18. arXiv:2305.11651  [pdf, other]

    cs.IT cs.MA cs.PF eess.SY

    Channel Cycle Time: A New Measure of Short-term Fairness

    Authors: Pengfei Shen, Yulin Shao, Haoyuan Pan, Lu Lu, Yonina C. Eldar

    Abstract: This paper puts forth a new metric, dubbed channel cycle time (CCT), to measure the short-term fairness of communication networks. CCT characterizes the average duration between two consecutive successful transmissions of a user, during which all other users successfully accessed the channel at least once. In contrast to existing short-term fairness measures, CCT provides more comprehensive insigh…

    Submitted 14 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  19. arXiv:2209.07313  [pdf, other]

    eess.IV cs.CV

    HarDNet-DFUS: An Enhanced Harmonically-Connected Network for Diabetic Foot Ulcer Image Segmentation and Colonoscopy Polyp Segmentation

    Authors: Ting-Yu Liao, Ching-Hui Yang, Yu-Wen Lo, Kuan-Ying Lai, Po-Huai Shen, Youn-Long Lin

    Abstract: We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a p…

    Submitted 15 September, 2022; originally announced September 2022.

  20. arXiv:2207.14578  [pdf, other]

    cs.CL cs.SD eess.AS

    Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, compared to character-based modeling units, pronunciation-based modeling units could improve the sharing of modeling units in model training but meet homophone problems. In this study, we propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems. The proposed encodin…

    Submitted 29 July, 2022; originally announced July 2022.

  21. arXiv:2207.06309  [pdf, other]

    cs.IT cs.NI eess.SY

    Dynamic gNodeB Sleep Control for Energy-Conserving 5G Radio Access Network

    Authors: Pengfei Shen, Yulin Shao, Qi Cao, Lu Lu

    Abstract: 5G radio access network (RAN) is consuming much more energy than legacy RAN due to the denser deployments of gNodeBs (gNBs) and higher single-gNB power consumption. In an effort to achieve an energy-conserving RAN, this paper develops a dynamic on-off switching paradigm, where the ON/OFF states of gNBs can be dynamically configured according to the evolvements of the associated users. We formulate…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Keywords: Base station sleep control, 5G, radio access network, Markov decision process, greedy policy, index policy

  22. arXiv:2206.03061  [pdf, other]

    cs.CV

    Spatial Parsing and Dynamic Temporal Pooling networks for Human-Object Interaction detection

    Authors: Hongsheng Li, Guangming Zhu, Wu Zhen, Lan Ni, Peiyi Shen, Liang Zhang, Ning Wang, Cong Hua

    Abstract: The key to Human-Object Interaction (HOI) recognition is to infer the relationship between humans and objects. Recently, image-based HOI detection has made significant progress. However, there is still room for improvement in video HOI detection performance. Existing one-stage methods use well-designed end-to-end networks to detect a video segment and directly predict an in…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted by IJCNN2022

  23. arXiv:2205.14248  [pdf, other]

    cs.ET cs.AR cs.NE

    Towards a Design Framework for TNN-Based Neuromorphic Sensory Processing Units

    Authors: Prabhu Vellaisamy, John Paul Shen

    Abstract: Temporal Neural Networks (TNNs) are spiking neural networks that exhibit brain-like sensory processing with high energy efficiency. This work presents the ongoing research towards developing a custom design framework for designing efficient application-specific TNN-based Neuromorphic Sensory Processing Units (NSPUs). This paper examines previous works on NSPU designs for UCR time-series clustering…

    Submitted 27 May, 2022; originally announced May 2022.

  24. arXiv:2205.07410  [pdf, other]

    cs.AR cs.ET cs.LG cs.NE

    TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of Neuromorphic TNNs

    Authors: Harideep Nair, Prabhu Vellaisamy, Santha Bhasuthkar, John Paul Shen

    Abstract: Temporal Neural Networks (TNNs), inspired by the mammalian neocortex, exhibit energy-efficient online sensory processing capabilities. Recent works have proposed a microarchitecture framework for implementing TNNs and demonstrated competitive performance on vision and time-series applications. Building on these previous works, this work proposes TNN7, a suite of nine highly optimized custom macr…

    Submitted 25 May, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: To be published in ISVLSI 2022

  25. arXiv:2204.03888  [pdf, other]

    cs.CL cs.SD eess.AS

    Transducer-based language embedding for spoken language identification

    Authors: Peng Shen, Xugang Lu, Hisashi Kawai

    Abstract: The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefi…

    Submitted 29 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: This paper was accepted by Interspeech 2022

  26. arXiv:2203.17036  [pdf, ps, other]

    eess.AS cs.CL

    Partial Coupling of Optimal Transport for Spoken Language Identification

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: In order to reduce domain discrepancy to improve the performance of cross-domain spoken language identification (SLID) system, as an unsupervised domain adaptation (UDA) method, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT). A discrepancy measurement based on OT was adopted for JDA between training and test data sets. In our previous study, it was supp…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: This work was submitted to INTERSPEECH 2022

  27. arXiv:2201.01778  [pdf, other]

    quant-ph cond-mat.dis-nn cond-mat.mes-hall cs.AI cs.CV

    Quantum Capsule Networks

    Authors: Zidu Liu, Pei-Xin Shen, Weikang Li, L. -M. Duan, Dong-Ling Deng

    Abstract: Capsule networks, which incorporate the paradigms of connectionism and symbolism, have brought fresh insights into artificial intelligence. The capsule, as the building block of capsule networks, is a group of neurons represented by a vector to encode different features of an entity. The information is extracted hierarchically through capsule layers via routing algorithms. Here, we introduce a qua…

    Submitted 5 December, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: 7 pages (main text) + 8 pages (supplementary information), 8 figures

    Journal ref: Quantum Sci. Technol. 8 015016 (2022)

  28. arXiv:2201.00443  [pdf, other]

    cs.CV

    Scene Graph Generation: A Comprehensive Survey

    Authors: Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Mingtao Feng, Xia Zhao, Qiguang Miao, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: Deep learning techniques have led to remarkable breakthroughs in the field of generic object detection and have spawned a lot of scene-understanding tasks in recent years. Scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semanti…

    Submitted 22 June, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

    Comments: Submitted to TPAMI

  29. arXiv:2109.06310  [pdf, other]

    cs.LG stat.ML

    State Relevance for Off-Policy Evaluation

    Authors: Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

    Abstract: Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces varianc…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021

  30. arXiv:2108.09541  [pdf, other]

    cs.LG

    Rotation Equivariant Operators for Machine Learning on Scalar and Vector Fields

    Authors: Paul Shen, Michael Herbst, Venkat Viswanathan

    Abstract: We develop theory and software for rotation equivariant operators on scalar and vector fields, with diverse applications in simulation, optimization and machine learning. Rotation equivariance (covariance) means all fields in the system rotate together, implying spatially invariant dynamics that preserve symmetry. Extending the convolution theorems of linear time invariant systems, we theorize tha…

    Submitted 4 August, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

  31. arXiv:2108.08633  [pdf, other]

    cs.CV cs.MM

    Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

    Authors: Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua

    Abstract: For a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue for understanding the contextual information presented in the video. With effective spatio-temporal relationship modeling, it is possible not only to uncover contextual information in each frame but also to directly capture inter-time dependencies. It i…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: ACM MM Oral paper

  32. arXiv:2108.04542  [pdf, other]

    cs.AI cs.CV

    TrUMAn: Trope Understanding in Movies and Animations

    Authors: Hung-Ting Su, Po-Wei Shen, Bing-Chen Tsai, Wen-Feng Cheng, Ke-Jyun Wang, Winston H. Hsu

    Abstract: Understanding and comprehending video content is crucial for many real-world applications such as search and recommendation systems. While recent progress of deep learning has boosted performance on various tasks using visual cues, deep cognition to reason intentions, motivation, or causality remains challenging. Existing datasets that aim to examine video reasoning capability focus on visual sign…

    Submitted 21 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: CIKM 2021. The first two authors contributed equally to this work

  33. arXiv:2106.12864  [pdf, other]

    eess.IV cs.CV cs.LG

    A Systematic Collection of Medical Image Datasets for Deep Learning

    Authors: Johann Li, Guangming Zhu, Cong Hua, Mingtao Feng, Basheer Bennamoun, Ping Li, Xiaoyuan Lu, Juan Song, Peiyi Shen, Xu Xu, Lin Mei, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analy…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: This paper has been submitted to one journal

  34. arXiv:2106.05519  [pdf, other]

    cs.CV

    Consistent Instance False Positive Improves Fairness in Face Recognition

    Authors: Xingkun Xu, Yuge Huang, Pengcheng Shen, Shaoxin Li, Jilin Li, Feiyue Huang, Yong Li, Zhen Cui

    Abstract: Demographic bias is a significant challenge in practical face recognition systems. Existing methods heavily rely on accurate demographic annotations. However, such annotations are usually unavailable in real scenarios. Moreover, these methods are typically designed for a specific demographic group and are not general enough. In this paper, we propose a false positive rate penalty loss, which mitig…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: CVPR2021

  35. arXiv:2105.13262  [pdf, other]

    cs.AR cs.ET cs.LG cs.NE

    A Microarchitecture Implementation Framework for Online Learning with Temporal Neural Networks

    Authors: Harideep Nair, John Paul Shen, James E. Smith

    Abstract: Temporal Neural Networks (TNNs) are spiking neural networks that use time as a resource to represent and process information, similar to the mammalian neocortex. In contrast to compute-intensive deep neural networks that employ separate training and inference phases, TNNs are capable of extremely efficient online incremental/continual learning and are excellent candidates for building edge-native…

    Submitted 2 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

    Comments: To be published in ISVLSI 2021. arXiv admin note: substantial text overlap with arXiv:2009.00457

    Journal ref: 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2021, pp. 266-271

  36. arXiv:2105.02501  [pdf, other]

    cs.CV cs.AI

    Federated Face Recognition

    Authors: Fan Bai, Jiaxiang Wu, Pengcheng Shen, Shaoxin Li, Shuigeng Zhou

    Abstract: Face recognition has been extensively studied in computer vision and artificial intelligence communities in recent years. An important issue of face recognition is data privacy, which is attracting growing public concern. As a common privacy-preserving technique, Federated Learning is proposed to train a model cooperatively without sharing data between parties. However, as far as we know, it has…

    Submitted 6 May, 2021; originally announced May 2021.

  37. arXiv:2104.03004  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Generative probability models are widely used for speaker verification (SV). However, generative models lack discriminative feature selection ability. As a hypothesis test, SV can be regarded as a binary classification task which can be designed as a Siamese neural network (SiamNN) with discriminative training. However, in most of the discriminative training for SiamNN, only the dis…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2101.03329

  38. Unsupervised Clustering of Time Series Signals using Neuromorphic Energy-Efficient Temporal Neural Networks

    Authors: Shreyas Chaudhari, Harideep Nair, José M. F. Moura, John Paul Shen

    Abstract: Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection, bio-wearables, etc. These applications typically involve small, low-power devices on the edge that collect and process real-time sensory signals. State-of-the-art time-series clustering methods perform some form of loss minimization that is extremely computationally intensiv…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at ICASSP 2021

  39. arXiv:2101.03329  [pdf, ps, other]

    eess.AS cs.SD

    Coupling a generative model with a discriminative learning framework for speaker verification

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: The speaker verification (SV) task is to decide whether an utterance is spoken by a target or an imposter speaker. For most studies, a log-likelihood ratio (LLR) score is estimated based on a generative probability model on speaker features and compared with a threshold for making a decision. However, the generative model usually focuses on individual feature distributions, does not have the discr…

    Submitted 24 November, 2021; v1 submitted 9 January, 2021; originally announced January 2021.

  40. arXiv:2101.01447  [pdf, other]

    cs.MM cs.CL cs.CV

    End-to-End Video Question-Answer Generation with Generator-Pretester Network

    Authors: Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, Winston H. Hsu

    Abstract: We study a novel task, Video Question-Answer Generation (VQAG), for the challenging Video Question Answering (Video QA) task in multimedia. Due to expensive data annotation costs, many widely used, large-scale Video QA datasets such as Video-QA, MSVD-QA and MSRVTT-QA are automatically annotated using Caption Question Generation (CapQG) which inputs captions instead of the video itself. As captions nei…

    Submitted 5 January, 2021; originally announced January 2021.

    Comments: Accepted to TCSVT

  41. arXiv:2012.13152  [pdf, ps, other]

    cs.LG cs.CL cs.SD eess.AS

    Unsupervised neural adaptation model based on optimal transport for spoken language identification

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Abstract: Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded. In this paper, we propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID. In our model, we explicitly formulate the adaptation as to reduce the distribution dis…

    Submitted 24 December, 2020; originally announced December 2020.

  42. arXiv:2012.05419   

    cs.AR cs.ET cs.LG cs.NE

    A Custom 7nm CMOS Standard Cell Library for Implementing TNN-based Neuromorphic Processors

    Authors: Harideep Nair, Prabhu Vellaisamy, Santha Bhasuthkar, John Paul Shen

    Abstract: A set of highly-optimized custom macro extensions is developed for a 7nm CMOS cell library for implementing Temporal Neural Networks (TNNs) that can mimic brain-like sensory processing with extreme energy efficiency. A TNN prototype (13,750 neurons and 315,000 synapses) for MNIST requires only 1.56 mm² die area and consumes only 1.69 mW.

    Submitted 4 June, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: This work is dated and will be superseded by a forthcoming work

  43. arXiv:2009.00457  [pdf, other]

    cs.AR cs.ET cs.LG cs.NE

    Direct CMOS Implementation of Neuromorphic Temporal Neural Networks for Sensory Processing

    Authors: Harideep Nair, John Paul Shen, James E. Smith

    Abstract: Temporal Neural Networks (TNNs) use time as a resource to represent and process information, mimicking the behavior of the mammalian neocortex. This work focuses on implementing TNNs using off-the-shelf digital CMOS technology. A microarchitecture framework is introduced with a hierarchy of building blocks including: multi-neuron columns, multi-column layers, and multi-layer TNNs. We present the d…

    Submitted 27 August, 2020; originally announced September 2020.

    Comments: Submission Under Review for an IEEE Conference

  44. arXiv:2007.06013  [pdf, other]

    cs.CV eess.IV

    MeDaS: An open-source platform as service to help break the walls between medicine and informatics

    Authors: Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennamoun, Kun Qian, Björn W. Schuller

    Abstract: In the past decade, deep learning (DL) has achieved unprecedented success in numerous fields including computer vision, natural language processing, and healthcare. In particular, DL is experiencing an increasing development in applications for advanced medical image analysis in terms of analysis, segmentation, classification, and furthermore. On the one hand, tremendous needs that leverage the po…

    Submitted 13 July, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: layout error fixed

  45. arXiv:2004.00288  [pdf, other]

    cs.CV

    CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

    Authors: Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, Feiyue Huang

    Abstract: As an emerging topic in face recognition, designing margin-based loss functions can increase the feature margin between different classes for enhanced discriminability. More recently, the idea of mining-based strategies is adopted to emphasize the misclassified samples, achieving promising results. However, during the entire training process, the prior methods either do not explicitly emphasize th…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  46. arXiv:2002.03741  [pdf, other]

    cs.CV

    Efficient Scene Text Detection with Textual Attention Tower

    Authors: Liang Zhang, Yufei Liu, Hang Xiao, Lu Yang, Guangming Zhu, Syed Afaq Shah, Mohammed Bennamoun, Peiyi Shen

    Abstract: Scene text detection has received attention for years and achieved an impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detect multioriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network to reduce the computational complexity. A self-attention mechanism is adopted to suppress false pos…

    Submitted 30 January, 2020; originally announced February 2020.

    Comments: Accepted by ICASSP 2020

  47. arXiv:2002.03662  [pdf, other]

    cs.CV

    Improving Face Recognition from Hard Samples via Distribution Distillation Loss

    Authors: Yuge Huang, Pengcheng Shen, Ying Tai, Shaoxin Li, Xiaoming Liu, Jilin Li, Feiyue Huang, Rongrong Ji

    Abstract: Large facial variations are the main challenge in face recognition. To this end, previous variation-specific methods make full use of task-related prior to design special network losses, which are typically not general among different tasks and scenarios. In contrast, the existing generic methods focus on improving the feature discriminability to minimize the intra-class distance while maximizing…

    Submitted 18 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: ECCV 2020

  48. Structure-Feature based Graph Self-adaptive Pooling

    Authors: Liang Zhang, Xudong Wang, Hongsheng Li, Guangming Zhu, Peiyi Shen, Ping Li, Xiaoyuan Lu, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: Various methods to deal with graph data have been proposed in recent years. However, most of these methods focus on graph feature aggregation rather than graph pooling. Besides, the existing top-k selection graph pooling methods have a few problems. First, to construct the pooled graph topology, current top-k selection methods evaluate the importance of the node from a single perspective only, whi…

    Submitted 30 January, 2020; originally announced February 2020.

    Comments: 7 pages, 4 figures, The Web Conference 2020

  49. arXiv:1912.12011  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Cross-scale Attention Model for Acoustic Event Classification

    Authors: Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

    Abstract: A major advantage of a deep convolutional neural network (CNN) is that the focused receptive field size is increased by stacking multiple convolutional layers. Accordingly, the model can explore the long-range dependency of features from the top layers. However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range depende…

    Submitted 15 June, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

  50. arXiv:1812.03304  [pdf, ps, other]

    cs.RO

    Real-time Acceleration-continuous Path-constrained Trajectory Planning With Built-in Tradability Between Cruise and Time-optimal Motions

    Authors: Peiyao Shen, Xuebo Zhang, Yongchun Fang

    Abstract: In this paper, a novel real-time acceleration-continuous path-constrained trajectory planning algorithm is proposed with an appealing built-in tradability mechanism between cruise motion and time-optimal motion. Different from existing approaches, the proposed approach smoothens time-optimal trajectories with bang-bang input structures to generate acceleration-continuous trajectories while preserv…

    Submitted 8 December, 2018; originally announced December 2018.

    Comments: 12 pages, 19 figures