Search | arXiv e-print repository

Photonic-Electronic Integrated Circuits for High-Performance Computing and AI Accelerators

Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, Zhixing Jiang, Zhoufeng Ying, Jason Midkiff, Sourabh Jain, May H. Hlaing, David Z. Pan, Ray T. Chen

Abstract: In recent decades, the demand for computational power has surged, particularly with the rapid expansion of artificial intelligence (AI). As we navigate the post-Moore's law era, the limitations of traditional electrical digital computing, including process bottlenecks and power consumption issues, are propelling the search for alternative computing paradigms. Among various emerging technologies, integrated photonics stands out as a promising solution for next-generation high-performance computing, thanks to the inherent advantages of light, such as low latency, high bandwidth, and unique multiplexing techniques. Furthermore, the progress in photonic integrated circuits (PICs), which are equipped with abundant photoelectronic components, positions photonic-electronic integrated circuits as a viable solution for high-performance computing and hardware AI accelerators. In this review, we survey recent advancements in both PIC-based digital and analog computing for AI, exploring the principal benefits and obstacles of implementation. Additionally, we propose a comprehensive analysis of photonic AI from the perspectives of hardware implementation, accelerator architecture, and software-hardware co-design. In the end, acknowledging the existing challenges, we underscore potential strategies for overcoming these issues and offer insights into the future drivers for optical computing.

Submitted 11 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2401.02347 [pdf, other]

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Authors: Longtian Qiu, Shan Ning, Xuming He

Abstract: Image captioning aims at generating descriptive and meaningful textual descriptions of images, enabling a broad range of vision-language applications. Prior works have demonstrated that harnessing the power of Contrastive Image Language Pre-training (CLIP) offers a promising approach to achieving zero-shot captioning, eliminating the need for expensive caption annotations. However, the widely observed modality gap in the latent space of CLIP harms the performance of zero-shot captioning by breaking the alignment between paired image-text features. To address this issue, we conduct an analysis on the CLIP latent space which leads to two findings. Firstly, we observe that the CLIP's visual feature of image subregions can achieve closer proximity to the paired caption due to the inherent information loss in text descriptions. In addition, we show that the modality gap between a paired image-text can be empirically modeled as a zero-mean Gaussian distribution. Motivated by the findings, we propose a novel zero-shot image captioning framework with text-only training to reduce the modality gap. In particular, we introduce a subregion feature aggregation to leverage local region information, which produces a compact visual representation for matching text representation. Moreover, we incorporate a noise injection and CLIP reranking strategy to boost captioning performance. We also extend our framework to build a zero-shot VQA pipeline, demonstrating its generality. Through extensive experiments on common captioning and VQA datasets such as MSCOCO, Flickr30k and VQAV2, we show that our method achieves remarkable performance improvements. Code is available at https://github.com/Artanic30/MacCap.

Submitted 4 January, 2024; originally announced January 2024.

Comments: AAAI 2024. Open sourced, Code and Model Available

arXiv:2312.04534 [pdf, other]

PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns

Authors: Shuliang Ning, Duomin Wang, Yipeng Qin, Zirong Jin, Baoyuan Wang, Xiaoguang Han

Abstract: In this paper, we propose a novel virtual try-on from unconstrained designs (ucVTON) task to enable photorealistic synthesis of personalized composite clothing on input human images. Unlike prior arts constrained by specific input types, our method allows flexible specification of style (text or image) and texture (full garment, cropped sections, or texture patches) conditions. To address the entanglement challenge when using full garment images as conditions, we develop a two-stage pipeline with explicit disentanglement of style and texture. In the first stage, we generate a human parsing map reflecting the desired style conditioned on the input. In the second stage, we composite textures onto the parsing map areas based on the texture input. To represent complex and non-stationary textures that have never been achieved in previous fashion editing works, we first propose extracting hierarchical and balanced CLIP features and applying position encoding in VTON. Experiments demonstrate superior synthesis quality and personalization enabled by our method. The flexible control over style and texture mixing brings virtual try-on to a new level of user experience for online shopping and fashion design.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Project page: https://ningshuliang.github.io/2023/Arxiv/index.html

arXiv:2312.02963 [pdf, other]

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Authors: Zhangyang Xiong, Chenghong Li, Kenkun Liu, Hongjie Liao, Jianqiao Hu, Junyi Zhu, Shuliang Ning, Lingteng Qiu, Chongjie Wang, Shijie Wang, Shuguang Cui, Xiaoguang Han

Abstract: In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while remarkable progress has been made with models trained on large-scale synthetic and real-captured object data like Objaverse and MVImgNet, a similar level of progress has not been observed in the domain of human-centric tasks partially due to the lack of a large-scale human dataset. Existing datasets of high-fidelity 3D human capture continue to be mid-sized due to the significant challenges in acquiring large-scale high-quality 3D human data. To bridge this gap, we present MVHumanNet, a dataset that comprises multi-view human action sequences of 4,500 human identities. The primary focus of our work is on collecting human data that features a large number of diverse identities and everyday clothing using a multi-view human capture system, which facilitates easily scalable data collection. Our dataset contains 9,000 daily outfits, 60,000 motion sequences and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions. To explore the potential of MVHumanNet in various 2D and 3D visual tasks, we conducted pilot studies on view-consistent action recognition, human NeRF reconstruction, text-driven view-unconstrained human image generation, as well as 2D view-unconstrained human image and 3D avatar generation. Extensive experiments demonstrate the performance improvements and effective applications enabled by the scale provided by MVHumanNet. As the current largest-scale 3D human dataset, we hope that the release of MVHumanNet data with annotations will foster further innovations in the domain of 3D human-centric tasks at scale.

Submitted 5 December, 2023; originally announced December 2023.

Comments: Project page: https://x-zhangyang.github.io/MVHumanNet/

arXiv:2311.05726 [pdf, other]

doi 10.3389/fphy.2024.1334298

Neural Network Methods for Radiation Detectors and Imaging

Authors: S. Lin, S. Ning, H. Zhu, T. Zhou, C. L. Morris, S. Clayton, M. Cherukara, R. T. Chen, Z. Wang

Abstract: Recent advances in image data processing through machine learning and especially deep neural networks (DNNs) allow for new optimization and performance-enhancement schemes for radiation detectors and imaging hardware through data-endowed artificial intelligence. We give an overview of data generation at photon sources, deep learning-based methods for image processing tasks, and hardware solutions for deep learning acceleration. Most existing deep learning approaches are trained offline, typically using large amounts of computational resources. However, once trained, DNNs can achieve fast inference speeds and can be deployed to edge devices. A new trend is edge computing with less energy consumption (hundreds of watts or less) and real-time analysis potential. While popularly used for edge computing, electronic-based hardware accelerators ranging from general purpose processors such as central processing units (CPUs) to application-specific integrated circuits (ASICs) are constantly reaching performance limits in latency, energy consumption, and other physical constraints. These limits give rise to next-generation analog neuromorphic hardware platforms, such as optical neural networks (ONNs), for highly parallel, low-latency, and low-energy computing to boost deep learning acceleration.

Submitted 9 November, 2023; originally announced November 2023.

Report number: LA-UR-23-32395

arXiv:2305.19592 [pdf]

Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning

Authors: Chenghao Feng, Jiaqi Gu, Hanqing Zhu, Rongxing Tang, Shupeng Ning, May Hlaing, Jason Midkiff, Sourabh Jain, David Z. Pan, Ray T. Chen

Abstract: The optical neural network (ONN) is a promising hardware platform for next-generation neuromorphic computing due to its high parallelism, low latency, and low energy consumption. However, previous integrated photonic tensor cores (PTCs) consume numerous single-operand optical modulators for signal and weight encoding, leading to large area costs and high propagation loss to implement large tensor operations. This work proposes a scalable and efficient optical dot-product engine based on customized multi-operand photonic devices, namely multi-operand optical neurons (MOON). We experimentally demonstrate the utility of a MOON using a multi-operand-Mach-Zehnder-interferometer (MOMZI) in image recognition tasks. Specifically, our MOMZI-based ONN achieves a measured accuracy of 85.89% in the street view house number (SVHN) recognition dataset with 4-bit voltage control precision. Furthermore, our performance analysis reveals that 128x128 MOMZI-based PTCs outperform their counterparts based on single-operand MZIs by one to two orders of magnitude in propagation loss, optical delay, and total device footprint, with comparable matrix expressivity.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 19 pages, 10 figures

arXiv:2305.04451 [pdf, other]

doi 10.1145/3588432.3591568

FashionTex: Controllable Virtual Try-on with Text and Texture

Authors: Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, Xiaoguang Han

Abstract: Virtual try-on attracts increasing research attention as a promising way for enhancing the user experience for online cloth shopping. Though existing methods can generate impressive results, users need to provide a well-designed reference image containing the target fashion clothes that often do not exist. To support user-friendly fashion customization in full-body portraits, we propose a multi-modal interactive setting by combining the advantages of both text and texture for multi-level fashion manipulation. With the carefully designed fashion editing module and loss functions, the FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data. We further introduce an ID recovery module to maintain the identity of the input portrait. Extensive experiments have demonstrated the effectiveness of our proposed pipeline.

Submitted 8 May, 2023; originally announced May 2023.

Comments: Accepted to SIGGRAPH 2023 (Conference Proceedings)

arXiv:2303.15786 [pdf, other]

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Authors: Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He

Abstract: Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin on various settings, e.g. +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP.

Submitted 26 July, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: CVPR 2023. Open sourced, Code and Model Available

Report number: 13

arXiv:2212.04655 [pdf, other]

MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction

Authors: Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

Abstract: The mainstream of the existing approaches for video prediction builds up their models based on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame in a recursive manner. This often leads to severe performance degradation when they try to extrapolate over a longer future period, thus limiting the practical use of the prediction model. Alternatively, a Multi-In-Multi-Out (MIMO) architecture that outputs all the future frames in one shot naturally breaks the recursive manner and therefore prevents error accumulation. However, only a few MIMO models for video prediction have been proposed, and they have achieved only inferior performance to date. The real strength of the MIMO model in this area is not well noticed and is largely under-explored. Motivated by that, we conduct a comprehensive investigation in this paper to thoroughly explore how far a simple MIMO architecture can go. Surprisingly, our empirical studies reveal that a simple MIMO model can outperform the state-of-the-art work by a much larger margin than expected, especially in dealing with long-term error accumulation. After exploring a number of ways and designs, we propose a new MIMO architecture based on extending the pure Transformer with local spatio-temporal blocks and a new multi-output decoder, namely MIMO-VP, to establish a new standard in video prediction. We evaluate our model in four highly competitive benchmarks (Moving MNIST, Human3.6M, Weather, KITTI). Extensive experiments show that our model wins 1st place on all the benchmarks with remarkable performance gains and surpasses the best SISO model in all aspects including efficiency, quantity, and quality. We believe our model can serve as a new baseline to facilitate the future research of video prediction tasks. The code will be released.

Submitted 30 May, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

ACM Class: I.4.9

Journal ref: AAAI 2023

arXiv:2202.02621 [pdf, other]

COVID-19 and Influenza Joint Forecasts Using Internet Search Information in the United States

Authors: Simin Ma, Shaoyang Ning, Shihao Yang

Abstract: As the COVID-19 pandemic progresses, severe flu seasons may happen alongside an increase in cases and deaths of COVID-19, causing severe burdens on health care resources and public safety. A consequence of a twindemic may be a mixture of two different infections in the same person at the same time, "flurona". Amidst the rising trend of "flurona", forecasting both influenza outbreaks and COVID-19 waves in a timely manner is more urgent than ever, as accurate joint real-time tracking of the twindemic aids health organizations and policymakers in adequate preparation and decision making. Under the current pandemic, state-of-the-art influenza and COVID-19 forecasting models carry valuable domain information but face shortcomings under current complex disease dynamics, such as similarities in symptoms and public healthcare seeking patterns of the two diseases. Inspired by the inner-connection between influenza and COVID-19 activities, we propose ARGOX-Joint-Ensemble which allows us to combine historical influenza and COVID-19 disease forecasting models into a new ensemble framework that handles scenarios where flu and COVID co-exist. Our framework is able to emphasize learning from COVID-related or influenza signals, through a winner-takes-all ensemble fashion. Moreover, our experiments demonstrate that our approach is successful in adapting past influenza forecasting models to the current pandemic, while improving upon previous COVID-19 forecasting models, by steadily outperforming alternative benchmark methods, and remaining competitive with publicly available models.

Submitted 5 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.12160

arXiv:2107.10068 [pdf, other]

From Single to Multiple: Leveraging Multi-level Prediction Spaces for Video Forecasting

Authors: Mengcheng Lan, Shuliang Ning, Yanran Li, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

Abstract: Although video forecasting has been a widely explored topic in recent years, the mainstream of the existing work still limits their models to a single prediction space and completely neglects ways to leverage multiple prediction spaces. This work fills this gap. For the first time, we deeply study numerous strategies to perform video forecasting in multi-prediction spaces and fuse their results together to boost performance. The prediction in the pixel space usually lacks the ability to preserve the semantic and structural content of the video; however, the prediction in the high-level feature space is prone to generate errors in the reduction and recovering process. Therefore, we build a recurrent connection between different feature spaces and incorporate their generations in the upsampling process. Rather surprisingly, this simple idea yields a much more significant performance boost than PhyDNet (performance improved by 32.1% MAE on MNIST-2 dataset, and 21.4% MAE on KTH dataset). Both qualitative and quantitative evaluations on four datasets demonstrate the generalization ability and effectiveness of our approach. We show that our model significantly reduces the troublesome distortions and blurry artifacts and brings remarkable improvements to the accuracy in long-term video prediction. The code will be released soon.

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2103.00783 [pdf, other]

PENet: Towards Precise and Efficient Image Guided Depth Completion

Authors: Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, Xiaojin Gong

Abstract: Image guided depth completion is the task of generating a dense depth map from a sparse depth map and a high quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch to exploit and fuse two modalities thoroughly. More specifically, one branch inputs a color image and a sparse depth map to predict a dense depth map. The other branch takes as inputs the sparse depth map and the previously predicted depth map, and outputs a dense depth map as well. The depth maps predicted from two branches are complementary to each other and therefore they are adaptively fused. In addition, we also propose a simple geometric convolutional layer to encode 3D geometric cues. The geometric encoded backbone conducts the fusion of different modalities at multiple stages, leading to good depth completion results. We further implement a dilated and accelerated CSPN++ to refine the fused depth map efficiently. The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission. It also infers much faster than most of the top ranked methods. The code of this work is available at https://github.com/JUGGHM/PENet_ICRA2021.

Submitted 18 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted by ICRA 2021

arXiv:2102.03888 [pdf, other]

OPT-GAN: A Broad-Spectrum Global Optimizer for Black-box Problems by Learning Distribution

Authors: Minfang Lu, Shuai Ning, Shuangrong Liu, Fengyang Sun, Bo Zhang, Bo Yang, Lin Wang

Abstract: Black-box optimization (BBO) algorithms are concerned with finding the best solutions for problems with missing analytical details. Most classical methods for such problems are based on strong and fixed a priori assumptions, such as Gaussianity. However, complex real-world problems, especially when the global optimum is desired, could be very far from the a priori assumptions because of their diversity, causing unexpected obstacles. In this study, we propose a generative adversarial net-based broad-spectrum global optimizer (OPT-GAN) which estimates the distribution of the optimum gradually, with strategies to balance the exploration-exploitation trade-off. It has the potential to better adapt to the regularity and structure of diversified landscapes than other methods with a fixed prior, e.g., Gaussian assumption or separability. Experiments on diverse BBO benchmarks and high dimensional real world applications show that OPT-GAN outperforms other traditional and neural net-based BBO algorithms.

Submitted 31 January, 2023; v1 submitted 7 February, 2021; originally announced February 2021.

arXiv:2009.00096 [pdf]

DeepSTCL: A Deep Spatio-temporal ConvLSTM for Travel Demand Prediction

Authors: Dongjie Wang, Yan Yang, Shangming Ning

Abstract: Urban resource scheduling is an important part of the development of a smart city, and transportation resources are the main components of urban resources. Currently, a series of problems with transportation resources such as unbalanced distribution and road congestion disrupt the scheduling discipline. Therefore, it is significant to predict travel demand for urban resource dispatching. Previously, traditional time series models such as AR and ARIMA were used to forecast travel demand. However, the prediction efficiency of these methods is poor and the training time is too long. In order to improve the performance, deep learning is used to assist prediction. But most of the deep learning methods only utilize temporal dependence or spatial dependence of data in the forecasting process. To address these limitations, a novel deep learning traffic demand forecasting framework based on Deep Spatio-Temporal ConvLSTM is proposed in this paper. In order to evaluate the performance of the framework, an end-to-end deep learning system is designed and a real dataset is used. Furthermore, the proposed method can capture temporal dependence and spatial dependence simultaneously. The closeness, period and trend components of spatio-temporal data are used in three predicted branches. These branches have the same network structures, but do not share weights. Then a linear fusion method is used to get the final result. Finally, the experimental results on the DIDI order dataset of Chengdu demonstrate that our method outperforms traditional models in accuracy and speed.

Submitted 22 August, 2020; originally announced September 2020.

arXiv:2001.02107 [pdf]

doi 10.1093/database/bay071

Leveraging Prior Knowledge for Protein-Protein Interaction Extraction with Memory Network

Authors: Huiwei Zhou, Zhuang Liu, Shixian Ning, Yunlong Yang, Chengkun Lang, Yingyu Lin, Kun Ma

Abstract: Automatically extracting Protein-Protein Interactions (PPI) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to a new state-of-the-art performance on the BioCreative VI PPI dataset. The paper also shows that multiple computational layers over an external memory are superior to long short-term memory networks with the local memories.

Submitted 7 January, 2020; originally announced January 2020.

Comments: Published on Database-The Journal of Biological Databases and Curation, 11 pages, 5 figures

Journal ref: Database-The Journal of Biological Databases and Curation, 2018, 2018: bay071

arXiv:2001.02091 [pdf]

doi 10.1016/j.jbi.2019.103234

Knowledge-aware Attention Network for Protein-Protein Interaction Extraction

Authors: Huiwei Zhou, Zhuang Liu, Shixian Ning, Chengkun Lang, Yingyu Lin, Lei Du

Abstract: Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. However, many of the current PPI extraction methods need extensive feature engineering and cannot make full use of the prior knowledge in knowledge bases (KBs). KBs contain huge amounts of structured information about entities and relationships, and therefore play a pivotal role in PPI extraction. This paper proposes a knowledge-aware attention network (KAN) to fuse prior knowledge about protein-protein pairs and context information for PPI extraction. The proposed model first adopts a diagonal-disabled multi-head attention mechanism to encode the context sequence along with knowledge representations learned from KBs. Then a novel multi-dimensional attention mechanism is used to select the features that can best describe the encoded context. Experimental results on the BioCreative VI PPI dataset show that the proposed approach could acquire knowledge-aware dependencies between different words in a sequence and lead to a new state-of-the-art performance.

Submitted 7 January, 2020; originally announced January 2020.

Comments: Published on Journal of Biomedical Informatics, 14 pages, 5 figures

Journal ref: Journal of Biomedical Informatics, 2019, 96: 103234

arXiv:2001.00295 [pdf]

doi 10.1016/j.jbi.2018.07.007

Chemical-induced Disease Relation Extraction with Dependency Information and Prior Knowledge

Authors: Huiwei Zhou, Shixian Ning, Yunlong Yang, Zhuang Liu, Chengkun Lang, Yingyu Lin

Abstract: Chemical-disease relation (CDR) extraction is significantly important to various areas of biomedical research and health care. Nowadays, many large-scale biomedical knowledge bases (KBs) containing triples about entity pairs and their relations have been built. KBs are important resources for biomedical relation extraction. However, previous research pays little attention to prior knowledge. In addition, the dependency tree contains important syntactic and semantic information, which helps to improve relation extraction, so how to use it effectively is also worth studying. In this paper, we propose a novel convolutional attention network (CAN) for CDR extraction. Firstly, we extract the shortest dependency path (SDP) between chemical and disease pairs in a sentence, which includes a sequence of words, dependency directions, and dependency relation tags. Then the convolution operations are performed on the SDP to produce deep semantic dependency features. After that, an attention mechanism is employed to learn the importance/weight of each semantic dependency vector related to knowledge representations learned from KBs. Finally, in order to combine dependency information and prior knowledge, the concatenation of weighted semantic dependency representations and knowledge representations is fed to the softmax layer for classification. Experiments on the BioCreative V CDR dataset show that our method achieves comparable performance with the state-of-the-art systems, and both dependency information and prior knowledge play important roles in the CDR extraction task.

Submitted 1 January, 2020; originally announced January 2020.

Comments: Published on Journal of Biomedical Informatics, 13 pages

Journal ref: Journal of Biomedical Informatics, 2018, 84:171-178

arXiv:1912.10604 [pdf]

doi 10.1109/TCBB.2018.2838661

Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction

Authors: Huiwei Zhou, Yunlong Yang, Shixian Ning, Zhuang Liu, Chengkun Lang, Yingyu Lin, Degen Huang

Abstract: Automatically extracting the relationships between chemicals and diseases is significantly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research pays less attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism could significantly improve the CDR extraction performance while achieving comparable results with state-of-the-art systems.

Submitted 22 December, 2019; originally announced December 2019.

Comments: Published on IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11 pages, 5 figures

Journal ref: IEEE/ACM TCBB, 2018, 16(6):1879-1889

arXiv:1912.10590 [pdf]

doi 10.1186/s12859-019-2873-7

Knowledge-guided Convolutional Networks for Chemical-Disease Relation Extraction

Authors: Huiwei Zhou, Chengkun Lang, Zhuang Liu, Shixian Ning, Yingyu Lin, Lei Du

Abstract: Background: Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly-structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction. How to make full use of it is worth studying. Results: This paper proposes a novel model called "Knowledge-guided Convolutional Networks (KCN)" to leverage prior knowledge for CDR extraction. The proposed model first learns knowledge representations including entity embeddings and relation embeddings from KBs. Then, entity embeddings are used to control the propagation of context features towards a chemical-disease pair with gated convolutions. After that, relation embeddings are employed to further capture the weighted context features by a shared attention pooling. Finally, the weighted context features containing additional knowledge information are used for CDR extraction. Experiments on the BioCreative V CDR dataset show that the proposed KCN achieves 71.28% F1-score, which outperforms most of the state-of-the-art systems. Conclusions: This paper proposes a novel CDR extraction model KCN to make full use of prior knowledge. Experimental results demonstrate that KCN could effectively integrate prior knowledge and contexts for the performance improvement.

Submitted 22 December, 2019; originally announced December 2019.

Comments: Published on BMC Bioinformatics, 16 pages, 5 figures

Journal ref: BMC Bioinformatics, 2019, 20(1):260

arXiv:1912.05147 [pdf]

doi 10.1016/j.compbiolchem.2019.107146

Improving Neural Protein-Protein Interaction Extraction with Knowledge Selection

Authors: Huiwei Zhou, Xuefei Li, Weihong Yao, Zhuang Liu, Shixian Ning, Chengkun Lang, Lei Du

Abstract: Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. Meanwhile, knowledge bases (KBs) contain huge amounts of structured information of protein entities and their relations, which can be encoded in entity and relation embeddings to help PPI extraction. However, the prior knowledge of protein-protein pairs must be selectively used so that it is suitable for different contexts. This paper proposes a Knowledge Selection Model (KSM) to fuse the selected prior knowledge and context information for PPI extraction. Firstly, two Transformers encode the context sequence of a protein pair according to each protein embedding, respectively. Then, the two outputs are fed to a mutual attention to capture the important context features towards the protein pair. Next, the context features are used to distill the relation embedding by a knowledge selector. Finally, the selected relation embedding and the context features are concatenated for PPI extraction. Experiments on the BioCreative VI PPI dataset show that KSM achieves a new state-of-the-art performance (38.08% F1-score) by adding knowledge selection.

Submitted 11 December, 2019; originally announced December 2019.

Comments: Published in Computational Biology and Chemistry; 14 pages, 2 figures

Journal ref: Computational Biology and Chemistry, 2019, 83: 107146

arXiv:1810.02717 [pdf, other]

Clust-LDA: Joint Model for Text Mining and Author Group Inference

Authors: Shaoyang Ning, Xi Qu, Victor Cai, Nathan Sanders

Abstract: Social media corpora pose unique challenges and opportunities, including typically short document lengths and rich meta-data such as author characteristics and relationships. This creates great potential for systematic analysis of the enormous body of the users and thus provides implications for industrial strategies such as targeted marketing. Here we propose a novel and statistically principled method, clust-LDA, which incorporates authorship structure into the topical modeling, thus accomplishing the task of the topical inferences across documents on the basis of authorship and, simultaneously, the identification of groupings between authors. We develop an inference procedure for clust-LDA and demonstrate its performance on simulated data, showing that clust-LDA out-performs the "vanilla" LDA on the topic identification task where authors exhibit distinctive topical preference. We also showcase the empirical performance of clust-LDA based on a real-world social media dataset from Reddit.

Submitted 5 October, 2018; originally announced October 2018.

Showing 1–21 of 21 results for author: Ning, S