MfeCNN: Mixture Feature Embedding Convolutional Neural Network for Data Mapping

IEEE Trans Nanobioscience. 2018 Jul;17(3):165-171. doi: 10.1109/TNB.2018.2841053. Epub 2018 May 28.

Abstract

Data mapping plays an important role in data integration and in data exchange among institutions and organizations that use different data standards. However, traditional rule-based approaches and machine learning methods fail to achieve satisfactory results on the data mapping problem. In this paper, we propose a novel deep learning framework for data mapping called the mixture feature embedding convolutional neural network (MfeCNN). MfeCNN casts the data mapping task as a multiclass classification problem. The model incorporates multimodal learning and multiview embedding into a CNN for mixture feature tensor generation and classification prediction. Multimodal features were extracted from various linguistic spaces with a medical natural language processing package, and feature embeddings were then learned by the CNN. A softmax prediction layer based on multiview embedding can classify up to 10 classes simultaneously. On unbalanced data, MfeCNN achieved the best results (average F1 score, 82.4%) compared with traditional state-of-the-art machine learning models and with a CNN without mixture feature embedding. Our model also outperformed a very deep CNN with 29 layers that took free text as input. The combination of mixture feature embedding and a deep neural network can achieve high accuracy for data mapping and multiclass classification.
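The abstract describes the pipeline only at a high level: several feature views are combined into a mixture feature tensor, a CNN learns embeddings from it, and a softmax layer scores up to 10 classes. The pure-Python sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The view layout (each view as an equally sized seq_len x dim matrix of embeddings), the single convolution followed by max-pooling, and the names `build_mixture_tensor`, `conv_maxpool`, and `MixtureCNNSketch` are all hypothetical, and the weights are random and untrained.

```python
import math
import random
from typing import List

# Hypothetical layout: one "view" is a single feature space rendered as a seq_len x dim matrix.
Matrix = List[List[float]]


def build_mixture_tensor(views: List[Matrix]) -> List[Matrix]:
    """Stack equally shaped per-view matrices into an n_views x seq_len x dim tensor."""
    seq_len, dim = len(views[0]), len(views[0][0])
    for v in views:
        if len(v) != seq_len or any(len(row) != dim for row in v):
            raise ValueError("all views must have shape seq_len x dim")
    return [[list(row) for row in v] for v in views]


def conv_maxpool(tensor: List[Matrix], filters, width: int) -> List[float]:
    """Slide each filter (n_views x width x dim weights) along the sequence,
    apply ReLU, and max-pool over positions: one pooled value per filter."""
    n_views, seq_len, dim = len(tensor), len(tensor[0]), len(tensor[0][0])
    if seq_len < width:
        raise ValueError("sequence shorter than filter width")
    pooled = []
    for f in filters:
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting max
        for start in range(seq_len - width + 1):
            s = sum(f[c][k][d] * tensor[c][start + k][d]
                    for c in range(n_views)
                    for k in range(width)
                    for d in range(dim))
            best = max(best, s)  # equivalent to max over ReLU(s)
        pooled.append(best)
    return pooled


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MixtureCNNSketch:
    """Hypothetical, untrained model: one conv layer over the stacked views,
    then a dense layer and softmax over n_classes (10 by default)."""

    def __init__(self, n_views, dim, n_filters=8, width=2, n_classes=10, seed=0):
        rng = random.Random(seed)
        self.width = width
        # filters[i][c][k][d]: filter i, view c, offset k, embedding dimension d
        self.filters = [[[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                          for _ in range(width)]
                         for _ in range(n_views)]
                        for _ in range(n_filters)]
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_filters)]
                  for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def predict_proba(self, views: List[Matrix]) -> List[float]:
        h = conv_maxpool(build_mixture_tensor(views), self.filters, self.width)
        logits = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)


# Usage: three hypothetical views of a 5-token field, each embedded in 4 dimensions.
rng = random.Random(1)
views = [[[rng.random() for _ in range(4)] for _ in range(5)] for _ in range(3)]
model = MixtureCNNSketch(n_views=3, dim=4)
probs = model.predict_proba(views)
print(len(probs), round(sum(probs), 6))  # 10 1.0
```

Stacking the views as separate channels lets each filter weigh evidence from every feature space at once. A practical model would also need training, for example cross-entropy loss minimized by gradient descent, which this sketch omits.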
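The abstract reports an average F1 score on unbalanced data but does not state which average is used. Assuming macro-averaging (the unweighted mean of per-class F1 scores), the metric can be sketched as follows; `per_class_f1` and `macro_f1` are hypothetical helper names, and the toy labels are illustrative only.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 for each label seen in either sequence, using F1 = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was different
            fn[t] += 1  # missed the true label t
    labels = list(dict.fromkeys(list(y_true) + list(y_pred)))  # first-seen order
    scores = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Usage: an unbalanced toy set where always predicting the majority class
# looks good on accuracy but poor on macro-averaged F1.
y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy} macro_f1={round(macro_f1(y_true, y_pred), 4)}")
# accuracy=0.8 macro_f1=0.4444
```

The toy example shows why a class-averaged F1 is more informative than accuracy when classes are unbalanced: predicting only the majority class still reaches 80% accuracy. A frequency-weighted average of per-class F1 is another common choice and would generally give a different number.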

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Data Mining
  • Deep Learning*
  • Humans
  • Natural Language Processing
  • Neural Networks, Computer*
  • Workflow