Background and objective: Convolutional neural networks (CNNs) offer performance comparable to that of human experts while being faster and more consistent in their predictions. However, most proposed CNNs require expensive state-of-the-art hardware, which substantially limits their use in practical scenarios and commercial systems, especially in clinical, biomedical and other applications that require on-the-fly analysis. In this paper, we investigate the possibility of making CNNs lighter by parametrizing the architecture and decreasing the number of trainable weights of a popular CNN: U-Net.
Methods: To demonstrate that comparable results can be achieved with substantially fewer trainable weights than the original U-Net, we used a challenging application: pixel-wise virus classification in Transmission Electron Microscopy images with minimal annotations (i.e. consisting only of the virus particle centers or centerlines). We explored four U-Net hyper-parameters: the number of base feature maps, the feature maps multiplier, the number of encoding-decoding levels and the number of feature maps in the last two convolutional layers.
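To illustrate how these four hyper-parameters drive the number of trainable weights, the following is a minimal sketch of a weight counter for a parametrized U-Net. It is not the authors' implementation. It assumes 3x3 convolutions with biases, 2x2 transposed up-convolutions, a single-channel input, two output classes and no normalization layers; the function names and arguments (`conv_weights`, `unet_weight_count`, `base_maps`, `multiplier`, `levels`, `last_maps`) are hypothetical.

```python
def conv_weights(c_in, c_out, k):
    """Trainable weights of a k x k convolution with bias (assumed layer type)."""
    return k * k * c_in * c_out + c_out


def unet_weight_count(base_maps=64, multiplier=2.0, levels=4,
                      last_maps=None, in_channels=1, n_classes=2):
    """Count trainable weights of a U-Net-like network (illustrative sketch).

    base_maps   -- feature maps in the first encoding level
    multiplier  -- factor applied to the feature maps at each deeper level
    levels      -- number of encoding-decoding levels (excluding the bottleneck)
    last_maps   -- feature maps in the last two 3x3 convolutions
                   (None keeps the value of the first level)
    """
    # Feature maps per level; the final entry is the bottleneck.
    maps = [int(round(base_maps * multiplier ** i)) for i in range(levels + 1)]

    total = 0
    c_in = in_channels
    # Encoder blocks and bottleneck: two 3x3 convolutions each.
    for m in maps:
        total += conv_weights(c_in, m, 3) + conv_weights(m, m, 3)
        c_in = m

    # Decoder blocks: 2x2 up-convolution, skip concatenation, two 3x3 convolutions.
    for i in range(levels - 1, -1, -1):
        m = maps[i]
        total += conv_weights(c_in, m, 2)
        out = last_maps if (i == 0 and last_maps is not None) else m
        total += conv_weights(2 * m, out, 3) + conv_weights(out, out, 3)
        c_in = out

    # Final 1x1 convolution producing the per-pixel class scores.
    total += conv_weights(c_in, n_classes, 1)
    return total


if __name__ == "__main__":
    print(f"default configuration: {unet_weight_count() / 1e6:.1f} M weights")
    print(f"base_maps=16:          {unet_weight_count(base_maps=16) / 1e6:.1f} M weights")
```

Under these assumptions the default configuration (64 base feature maps, multiplier 2, 4 levels) gives about 31.0 M weights, consistent with the figure quoted for the original U-Net. Other settings shown here are arbitrary and do not correspond to the configurations evaluated in the paper.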
Results: Our experiments lead to two main conclusions: 1) the architecture hyper-parameters are pivotal if fewer trainable weights are to be used, and 2) if there is no restriction on the number of trainable weights, using a deeper network generally gives better results. However, larger networks take longer to train, typically require more data and are more prone to overfitting. Our best model achieved an accuracy of 82.2%, which is similar to that of the original U-Net, while using nearly 4 times fewer trainable weights (7.8 M compared to 31.0 M). We also present a network with < 2 M trainable weights that achieved an accuracy of 76.4%.
Conclusions: The proposed U-Net hyper-parameter exploration can be adapted to other CNNs and other applications. It enables comprehensive CNN architecture design aimed at more efficient use of trainable weights. Making networks faster and lighter is crucial for their implementation in many practical applications. In addition, a lighter network ought to be less prone to overfitting and hence generalize better.
Keywords: Deep learning; Hardware integration; Hyper-parameter optimization; Transmission Electron Microscopy.
Copyright © 2019. Published by Elsevier B.V.