Freezing of gait (FoG) is a sudden, brief cessation of movement despite the intention to continue walking. It is one of the most disabling symptoms of Parkinson's disease (PD) and often leads to falls and injuries. Many computer-aided FoG detection methods have been proposed, but most rely on data from a single modality, such as motion sensors, pressure sensors, or video cameras. Few multimodal methods exist that exploit the complementary information collected from different modalities during clinical assessments to improve FoG detection performance. In this study, we therefore propose a novel end-to-end deep architecture, the graph fusion neural network (GFN), for multimodal FoG detection that combines footstep pressure maps and video recordings. GFN constructs multimodal graphs by treating the encoded features of each modality as vertex-level inputs and measures their adjacency patterns to build complementary FoG representations, thereby reducing representation redundancy among modalities. In addition, because GFN is designed to process multimodal graphs of arbitrary structure, it is expected to outperform alternative unimodal methods when some input modalities are missing. We collected a multimodal FoG dataset comprising clinical assessment videos and footstep pressure sequences from 340 trials of 20 PD patients. The proposed GFN achieves an area under the curve (AUC) of 0.882, demonstrating the promise of multimodal FoG detection. To the best of our knowledge, this is one of the first studies to use multimodal learning for automated FoG detection, which offers significant opportunities for improving patient assessment and clinical trials.
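
To make the graph construction described above concrete, the following is a minimal illustrative sketch, not the GFN implementation itself. It assumes that each modality has already been encoded into a fixed-length feature vector, that adjacency is measured with softmax-normalized cosine similarity, and that fusion is a single graph-convolution-style step followed by mean pooling. The function names (`build_adjacency`, `fuse`), the feature dimensions, and the random weights are hypothetical. Because the graph is built only from the modalities that are present, the same code accepts inputs with a missing modality.

```python
import numpy as np


def build_adjacency(vertices):
    """Row-normalized adjacency from cosine similarity (an assumed measure)."""
    norms = np.linalg.norm(vertices, axis=1, keepdims=True)
    unit = vertices / np.maximum(norms, 1e-8)
    sim = unit @ unit.T
    # Softmax over each row so that every row sums to 1.
    exp = np.exp(sim - sim.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def fuse(modality_features, weight):
    """Fuse the available modality features into one representation.

    modality_features maps a modality name to a 1-D feature vector,
    or to None when that modality is missing for the trial.
    """
    present = [f for f in modality_features.values() if f is not None]
    if not present:
        raise ValueError("at least one modality is required")
    vertices = np.stack(present)            # (n_vertices, d)
    adjacency = build_adjacency(vertices)   # (n_vertices, n_vertices)
    # One graph-convolution-style step with a ReLU non-linearity.
    hidden = np.maximum(adjacency @ vertices @ weight, 0.0)
    return hidden.mean(axis=0)              # pooled graph-level feature


rng = np.random.default_rng(0)
d, h = 8, 4                                 # hypothetical feature sizes
weight = rng.standard_normal((d, h))        # stands in for learned weights

both = fuse(
    {"pressure": rng.standard_normal(d), "video": rng.standard_normal(d)},
    weight,
)
video_only = fuse({"pressure": None, "video": rng.standard_normal(d)}, weight)
```

In this sketch a trial with only video yields a one-vertex graph whose adjacency is the 1×1 identity, so the fused output has the same shape as in the two-modality case and a downstream classifier would need no change.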