Using deep neural networks (DNNs) to explore the spatial patterns and temporal dynamics of human brain activity is an important yet challenging problem, because effective network architectures are difficult to design manually. Several promising deep learning methods, e.g., the deep belief network (DBN), the convolutional neural network (CNN), and the deep sparse recurrent auto-encoder (DSRAE), can decompose neuroscientifically meaningful spatiotemporal patterns from 4D functional magnetic resonance imaging (fMRI) data. However, these previous studies still depend on hand-crafted network architectures and hyperparameters, which are not necessarily optimal. In this paper, we propose eNAS-DSRAE (evolutionary Neural Architecture Search on Deep Sparse Recurrent Auto-Encoder), a framework that employs an evolutionary algorithm (EA) to optimize the neural architecture of DSRAE by minimizing the expected loss of initialized models. We design and perform validation experiments on the publicly available Human Connectome Project (HCP) 900 datasets, and the results achieved by the optimized eNAS-DSRAE suggest that our framework can successfully identify spatiotemporal features and performs better than hand-crafted neural network models. To the best of our knowledge, the proposed eNAS-DSRAE is not only among the earliest NAS models that can extract connectome-scale, meaningful spatiotemporal brain networks from 4D fMRI data, but also an effective framework for optimizing RNN-based models.
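
To make the idea of evolutionary architecture search concrete, the following is a minimal sketch of a generic select-and-mutate loop over a small architecture space. It is not the eNAS-DSRAE implementation: the search space (`num_layers`, `hidden_units`, `l1_penalty`), the function names, and in particular `surrogate_loss` are hypothetical placeholders. In a real setting, `surrogate_loss` would be replaced by the loss of a DSRAE model built from the candidate architecture and evaluated on fMRI data.

```python
import math
import random

# Hypothetical search space; the actual eNAS-DSRAE space is not specified here.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "hidden_units": [16, 32, 64, 128, 256],
    "l1_penalty": [1e-5, 1e-4, 1e-3, 1e-2],
}


def random_architecture(rng):
    """Sample one candidate architecture uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(arch, rng):
    """Copy a parent and resample one randomly chosen hyperparameter."""
    child = dict(arch)
    name = rng.choice(sorted(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def surrogate_loss(arch):
    """Toy stand-in for the loss of a model built from `arch`.

    Assumption for illustration only: the loss is smallest near
    2 layers, 64 hidden units, and an L1 penalty of 1e-4.
    """
    return (
        abs(arch["num_layers"] - 2)
        + abs(math.log2(arch["hidden_units"]) - 6)
        + abs(math.log10(arch["l1_penalty"]) + 4)
    )


def evolve(loss_fn, population_size=8, generations=20, seed=0):
    """Keep the better half of the population and refill it with mutants."""
    rng = random.Random(seed)
    population = [random_architecture(rng) for _ in range(population_size)]
    history = []  # best loss at the start of each generation
    for _ in range(generations):
        population.sort(key=loss_fn)
        history.append(loss_fn(population[0]))
        parents = population[: population_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    best = min(population, key=loss_fn)
    return best, history


if __name__ == "__main__":
    best, history = evolve(surrogate_loss)
    print("best architecture:", best)
    print("best loss per generation:", history)
```

Because the better half of each generation is carried over unchanged, the best loss recorded in `history` never increases from one generation to the next. Recombination (crossover) between parents is omitted to keep the sketch short.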