This article presents an intelligent fault diagnosis method for wind turbine (WT) gearboxes based on wavelet packet decomposition (WPD) and deep learning. Specifically, vibration signals from the gearbox are decomposed using WPD, and the resulting signal components are fed into a hierarchical convolutional neural network (CNN) that adaptively extracts multiscale features and classifies faults. The method combines the multiscale characteristic of WPD with the strong classification capacity of CNNs, and it does not require the complex manual feature extraction steps commonly adopted in existing methods. The proposed CNN with multiple characteristic scales based on WPD (WPD-MSCNN) has three advantages: 1) the added WPD layer appropriately processes the nonstationary vibration data to obtain components at multiple characteristic scales adaptively, which takes full advantage of WPD and thus enables the CNN to extract multiscale features; 2) the WPD layer sends the multiscale components directly to the hierarchical CNN to extract rich fault information, which avoids the loss of useful information caused by hand-crafted feature extraction; and 3) even if the scale changes, the lengths of the components remain the same, which indicates that the proposed method is robust to scale uncertainties in the vibration signals. Experiments with vibration data from a production wind farm, provided by a company using a condition monitoring system (CMS), show that the presented WPD-MSCNN method outperforms the traditional CNN and the multiscale CNN (MSCNN) in fault diagnosis.
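To illustrate the equal-length property mentioned in advantage 3), the following is a minimal sketch of a full wavelet packet decomposition in plain Python. It is not the authors' implementation: the Haar basis, the three-level depth, the synthetic two-tone signal, and the helper names `haar_step` and `wpd` are all assumptions chosen for illustration, since the abstract does not state the wavelet basis, the decomposition level, or the CNN input format.

```python
import math

def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half the length of x."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wpd(signal, level):
    """Full wavelet packet tree: split every node, not only the approximation, at each level."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            a, d = haar_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes  # 2**level components in natural order (not sorted by frequency)

# Synthetic stand-in for a vibration record (hypothetical, not the paper's CMS data).
n = 1024
signal = [
    math.sin(2 * math.pi * 8 * t / n) + 0.5 * math.sin(2 * math.pi * 200 * t / n)
    for t in range(n)
]

components = wpd(signal, level=3)
print(len(components), {len(c) for c in components})  # 8 {128}
```

Every terminal node of the packet tree has the same length (here 1024 / 2^3 = 128), so the components can be stacked as parallel, equally sized inputs to a network. How the components are actually arranged for the hierarchical CNN is not specified in the abstract and is left open here.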