Numerous automatic sleep staging approaches have been proposed to provide an eHealth alternative to the current gold standard, hypnogram scoring by human experts. However, the majority of such studies use data of limited scale, which compromises the validation, reproducibility, and transferability of automatic sleep staging systems in real clinical settings. In addition, the computational cost and physical meaningfulness of the analysis are typically neglected, even though affordable computation is a key criterion in Big Data analytics. To this end, we establish a comprehensive analysis framework to rigorously evaluate the feasibility of automatic sleep staging from multiple perspectives, including robustness with respect to the number of training subjects, model complexity, and the choice of classifier. The evaluation is performed on a large collection of publicly accessible polysomnography (PSG) data recorded from 515 subjects. The trade-off between affordable computation and satisfactory accuracy is shown to be met by an extreme learning machine (ELM) classifier. In conjunction with a physically meaningful hidden Markov model (HMM) of the transitions between sleep stages, used as a smoothing model, the ELM achieves both fast computation and the highest average Cohen's kappa value of κ = 0.73 (substantial agreement). Finally, it is shown that for accurate and robust automatic sleep staging, a combination of structural complexity (multi-scale entropy) and frequency-domain (spectral edge frequency) features is both computationally affordable and physically meaningful.
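
As an illustration of the agreement metric reported above, the following is a minimal sketch of how Cohen's kappa can be computed between an expert hypnogram and an automatically scored one. The function name `cohens_kappa` and the short label sequences are hypothetical and chosen only for this example; they are not taken from the study's data or code.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equally long sequences of stage labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reference)

    # Observed agreement: fraction of epochs given the same label by both scorers.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: product of the two scorers' marginal label frequencies,
    # summed over all labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)

    if expected == 1.0:
        # Degenerate case: both scorers used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-epoch labels (illustrative only, not study data).
expert = ["W", "W", "N1", "N2", "N2", "N3", "REM", "REM"]
automatic = ["W", "W", "N2", "N2", "N2", "N3", "REM", "N2"]

kappa = cohens_kappa(expert, automatic)
print(f"kappa = {kappa:.3f}")
```

Under the commonly used Landis and Koch interpretation, values between 0.61 and 0.80 are labelled substantial agreement, which is the band that contains the reported κ = 0.73.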