In recent years, person re-identification (re-ID) has achieved strong performance, driven by the resurgence of deep neural networks. However, because of domain bias, that is, the difference in data distribution between two domains, it remains challenging to directly deploy a model trained on a labeled source domain to a target domain where only unlabeled data are available. In this paper, we propose a Self-Training with Progressive Representation Enhancement (PREST) framework, which comprises a multi-scale self-training method and a view-invariant representation learning module, to improve re-ID performance on the target domain in an unsupervised manner. More specifically, multi-scale representations, covering both the global body and local parts of pedestrian images, are used to obtain pseudo-labels. A subset of images is then selected according to these pseudo-labels to form a new dataset that supervises fine-tuning; this process is repeated iteratively to progressively improve performance. Furthermore, to mitigate the influence of style differences among sub-domains, where each sub-domain consists of the images captured by a single camera, a classifier with a gradient reversal layer is first employed to learn view-invariant representations of pedestrian images that share an identity but are taken by different cameras; this further enhances the reliability of the predicted labels and improves cross-domain re-ID performance. Extensive experiments on three large-scale re-ID datasets demonstrate that our framework achieves significantly better performance than existing approaches.
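To make the pseudo-label selection step more concrete, the following is a minimal sketch and not the PREST implementation. The abstract does not state how pseudo-labels are assigned or which images are kept, so everything here is an assumption for illustration: the function names, the nearest-centroid labeling, the rule that an image is kept only when its global and local-part representations agree on a label, and the `max_distance` threshold.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(feature, centroids):
    """Return (label, distance) for the centroid closest to `feature`."""
    distances = [l2_distance(feature, c) for c in centroids]
    label = min(range(len(centroids)), key=distances.__getitem__)
    return label, distances[label]


def select_pseudo_labeled(samples, centroids_by_scale, max_distance):
    """Pick images whose pseudo-labels look reliable (hypothetical rule).

    samples: list of dicts mapping a scale name (e.g. "global", "upper")
        to that image's feature vector at that scale.
    centroids_by_scale: dict mapping each scale name to a list of cluster
        centroids; centroid index i is treated as pseudo-identity i.
    An image is kept only if every scale votes for the same pseudo-label
    and every scale lies within `max_distance` of its chosen centroid.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for index, sample in enumerate(samples):
        votes = []
        confident = True
        for scale, feature in sample.items():
            label, distance = nearest_centroid(feature, centroids_by_scale[scale])
            votes.append(label)
            if distance > max_distance:
                confident = False
        if confident and len(set(votes)) == 1:
            selected.append((index, votes[0]))
    return selected


# Toy data: two pseudo-identities, two scales, 2-D features.
centroids_by_scale = {
    "global": [[0.0, 0.0], [1.0, 1.0]],
    "upper": [[0.0, 1.0], [1.0, 0.0]],
}
samples = [
    {"global": [0.1, 0.0], "upper": [0.0, 0.9]},  # scales agree, close
    {"global": [0.9, 1.0], "upper": [0.1, 1.0]},  # scales disagree
    {"global": [1.0, 1.1], "upper": [1.0, 0.1]},  # scales agree, close
    {"global": [0.5, 0.4], "upper": [0.4, 0.9]},  # scales agree, too far
]

selected = select_pseudo_labeled(samples, centroids_by_scale, max_distance=0.3)
print(selected)
```

In an iterative scheme of the kind the abstract describes, the selected pairs would serve as the supervised set for one round of fine-tuning, after which features and pseudo-labels would be recomputed and the selection repeated.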