The nano-self-assembly of natural organic matter (NOM) profoundly influences the occurrence and fate of NOM and pollutants in large-scale complex environments. Machine learning (ML) offers a promising and robust tool for interpreting and predicting the processes, structures and environmental effects of NOM self-assembly. This review provides a tutorial-style overview of data source selection, algorithm selection, model construction, interpretability analysis, applications and challenges for big-data-based ML aimed at elucidating NOM self-assembly mechanisms in the environment. Results from advanced nano- to submicron-scale spatially resolved chemical analysis techniques are suggested as input data, because they combine information on molecular interactions with structural visualization. Existing ML algorithms need to handle multi-scale and multi-modal data, which necessitates the development of new algorithmic frameworks. Interpretable supervised models are crucial owing to their strong capacity to quantify structure-property-effect relationships and to bridge the gap between purely data-driven ML and the complexity of NOM assembly in practice. The necessity and challenges of adopting ML to understand the geochemical behavior and bioavailability of pollutants, as well as the elemental cycling processes in the environment, that result from NOM self-assembly patterns are then discussed. Finally, a research framework integrating ML, experiments and theoretical simulation is proposed for comprehensively and efficiently understanding environmental issues that involve NOM self-assembly.