Three-dimensional (3D) molecular generation models employ deep neural networks to generate topological representations and molecular conformations simultaneously. Because they can exploit structural and interaction information on targets and rely less on existing bioactivity data, these models have attracted widespread attention. However, limited training and testing data sets and the unexpected biases inherent in single evaluation metrics make it difficult to compare these models in practical settings. In this work, we propose Durian, an evaluation framework for structure-based 3D molecular generation that incorporates protein-ligand data with experimental affinity and a comprehensive array of physicochemical and geometric metrics. The benchmark tasks assess the capability of models to reproduce the property distributions of training sets, generate molecules with rational distributions of drug-related properties, and produce molecules with potentially high affinity toward given targets. Binding affinities were evaluated using three independent docking methods (QuickVina2, Surflex, and Gnina) in both "Dock" and "Score" modes to reduce false positives arising from conformational searches or scoring functions. We applied Durian to six 3D molecular generation methods: LiGAN, Pocket2Mol, DiffSBDD, SBDD, GraphBP, and SurfGen. While most methods generated drug-like small molecules with reasonable physicochemical properties, they showed varying degrees of limitation in balancing novelty, structural rationality, and synthetic accessibility, which constrains their practical application in drug discovery. Based on a total of 17 metrics, Durian highlights the importance of multiobjective optimization in 3D molecular generation methods. For instance, SurfGen and SBDD showed relatively comprehensive performance but could benefit from further improvements in molecular conformational rationality. We expect our evaluation framework to provide meaningful guidance for the selection, optimization, and application of 3D generative models in practical drug design tasks.
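
As an illustration of how agreement across several docking methods and both modes can reduce false positives, the following minimal sketch accepts a generated molecule only if it scores at least as well as a reference ligand under every method and mode. It is not Durian's actual implementation: the function name `is_consensus_hit`, the dictionary layout, the example scores, and the convention that all scores have been converted so that lower is better are assumptions made for this example.

```python
# Hypothetical consensus filter; names, data layout, and scores are illustrative only.
METHODS = ("QuickVina2", "Surflex", "Gnina")
MODES = ("Dock", "Score")


def is_consensus_hit(generated, reference):
    """Return True only if the generated molecule scores at least as well as
    the reference ligand under every docking method and in both modes.

    Both arguments map (method, mode) -> score. Assumption: all scores have
    been converted beforehand so that lower means stronger predicted binding.
    """
    return all(
        generated[(method, mode)] <= reference[(method, mode)]
        for method in METHODS
        for mode in MODES
    )


# Made-up scores for a reference ligand and two generated molecules.
reference = {(m, mode): -7.0 for m in METHODS for mode in MODES}
candidate_a = {(m, mode): -8.0 for m in METHODS for mode in MODES}
candidate_b = dict(candidate_a)
candidate_b[("Gnina", "Score")] = -5.5  # fails one method/mode combination

print(is_consensus_hit(candidate_a, reference))
print(is_consensus_hit(candidate_b, reference))
```

Requiring agreement in all six method and mode combinations is deliberately strict; a looser rule, such as a majority vote, would trade fewer false negatives for more false positives.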