Mice have been widely used as a model organism to investigate human gene-phenotype relationships, based on the conjecture that orthologous genes generally perform similar functions and are associated with similar phenotypes. However, phenotypes associated with orthologous genes often turn out to be quite different between human and mouse. Here, we devised a method to quantitatively compare the phenotype annotations of mouse models with those of humans. Using semantic similarity comparisons, we identified orthologous genes with different phenotype annotations, whose similarity scores are on a par with those of random gene pairs. Analysis of sequence evolution and transcriptomic changes revealed that phenotypic differences between orthologous genes are correlated with changes in noncoding regulatory elements and tissue-specific expression profiles rather than with changes in protein-coding sequences. To map gene-phenotype relationships accurately using model organisms, we propose that careful consideration of the evolutionary divergence of noncoding regulatory elements and transcriptomic profiles is essential.
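
To make the idea of a semantic similarity comparison concrete, the following is a minimal sketch, not the measure used in this work, which the abstract does not specify. It assumes that the human and mouse annotations of an orthologous gene pair have already been mapped to one shared phenotype ontology, and it swaps in a simple Jaccard index over ancestor-closed term sets. The toy ontology, its term names, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology (an assumption for illustration only):
# each term maps to the set of its direct parent terms.
TOY_ONTOLOGY: Dict[str, Set[str]] = {
    "root": set(),
    "heart_defect": {"root"},
    "septal_defect": {"heart_defect"},
    "valve_defect": {"heart_defect"},
    "short_stature": {"root"},
}


def with_ancestors(terms: Iterable[str], ontology: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(ontology[term])  # walk up to the parents
    return closed


def phenotype_similarity(
    human_terms: Iterable[str],
    mouse_terms: Iterable[str],
    ontology: Dict[str, Set[str]],
) -> float:
    """Jaccard index of the two ancestor-closed annotation sets (0.0 to 1.0)."""
    human_closed = with_ancestors(human_terms, ontology)
    mouse_closed = with_ancestors(mouse_terms, ontology)
    union = human_closed | mouse_closed
    if not union:
        return 0.0  # no annotations on either side
    return len(human_closed & mouse_closed) / len(union)


# Related phenotypes share more ancestors than unrelated ones.
related = phenotype_similarity({"septal_defect"}, {"valve_defect"}, TOY_ONTOLOGY)
unrelated = phenotype_similarity({"septal_defect"}, {"short_stature"}, TOY_ONTOLOGY)
```

Under these assumptions, an orthologous pair whose score is no higher than the scores typically obtained for randomly paired genes would be flagged as phenotypically different. Information-content-based measures are a common alternative to the Jaccard index and may behave differently on deep ontologies.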