Humans use environmental context to facilitate object search. Exploiting context during visual search, however, requires learning. Modeling how context is learned for efficient processing is vital to understanding visual function in everyday environments. We proposed a model that accounts for the contextual cueing effect, whereby repeated exposure to a scene context speeds localization of a target item. The model extracted a global feature of a scene and gradually strengthened the association between that global feature and the target location over repeated observations. We compared model and human performance in two visual search experiments in which letter arrays were superimposed on either a uniform gray background or a natural scene. The proposed model successfully simulated the faster reduction in the number of saccades required before target detection for the natural-scene background than for the uniform gray background. We further tested whether the model replicated known characteristics of the contextual cueing effect: local learning around the target, the effect of the ratio of repeated to novel stimuli, and the advantage of natural scenes.
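The associative learning mechanism summarized above can be illustrated with a minimal sketch, assuming a simple incremental update between a discretized global scene feature and a spatial priority map; all class, function, and parameter names here are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of an associative contextual-cueing learner (illustrative
# only; names and the update rule are assumptions, not the paper's code).
import numpy as np

class ContextualCueingModel:
    def __init__(self, grid=(8, 8), lr=0.2):
        self.grid = grid          # spatial resolution of the priority map
        self.lr = lr              # learning rate for associative strengthening
        self.memory = {}          # global-feature key -> learned priority map

    def global_feature(self, scene):
        # Stand-in "global feature": a coarse, normalized downsampling of the
        # scene, quantized into a hashable key so repeated scenes map together.
        h, w = self.grid
        coarse = scene[::scene.shape[0] // h, ::scene.shape[1] // w]
        return tuple(np.round(coarse / coarse.max(), 1).ravel())

    def priority_map(self, scene):
        # Learned map if the context is familiar, otherwise a uniform prior.
        key = self.global_feature(scene)
        return self.memory.get(key, np.full(self.grid, 1.0 / np.prod(self.grid)))

    def observe(self, scene, target_cell):
        # Strengthen the association between this context and the observed
        # target location; repetition sharpens the map around the target.
        key = self.global_feature(scene)
        m = self.memory.get(key, np.full(self.grid, 1.0 / np.prod(self.grid)))
        bump = np.zeros(self.grid)
        bump[target_cell] = 1.0
        m = (1 - self.lr) * m + self.lr * bump
        self.memory[key] = m / m.sum()   # keep it a probability map

# Repeated exposure to the same scene concentrates priority on the target,
# mimicking the observed reduction in saccades with repetition.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
model = ContextualCueingModel()
for _ in range(5):
    model.observe(scene, target_cell=(3, 5))
print(model.priority_map(scene).argmax())   # flat index of cell (3, 5) = 29
```

Under this sketch, a richer global feature (e.g., one derived from natural-scene statistics) would yield stronger, more distinctive context keys, which is one way to read the abstract's finding that natural-scene backgrounds support faster learning than a uniform gray background.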