We explore the nature of forgetting in a corpus of 125,000 students learning Spanish using the Rosetta Stone® foreign-language instruction software across 48 lessons. Students are tested on a lesson after its initial study and are then retested after a variable time lag. We observe forgetting consistent with power function decay at a rate that varies across lessons but not across students. We find that lessons that are better learned initially are forgotten more slowly, a correlation that likely reflects a latent cause such as the quality or difficulty of the lesson. We improve the predictive accuracy of the forgetting model by augmenting it with features that encode characteristics of a student's initial study of the lesson and the activities the student engaged in between the initial and delayed tests. The augmented model can predict 23.9% of the variance in an individual's score on the delayed test. We analyze which features best explain individual performance.
Keywords: Big data; Computational modeling; Corpus analysis; Forgetting; Second language learning.
Copyright © 2016 Cognitive Science Society, Inc.
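
To make the notion of power function decay in the abstract concrete, the following is a minimal sketch, not the authors' model. It assumes a hypothetical two-parameter form, score = a * lag^(-b), fits it by ordinary least squares in log-log space, and reports the proportion of variance explained. The function names, the parametrization, and the data are all illustrative assumptions; the lags and scores are synthetic and are not drawn from the Rosetta Stone corpus.

```python
import math


def power_forgetting(lag, a, b):
    """Hypothetical power-function forgetting curve: a * lag ** (-b), lag > 0."""
    return a * lag ** (-b)


def fit_power_law(lags, scores):
    """Fit log(score) = log(a) - b * log(lag) by ordinary least squares.

    Returns the estimates (a, b). Requires positive lags and scores.
    """
    xs = [math.log(t) for t in lags]
    ys = [math.log(s) for s in scores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def variance_explained(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted` (R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free data generated from assumed parameters (illustration only).
lags = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]  # time lag before the delayed test, arbitrary units
scores = [power_forgetting(t, 0.9, 0.15) for t in lags]

a_hat, b_hat = fit_power_law(lags, scores)
predicted = [power_forgetting(t, a_hat, b_hat) for t in lags]
r2 = variance_explained(scores, predicted)
```

Because the synthetic scores contain no noise, the fit recovers the assumed parameters and explains essentially all of the variance. With real delayed-test scores the same statistic would be far lower, as the 23.9% figure reported for the augmented model suggests.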