Crowdsourcing dialect characterization through Twitter

B Gonçalves, D Sánchez - PloS one, 2014 - journals.plos.org

We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.

PLOS

Cited by 117 - All 25 versions
