ELCC: the Emergent Language Corpus Collection

B Boldt, D Mortensen - arXiv preprint arXiv:2407.04158, 2024 - arxiv.org
We introduce the Emergent Language Corpus Collection (ELCC): a collection of corpora drawn from open-source implementations of emergent communication systems across the literature. These systems include a variety of signalling game environments as well as more complex tasks like a social deduction game and embodied navigation. Each corpus is annotated with metadata describing the characteristics of the source system as well as a suite of analyses of the corpus (e.g., size, entropy, average message length). Currently, research studying emergent languages requires directly running different systems, which takes time away from actual analyses of such languages, limits the variety of languages that are studied, and presents a barrier to entry for researchers without a background in deep learning. The availability of a substantial collection of well-documented emergent language corpora will therefore enable new directions of research that focus on the properties of emergent languages themselves rather than on experimental apparatus.
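To make the corpus-level analyses named above (size, entropy, average message length) concrete, the following is a minimal sketch of how such statistics could be computed. It is not ELCC's own code: the function name `corpus_stats`, the representation of a corpus as a list of integer-token messages, and the choice of unigram token entropy in bits are all assumptions made for illustration.

```python
import math
from collections import Counter


def corpus_stats(corpus):
    """Compute simple statistics for a corpus of messages.

    Hypothetical helper: `corpus` is assumed to be a list of messages,
    each message being a list of integer token ids.
    """
    tokens = [tok for msg in corpus for tok in msg]
    total = len(tokens)
    counts = Counter(tokens)

    # Unigram Shannon entropy in bits (assumed definition of "entropy").
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log2(p)

    return {
        "num_messages": len(corpus),
        "num_tokens": total,
        "unigram_entropy_bits": entropy,
        "avg_message_length": total / len(corpus) if corpus else 0.0,
    }


# Toy corpus with two token types, purely illustrative.
toy_corpus = [[0, 1], [0, 1, 1, 0], [1, 0]]
stats = corpus_stats(toy_corpus)
print(stats)
```

Other entropy definitions (for example, over whole messages rather than individual tokens) would give different values, so any comparison across corpora should fix one definition first.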