Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models

Milička, Jiří

arXiv:2408.16740 (cs)

[Submitted on 29 Aug 2024]

Abstract: This paper addresses the conceptual, methodological and technical challenges in studying large language models (LLMs) and the texts they produce from a quantitative linguistics perspective. It builds on a theoretical framework that distinguishes between the LLM as a substrate and the entities the model simulates. The paper advocates for a strictly non-anthropomorphic approach to models while cautiously applying methodologies used in studying human linguistic behavior to the simulated entities. While natural language processing researchers focus on the models themselves, their architecture, evaluation, and methods for improving performance, we as quantitative linguists should strive to build a robust theory concerning the characteristics of texts produced by LLMs, how they differ from human-produced texts, and the properties of simulated entities. Additionally, we should explore the potential of LLMs as an instrument for studying human culture, of which language is an integral part.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2408.16740 [cs.CL]
	(or arXiv:2408.16740v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.16740

Submission history

From: Jiří Milička
[v1] Thu, 29 Aug 2024 17:34:10 UTC (44 KB)
