InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection

Bertaglia, Thales; Heisig, Lily; Kaushal, Rishabh; Iamnitchi, Adriana

Computer Science > Computers and Society

arXiv:2403.15214 (cs)

[Submitted on 22 Mar 2024]

Abstract: Large Language Models (LLMs) raise concerns because they lower the cost of generating text that could be used for unethical or illegal purposes, especially on social media. This paper investigates the promise of such models to help enforce legal requirements related to the disclosure of sponsored content online. We investigate the use of LLMs for generating synthetic Instagram captions with two objectives. The first objective (fidelity) is to produce realistic synthetic datasets. For this, we implement content-level and network-level metrics to assess whether synthetic captions are realistic. The second objective (utility) is to create synthetic data that is useful for sponsored content detection. For this, we evaluate the effectiveness of the generated synthetic data for training classifiers to identify undisclosed advertisements on Instagram. Our investigations show that the objectives of fidelity and utility may conflict and that prompt engineering is a useful but insufficient strategy. Additionally, we find that while individual synthetic posts may appear realistic, collectively they lack diversity, topic connectivity, and realistic user interaction patterns.

Comments:	To appear at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024) -- please cite accordingly
Subjects:	Computers and Society (cs.CY); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Cite as:	arXiv:2403.15214 [cs.CY]
	(or arXiv:2403.15214v1 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2403.15214

Submission history

From: Thales Bertaglia
[v1] Fri, 22 Mar 2024 13:58:42 UTC (1,007 KB)