Motivation: Social media represent an unrivalled opportunity for epidemiological cohorts to collect large amounts of high-resolution time-course data on mental health. Equally, the high-quality data held by epidemiological cohorts could greatly benefit social media research as a source of ground truth for validating digital phenotyping algorithms. However, software for collecting social media data from cohort participants and linking it to cohort data in a secure and acceptable manner is currently lacking. We worked with cohort leaders and participants to co-design an open-source, robust and expandable software framework for gathering social media data in epidemiological cohorts.
Implementation: Epicosm is implemented as a Python framework that is straightforward to deploy and run inside a cohort's data safe haven.
General features: The software regularly gathers Tweets from a list of accounts and stores them in a database for linking to existing cohort data.
Availability: This open-source software is freely available at https://dynamicgenetics.github.io/Epicosm/.
Keywords: ALSPAC; Big Data; Social media; cohort studies; data linkage; data science; epidemiology; longitudinal studies; mental health; wellbeing.
© The Author(s) 2023. Published by Oxford University Press on behalf of the International Epidemiological Association.