Assessment of urban air quality from Twitter communication using self-attention network and a multilayer classification model

Environ Sci Pollut Res Int. 2023 Jan;30(4):10414-10425. doi: 10.1007/s11356-022-22836-w. Epub 2022 Sep 8.

Abstract

Social media platforms are a prominent new-age channel through which the public spreads awareness of, or draws attention to, an issue or concern. This study demonstrates how public responses on Twitter can be used for qualitative monitoring of air pollution in an urban area. Tweets discussing air quality in Delhi, India, during 2019-2020 were extracted using a machine learning technique based on a self-attention network. These tweets were cleaned, sorted, and classified into three classes: poor air quality, good air quality, and noise or neutral tweets. The study used a multilayer classification model with an embedding layer as the first layer and a bidirectional long short-term memory (BiLSTM) layer as the second. A method was then devised for estimating PM2.5 concentration from the tweets using 'spaCy' similarity analysis of the classified tweets and data from Continuous Ambient Air Quality Monitoring Stations (CAAQMS) in Delhi for the study period. The accuracy of this estimation was high (80-99%) for extreme air quality conditions (extremely good or severe) and lower during moderate variations in air quality. Application of this methodology depended on perceivable changes in air quality, Twitter engagement, and environmental consciousness among the public.
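
The abstract does not say how the similarity scores and the CAAQMS readings are combined, so the following is only an illustrative sketch, not the authors' method. It assumes each tweet and each reference text has already been turned into a vector (with default settings and a model that includes word vectors, spaCy's similarity is the cosine similarity of averaged word vectors), and it assumes a hypothetical estimator that averages the reference PM2.5 values weighted by similarity. The function names `cosine_similarity` and `estimate_pm25` and the toy two-dimensional vectors are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def estimate_pm25(tweet_vec, references):
    """Hypothetical estimator (an assumption, not taken from the paper).

    `references` is a list of (vector, pm25) pairs, e.g. vectors of
    already-classified tweets paired with the station PM2.5 reading for
    the same period. Returns the similarity-weighted mean of the PM2.5
    values, or None if no reference has positive similarity.
    """
    # Negative similarities are clipped to zero so they cannot act as negative weights.
    weights = [max(cosine_similarity(tweet_vec, vec), 0.0) for vec, _ in references]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * pm for w, (_, pm) in zip(weights, references)) / total


# Toy reference set: one "severe" day and one "good" day (made-up values).
toy_references = [([1.0, 0.0], 300.0), ([0.0, 1.0], 40.0)]
estimate = estimate_pm25([1.0, 0.0], toy_references)
```

In this toy case the tweet vector coincides with the "severe" reference and is orthogonal to the "good" one, so the estimate collapses to the severe-day reading. A tweet lying between the two references would get an intermediate value, which is in line with the reported lower accuracy under moderate air quality conditions.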

Keywords: Air pollution; BiLSTM; Deep learning; Delhi; PM2.5; spaCy.

MeSH terms

  • Air Pollution* / analysis
  • Communication
  • Humans
  • India
  • Social Media*