Article

Machine Learning Approach for Local Atmospheric Emission Predictions

by Alessandro Marongiu *, Gabriele Giuseppe Distefano, Marco Moretti, Federico Petrosino, Giuseppe Fossati, Anna Gilia Collalto and Elisabetta Angelino
ARPA Lombardia, Environmental Protection Agency of Lombardia Region, 20124 Milano, Italy
* Author to whom correspondence should be addressed.
Air 2024, 2(4), 380-401; https://doi.org/10.3390/air2040022
Submission received: 4 September 2024 / Revised: 26 September 2024 / Accepted: 30 September 2024 / Published: 3 October 2024

Abstract
This paper presents a novel machine learning methodology able to extend the results of detailed local emission inventories to larger domains where emission estimates are not available. The first part of this work consists of the development of an emission inventory of elemental carbon (EC), black carbon (BC), organic carbon (OC), and levoglucosan (LG), obtained from the detailed emission estimates available from the project LIFE PREPAIR for the Po Basin in northern Italy. The emissions of these chemical species, in combination with primary particulate emissions and gaseous precursors, provide important information for source apportionment and for assessing the impact of the different emission sources on air quality. To gain a better understanding of the origins of atmospheric pollution, it is possible to combine measurements with emission estimates for the particulate matter fractions known as EC, BC, OC, and LG. To identify the sources of emissions, it is usual practice to use the ratios of the measured EC, OC, TC (Total Carbon), and LG. The PREPAIR emission estimates and these new calculations are then used to train the Random Forest (RF) algorithm, considering a large array of local variables, such as taxes, the characteristics of urbanization and dwellings, the number of employees detailed by economic activity, occupation levels and land cover. The predictions of the implemented machine learning (ML) model were compared with the estimates obtained for the same areas by two independent methods: the local disaggregation of the national emission inventory and the Copernicus Atmosphere Monitoring Service (CAMS) emission estimates. The outcome of this comparison is encouraging and confirms the approach as promising also in terms of effort saving.
The implemented modelling approach identifies the most important variables affecting the spatialization of different pollutants in agreement with the main emission source characteristics and is suitable for harmonization of the results of different local emission inventories with national emission reporting.

1. Introduction

Atmospheric aerosols are suspended particles and droplets which are directly emitted from various sources or formed by chemical reactions from precursor gases such as sulphur dioxide (SO2), nitrogen oxides (NOx), Volatile Organic Compounds (VOCs) and ammonia (NH3). The carbonaceous component of atmospheric aerosols is composed of two main fractions: an organic fraction, known as organic carbon (OC), and a dark, highly polymerized fraction, resistant to oxidation at temperatures below 400 °C. The latter non-organic and non-carbonate fraction is often referred to as black carbon (BC) or elemental carbon (EC). The term BC is typically used when pure optical techniques are employed for quantification. In contrast, EC is used when refractory carbon is identified through physical and chemical analyses, such as thermo-optical methods. Therefore, BC and EC are correlated but not coincident, and in most cases they have slightly different thermal, optical and chemical behaviour [1,2].
OC and EC can be used as indicators for air quality monitoring, for example, in urban areas, by considering their ratio and relationship with road traffic [3]. The OC measured in the atmosphere can be either primary emitted or secondary formed. Primary OC comes from anthropogenic sources, like incomplete combustion of fossil fuels and biomass in domestic heating, cooking and industrial processes, and from biogenic sources, such as from vegetation. Secondary OC is formed in the atmosphere through photochemical and aqueous-phase reactions of volatile organic compounds.
BC is the largest contributor to global shortwave absorption in the atmosphere among all light-absorbing aerosol compounds and has a broad range of negative impacts on the health of the human population. BC can also travel long distances, forming atmospheric brown clouds in combination with other aerosols. Atmospheric brown clouds intercept solar radiation, causing surface dimming and impacting the hydrological cycle. BC deposition also darkens snow and ice surfaces, potentially accelerating melting, especially of Arctic Sea ice [4,5].
Despite its importance, the complete understanding of BC formation, accumulation, dynamics and its role in the broader carbon cycle remains limited. Therefore, substantial efforts are needed to enhance the knowledge of BC sources, of the most effective mitigation methods and of its impacts on climate, air quality and health. To improve the understanding of BC concentrations in urban areas and address the mitigation of its health and environmental impacts, the World Health Organization (WHO) recommends systematic measurements of BC or EC, the creation of BC inventories, and the implementation of relevant BC mitigation actions. The use of accurate BC emission factors (EFs) in emission inventories and climate and health models is crucial for effectively mitigating the adverse impacts and societal costs caused by BC emissions. Therefore, it is essential to collect and maintain accurate and up-to-date information on BC EFs [6,7].
BC emission inventories are strategic tools for air quality management and play a crucial role in climate change and human health studies. By identifying major emitters and prioritizing sectors for emission reductions, emission inventories help strategize effective measures. They also serve as input for atmospheric dispersion models, calculation of emission trends, the analysis of policy effectiveness and the creation of cost-effective emission scenarios [8].
One of the crucial points in compiling an emission inventory is the choice of emission factors (EFs), so that they reflect the local characteristics of the inventory site, such as technological and economic development and available information, to reduce uncertainty. A review conducted by Rönkkö et al. on the emission factors (EFs) of the main anthropogenic sources of BC, such as traffic, residential combustion and energy production, reveals that the differences in observed EFs were wide, up to six orders of magnitude within the same category depending on the combustion device, fuel and post-treatment systems. Low-technology combustion plants contribute significantly to both emissions and uncertainties that need to be reduced by characterizing emissions from small residential combustion appliances, industrial and mobile sources [6].
Globally, the emissions of primary PM2.5, BC, and OC from biomass burning are estimated at approximately 51, 4.6, and 29 Tg, respectively, accounting for 70%, 55%, and 90% of the total emissions of these pollutants from all sources. Residential combustion and fires are identified as the predominant sources of these emissions [9].
Incomplete biomass combustion generates several organic compounds, such as monosaccharides from cellulose breakdown, terpenes, aliphatic compounds and other large molecules [10]. Biomass combustion can also release a specific class of organic compounds called Condensable Primary Organic Aerosol (CPOA). These are in the vapor phase at the stack exit and can condense in the atmosphere. Primary Particulate Matter (PPM) emitted from biomass combustion can therefore be divided into two main contributions: Filterable Particulate Matter (FPM) and Condensable Particulate Matter (CPM) [11]. In this paper, our calculations on biomass combustion always refer to total Primary Particulate Matter, including both the filterable and condensable parts.
Levoglucosan (LG) can constitute a large fraction of OC emitted from biomass and for this reason is commonly recognized as a specific tracer for biomass burning [10,11,12,13,14,15,16].
LG tends to degrade in the atmosphere and cannot be considered inert [17]. LG measurements can also be associated with other types of marker, such as potassium and modern carbon [18], making it the best chemical tracer for residential wood combustion (RWC) in the EMEP measurement system. The LG/OC ratio in Primary Organic Aerosol (POA) can vary based on the type of wood burned and the combustion conditions [16,17,18,19]. Some studies concerning the emission ratio are available for European contexts [20,21,22]. Values of LG/OC of 11.7% for PM2.5 and 8.5% for PM10 are reported for Norwegian conditions [23,24,25]. LG does not belong to the mandatory EMEP measurement programme (Level 1 or Level 2) defined in the UNECE monitoring strategy, but it is included at Level 3 to evaluate the contribution of the different sources of organic aerosol. A levoglucosan emission inventory helps to better quantify the biomass burning contribution to overall emissions and how this is distributed, in order to develop strategies for emissions reduction.
The role of NH3, NOx and SO2 in secondary formation of particulate matter in the Po Basin has been widely discussed in previous studies [26,27,28].
Ammonia in the atmosphere, primarily produced by agriculture, can react with acidic species, such as sulfuric acid, nitric acid and hydrochloric acid, leading to the production of secondary inorganic aerosols (SIAs): ammonium sulphate, ammonium nitrate and ammonium chloride, which are the main components of PM [29]. Primary PM emissions from agriculture originate from poultry and pig houses, and animal manure used for fertilization of arable land can also contribute to airborne PM. A positive correlation between manure dry matter (DM) and PM production rate was further observed [30].
This paper is organized in the following sections. In Section 2, we present how the Particulate Emission Inventory developed in LIFE PREPAIR is used to estimate emissions of OC, EC and BC, along with the main hypothesis on the calculation of LG emissions from OC. A focus on the main methodological aspects of National and Local emission inventories in Italy is presented to define the common input variables, in order to spatialize the emissions and identify possible issues determining differences in the emission inventories’ estimates.
The Methodology description is completed by presenting the Machine Learning techniques developed to extend the results from the Po Basin to the whole of Italy. In Section 3, the main sources and their spatialization in the Po Basin are reported. The results of the Machine Learning simulations are then compared with the estimates at local level produced from the national emission inventory downscaling and from the Copernicus Atmosphere Monitoring Service (CAMS).
In Section 4, the agreement of the proposed estimates is presented, considering concentration trends and emission distributions.
In Section 5, we report the importance of the results obtained by the proposed Machine Learning approach in improving monitoring systems and extending results from the high spatial resolution emission inventory to estimates for much larger areas. These represent examples of the potential use of ML techniques in the sector of environmental data analysis, if they are used in combination with knowledge of the data used and the nature of the phenomena under study.

2. Methodology for Emission Inventory Development

In this paper we estimate the emissions of EC, BC, OC and LG in the Italian Regions of the Po Basin, starting from the development of a common air pollutant emission dataset for the Po Basin and Slovenia, foreseen in the project LIFE PREPAIR (www.lifeprepair.eu, accessed on 2 September 2024) [31]. The work carried out in PREPAIR has released three updated emission inventories for PM, NH3, NOx, SO2, CO and NMVOC, referring to 2013, 2017 and 2019, by collecting information from the Environmental Protection Agencies and Regions of Lombardy, Emilia-Romagna, Piedmont, Veneto, Friuli Venezia Giulia, Valle d’Aosta, the Province of Bolzano (participating as stakeholder) and the Province of Trento. The Po Basin, located in the north of the country (Figure 1), is the most populous macro-region of Italy and covers about 115,000 km2, with a population of about 26 million inhabitants. It is mostly a plain surrounded by the Alps and Apennines, and it is often characterized by thermal inversion and air stagnation [32]. According to European and National Legislation [33,34], the Italian Regions and autonomous Provinces have different responsibilities for air quality monitoring and management, encompassing the updating of their respective local emission inventories every two or three years. The EEA–EMEP Guidebook (www.eea.europa.eu, accessed on 2 September 2024) is the primary technical reference for updating national and local emission inventories [35,36]. In this framework, the estimates of primary emissions of the main atmospheric pollutants are available for different Italian regions with high spatial resolution, defined at the municipal level according to the SNAP source classification [37,38].
The PREPAIR Project’s emission inventories are often based on a bottom-up methodology that prioritizes the use of local and comprehensive data sources developed in the framework of the INEMAR System (www.inemar.eu/xwiki/bin/view/Inemar/WebHome, accessed on 2 September 2024) [39], which allow the ensemble of local data to exhibit extremely good comparability, even if all compilers make estimates independently.
The methodology at the basis of the PREPAIR emission estimates is generally more complex than that implemented at the national level, where a top-down methodology is used to disaggregate the overall emission trend in various Italian Regions on a yearly basis [37].
It is explicitly stated in the Italian legislation [34] that regional and national emission inventories must be harmonized. The National System of the Environmental Protection Agencies, SNPA [38], has analysed the different processes used for emission inventory estimates in Italy to identify all the possible sources of divergences and differences. Figure 2 shows all the possible processes of calculation in emission inventories in Italy. Information at various levels can be processed to produce an emissions inventory. Using extremely sophisticated algorithms, information from multiple databases is combined to determine the contributions to emissions of the various sources. Data are not always available with the required spatial and/or technological detail and require the application of proxy variables. These proxies can be used, for example, to distribute a total regional indicator in each municipality according to a percentage.
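The proxy-based distribution described above can be sketched in a few lines. This is a minimal illustration, assuming a single proxy variable and a simple proportional split; the function name `disaggregate`, the municipality labels and all numbers are hypothetical and are not taken from the inventories discussed here.

```python
def disaggregate(regional_total, proxy_by_municipality):
    """Split a regional total across municipalities in proportion to a proxy."""
    proxy_sum = sum(proxy_by_municipality.values())
    if proxy_sum <= 0:
        raise ValueError("proxy values must sum to a positive number")
    return {
        name: regional_total * value / proxy_sum
        for name, value in proxy_by_municipality.items()
    }


# Hypothetical example: 1000 t/y of regional emissions, population as the proxy.
population = {"A": 50_000, "B": 30_000, "C": 20_000}
shares = disaggregate(1000.0, population)
# Municipality A holds half of the proxy, so it receives half of the total.
```

A proportional split conserves the regional total by construction, which is the property a top-down spatialization needs.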
To deduce the sources of proxies and indicators, various methods might be used. As reported in the EMEP/EEA guidebook [35,36], the process of estimating may entail selecting methods with varying degrees of complexity, and the findings obtained from the classification of emission sources may be subjected to varying degrees of specificity. This aspect can be of greater importance when comparing estimates produced by various tools and applications. The data sources may have classifications of indicators that do not directly correspond with the sources listed in the inventories. In such cases, specific conversion tables must be created and updated, or coefficients for the indicator’s distribution among the various SNAP categories must be developed. Figure 2 suggests that, to obtain activity indicators from information from local databases, it might be necessary to use connection tables. These could then be subject to further technological breakdowns to apply more detailed methodological tiers in the estimation of emissions. In this case, further spatialization proxies would provide the definition of municipal emission data, completing the estimates in a top-down manner. On a municipal or even punctual scale, the indicators might already be accessible in other situations.
Within this process, many factors can be the possible causes of divergences arising from the comparison/harmonization between the inventories [38]:
  • Source classification
  • Definition of the pollutant
  • Methodological tier
  • Database and methodology update level
  • Technical insight
  • Accuracy and completeness of point sources
  • Availability of specific studies and technical insights
  • Availability of the required indicator
  • Technical methodological proxy
  • Disaggregation from a scale above the regional scale
  • Alignment of the reference year
  • Methods of reconstruction of the time-series or completion of the indicator
  • Variation of administrative contexts
  • Economic and production context

2.1. Methodology for Emission Estimates of BC, EC, OC and LG

The emissions of EC and OC are obtained from the product of particulate emissions by their relative abundance in the particulate for a specific combination of sector, activity and fuel. The particulate emissions are obtained from the results available from LIFE PREPAIR [31].
The general approach is reported in the following:
$$E_{y,i,k} = p_{y,i,k,j} \times E_{i,k,j}$$
where:
$y$, $i$, $k$, $j$ = chemical species, sector, fuel and PM size fraction, respectively;
$p_{y,i,k,j}$ = abundance (%) of chemical species $y$ in PM size fraction $j$ for sector $i$ and fuel $k$;
$E_{i,k,j}$ = emissions of PM size fraction $j$ for sector $i$ and fuel $k$.
The emissions of OC and EC are obtained with the methodology and speciation fractions for total suspended particulate (TSP) emissions reported by Caserini et al. [2]. The approach is based on the same reference methodology as Kupiainen and Klimont (2007) [40] and Winther and Nielsen [41]. The factors related to BC are obtained from the EMEP/EEA air pollutant emission inventory guidebook [35,36] and based on PM2.5 emissions. Table 1 shows the average relative abundance of BC, OC and EC obtained by all coefficients used in this study for different fuels.
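As a minimal sketch of the speciation step, the emission of a chemical species can be computed as the product of its abundance in the particulate and the particulate emission for each sector and fuel combination. The abundances and PM emissions below are hypothetical placeholders, not the coefficients of Table 1, and the helper name `species_emission` is an assumption made for illustration.

```python
def species_emission(abundance_percent, pm_emission):
    """Species emission = abundance (%) of the species in PM times the PM emission."""
    return abundance_percent / 100.0 * pm_emission


# Hypothetical PM emissions (t/y) for one size fraction, keyed by (sector, fuel).
pm_emissions = {("residential", "wood"): 1000.0, ("road", "diesel"): 200.0}
# Hypothetical EC abundances (%) in that PM fraction; NOT values from Table 1.
ec_abundance = {("residential", "wood"): 10.0, ("road", "diesel"): 50.0}

ec_emissions = {
    key: species_emission(ec_abundance[key], pm_emissions[key])
    for key in pm_emissions
}
total_ec = sum(ec_emissions.values())
```

Keeping the abundance keyed by sector and fuel mirrors the indices of the formula, so a different coefficient set can be swapped in without changing the calculation.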
The estimation of OC can serve as the basis for the LG emission estimates [11], assuming LG emissions equal to 10% of the POA carbon emissions (LG/OC emission ratio). As reported by Simpson et al. [11], this ratio is highly approximate and may have an uncertainty of a factor of two (ranging from 5% to 20%). We used this emission ratio in this analysis to account for all sources of biomass burning [42]. According to Yao Hu et al. [42], the emission factor for levoglucosan resulting from oak burning is 95.56 mg/kg, while Jimenez et al. [43] indicate an emission value of 202.3 mg/kg. The primary assumptions made in this paper regarding the quantity of LG in OC and the computation of OC emissions correspond to an overall LG emission factor from burning biomass of about 121 mg/kg, with a potential variability range of 60–242 mg/kg.
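The LG estimate and its uncertainty band follow directly from the ratio stated above. The sketch below applies the 10% central ratio and the 5% to 20% range to a hypothetical OC emission from biomass burning, then applies the same factor of two to the 121 mg/kg emission factor. The input OC value and the function name `lg_from_oc` are assumptions for illustration.

```python
LG_OC_RATIO = 0.10          # central LG/OC ratio stated in the text
LG_OC_RANGE = (0.05, 0.20)  # factor-of-two uncertainty stated in the text


def lg_from_oc(oc_biomass):
    """Return (low, central, high) LG emissions for a given biomass OC emission."""
    low = LG_OC_RANGE[0] * oc_biomass
    central = LG_OC_RATIO * oc_biomass
    high = LG_OC_RANGE[1] * oc_biomass
    return low, central, high


# Hypothetical OC emission from biomass burning, in t/y.
low, central, high = lg_from_oc(1000.0)

# The same factor of two applied to the overall LG emission factor (mg/kg).
ef_central = 121.0
ef_range = (ef_central / 2, ef_central * 2)  # (60.5, 242.0)
```

The lower bound of 60.5 mg/kg is consistent with the rounded 60–242 mg/kg range quoted in the text.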
The methodology for emission estimates is in agreement with the analytical activities performed in LIFE PREPAIR for measuring OC, EC and LG in PM samples. The particulate matter is collected on quartz filters at the project monitoring stations: Bologna, Torino, Cavallermaggiore, Vicenza, and Schivenoglia. Each filter, corresponding to a 24 h sampling period, is cut with a puncher of 1.41 cm2 for the OC/EC analysis and of 1.50 cm2 for the LG analysis (the choice of a 1.41 cm2 puncher is related to the basic structure of the thermo-optical autosampler instrument), yielding two punches from the same original sample. The OC/EC analysis is conducted using an OCEC Dual Optical Lab Instrument system developed by Sunset Laboratory. This instrument measures the carbonaceous fraction of the particulate collected on the filter by applying volatilization and oxidation processes to the carbon-containing chemical species. Through a thermal process within the instrument, a qualitative and quantitative analysis of the released carbon gases is performed, with a subsequent optical control procedure to obtain an appropriate separation between OC and EC. LG analysis is conducted by a Liquid-Ionic-Chromatography technique. The filter is first pre-treated by inserting it in a test tube filled with 10 mL of ultrapure water, which is placed in an ultrasonic bath for 30 min. Finally, the extract is filtered into a clean test tube using a microfilter with 0.45 µm pores.

2.2. Methodology for Machine Learning Applied to Emission Inventory

The algorithms implemented in the INEMAR system are very complex and detailed. They often refer to the highest methodological tiers and are designed to provide the highest level of accuracy. However, in some cases they can require long processing times, both for data collection and for calculation.
The aim of finding a direct and efficient method for calculating emissions based on indicators that are readily available has been discussed in a recent paper [44]. The authors report the calculation of the order of magnitude of the Greenhouse Gases (GHGs) emissions within the boundaries of an Italian province by correlation and regression methods applied to land use and basic information (such as, for example, GDP).
In the present paper, we define a general approach based on machine learning (ML), able to predict local emission estimates based on different input information, which is commonly used in the detailed processes of calculation of bottom-up local emission inventories and in the top-down spatialization of national emission reporting. This general and novel approach allows us to:
  • reduce the processing time for local emission inventories;
  • extend the results of the emission inventories of the PREPAIR domain to the whole country with high accuracy, compared to all of the independent processes of top-down spatialization from the national estimates [45] and calculations from CAMS [46];
  • find the most important variables affecting the spatialization of different pollutants in agreement with the main emission source characteristics, e.g., level of urbanization for BC, livestock numbers for NH3;
  • harmonize the results of different local emission inventories with national emission reporting.
The results of PREPAIR emissions are used to train Random Forest (RF) algorithms, which consider a vast array of local variables, such as taxes, the characteristics of urbanization and homes, the number of employees detailed for economic activities, occupation levels, livestock and land use. As shown in Table 2, the ML technique is used to predict the municipal emission rate as a function of 166 statistical parameters acquired from various data sources.
All data used to train and test the machine learning algorithms are available to the public. In the Supplementary Materials, Table S1 provides a thorough description of each variable, and Figures S1–S7 show the indicator map for the whole of Italy. ARPA Lombardia’s air pollution emission data platform, EMITOOL, was consulted to obtain the results pertaining to the emission source contributions. The Random Forest (RF) method was selected due to its superior prediction performance over regression and decision trees. Moreover, a general but incomplete investigation has been conducted of neural networks. This kind of technique is mostly intended for continuous input variables that do not accept zero values; nevertheless, the factors taken into consideration in this study may not be continuous in every municipality, for example, the potential lack of a Land cover class in a particular territory.
The Random Forest method, randomForestSRC [47,48,49,50], used in this study is implemented in a CRAN compliant R-package [50] and uses fast OpenMP parallel processing. In a previous paper, we investigated the application of machine learning approaches to calculate time varying emission rates and atmospheric concentrations of ammonia in the Po Basin [28]. By introducing randomization into the foundational learning process, the Random Forest (RF) technique can enhance ensemble learning [48]. In RF, feature subset trees are used to generate predictions [49]. Ishwaran et al.’s Random Survival Forest (RSF) is an extension of this strategy [48]. By randomizing the learning process in two ways—random feature selection and random sampling of the data to construct a tree—RF is a technique that averages trees and creates an ensemble.
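The two sources of randomization mentioned above can be illustrated without any machine learning library. The sketch below only draws the random inputs a tree would receive: a bootstrap sample of the rows and a random subset of candidate features. It is not the randomForestSRC implementation; the function names, the number of rows and the subset size of 12 are hypothetical, while 166 is the number of input parameters stated in the text.

```python
import random


def bootstrap_rows(n_rows, rng):
    """Sample row indices with replacement (random sampling of the data)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(n_features, n_candidates, rng):
    """Pick a random subset of feature indices without replacement."""
    return sorted(rng.sample(range(n_features), n_candidates))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = bootstrap_rows(8, rng)                # 8 hypothetical municipalities
features = candidate_features(166, 12, rng)  # 12 candidates out of 166 inputs
```

Because each tree sees a different sample of rows and features, the trees are decorrelated, and averaging their predictions reduces the variance of the ensemble.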
In this paper, we focus attention on the spatial distribution of emissions, investigating the capability of RF to predict emission rates for all Italian municipalities. RF is trained with the yearly municipal emission rates for PM, OC, EC, BC, and LG for variations of the parameters shown in Table 2.
Due to the large number of input variables, RF is run with the variable importance (VIMP) methodology. In the joint-VIMP approach, two variables are paired, and their joint VIMP (also known as “paired” importance) is determined, according to Ishwaran et al. [47]. The VIMP of each individual variable is also computed, and the sum of these two values is referred to as the “additive” importance. If the univariate VIMP of each of the paired variables is sufficiently large, a large positive or negative difference between “paired” and “additive” importance suggests a relationship worth investigating. The dataset is randomly split into two parts for training and testing purposes: one subset contains 75% of the data for training, and the second subset contains the remaining 25% for testing.
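The random 75%/25% split and the comparison between paired and additive importance can be sketched as follows. This is an illustration in plain Python under stated assumptions: the municipality labels, the seed and the VIMP values are hypothetical, and the helper names are not part of randomForestSRC.

```python
import random


def split_train_test(records, train_fraction=0.75, seed=42):
    """Shuffle the records and split them into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def paired_minus_additive(paired_vimp, vimp_a, vimp_b):
    """Difference between the paired VIMP and the sum of the two single VIMPs."""
    return paired_vimp - (vimp_a + vimp_b)


municipalities = [f"municipality_{i}" for i in range(100)]  # hypothetical labels
train, test = split_train_test(municipalities)

# Hypothetical importance values for two variables and for their pair.
gap = paired_minus_additive(paired_vimp=0.30, vimp_a=0.10, vimp_b=0.12)
```

A gap far from zero, for variables that are individually important, is the signal the text describes as worth investigating.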

3. Results of the Emission Inventory for the Po Basin

As defined before, atmospheric particulate matter encompasses solid particles and liquid droplets directly emitted or formed in the atmosphere from precursors such as NOx, NH3, SO2 and other gases (e.g., particle-producing organic gases). Table 3 shows the emission estimates of these precursors in the atmosphere of the Po Basin. Emissions are classified according to the SNAP nomenclature (Selected Nomenclature for Air Pollution), including the 11 macro-sectors as reported below.
As reported in a previous paper [31], the main source of SO2 emissions is the remaining sulphur in fuels used in industrial combustion, the most relevant source in the Po Basin. The literature has extensively documented the benefits of switching from fuel oil to natural gas for atmospheric emissions, and the emission time-series at national and local basis clearly shows a relevant reduction of this pollutant. Biogenic sources, which can be relevant to the total amount, are not included in NMVOCs mainly emitted from solvent and product use. Road transport is the main emission source of NOx, while NH3 is prevalently emitted from livestock and fertilisation in agriculture activities. All these pollutants show a decrease in the Po Basin between 2013 and 2019.
In Table 4, the emission levels for the macro-sector of the SNAP classification are shown for the pollutants BC, EC, OC, LG, PM10 and PM2.5 for the three years of the PREPAIR emission inventory.
All the emissions of PM are taken from the atmospheric emission inventories of the Regions and Provinces. The emissions of OC, EC, BC and LG are calculated by a common run with the methodology previously described.
The total estimates of PM10 were 79 kt/y for 2013, which decreased to 69 kt/y for 2017 and to 66 kt/y for 2019. The main contribution, from non-industrial combustion plants, remains quite stable over the years (56.59% for 2013, 56.53% for 2017 and 54.98% for 2019). Road transport contributed 19.9% (2013), 18.9% (2017) and 21.9% (2019) to total PM10. PM2.5 emissions show a similar trend and similar emission sources. In the Po Basin, total PM2.5 emissions fell from 69 kt/y in 2013 to 59 kt/y in 2017 and to 54 kt/y in 2019. The main emission sources are non-industrial combustion plants (63.9% for 2013, 64.7% for 2017 and 64.9% for 2019). Road transport contributed 14.7% (2013), 13.03% (2017) and 13.1% (2019) of the total emissions of PM2.5.
In the Po Basin, the emissions of BC show a reduction of 32% between 2013 and 2019, from 11 kt/y in 2013 to 8 kt/y in 2019.
The sectors that contributed most to the emissions of BC in 2013 were non-industrial combustion plants (43.6%), road transport (34.7%), and other mobile sources and machinery (14.5%). The 46.3% of BC emissions in 2017 from non-industrial combustion plants were related to residential heating using wood fuels, while the remaining emissions from road transport and other mobile sources and machinery sectors were due to diesel-powered transport and work vehicles (30%).
The reduction in BC emissions is due to a reduction in emissions from road traffic and from non-industrial combustion. In more detail, what contributed most to the reduction in BC were vehicle renewal policies, which resulted in a reduction of more than 50% for vehicles <3.5 t, cars, and heavy vehicles >3.5 t. A contribution to BC emission reduction is also due to the renewal of biomass domestic burners, with an overall impact lower than road traffic. For this reason, the relative impact of non-industrial combustion to the total emissions of BC increases from 2013 (43.68%) to 2019 (47.53%) while the impact of road transport decreased from 34.73% to 26.83% in the same period.
A similar analysis can be extended to EC emissions. The total emissions of EC were 12 kt/y in 2013, decreasing to 10 kt/y in 2017 and to 8 kt/y in 2019. The main contribution, related to non-industrial combustion plants, increased from 45.2% (2013) to 46.8% (2017) and to 51.5% (2019), while the contribution of road transport decreased significantly, from 40% in 2013 to 35.4% in 2017 and to 26.9% in 2019.
A decrease of 18% in the total emission of OC from anthropogenic sources was observed in the Po Basin, with figures falling from 29 kt/y in 2013 to 26 kt/y in 2017 and to 24 kt/y in 2019.
As reported in Table 5, biomass combustion is identified as the main source of OC, contributing to a substantial 79% of the total emissions. Road transport is responsible for 6.43% of OC emissions in 2019, with the majority attributed to the degradation of brakes, tires, and road surfaces (779 t/2019), and emissions from diesel engines (591 t/2019). Other sources and sinks, encompassing forest fires, fireworks, and cigarette smoke, comprised 5.38% of the total OC emissions for the year 2019.
As for LG emissions, the total estimated LG was 2.5 kt/y for 2013, which decreased to 2.2 kt/y in 2017 and to 2 kt/y in 2019. As shown in Table 4 and Table 5, the emission inventories for BC and EC are highly similar, which is expected due to the strong correlation between these carbon fractions [1]. A similar trend is observed between OC and LG, largely due to the influence of wood on total OC emissions.
The emission trend calculated in PREPAIR is consistent with the European and Italian emission reports (Table 6). For the period 2013–2019, the main indicators, per capita emissions (kg/inhabitant/y) and emission densities (kg/km2), both show a reduction.
The overall per capita emissions and emission densities of BC and PM in the Po Basin are comparable or lower than the parameters calculated for the EU-27 and Italy [51].
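The two indicators used in this comparison are simple ratios. The sketch below computes them from the rounded figures quoted in this paper for the Po Basin (about 8 kt/y of BC in 2019, about 26 million inhabitants, about 115,000 km2). Because these inputs are rounded, the results only indicate an order of magnitude and are not a reproduction of Table 6; the function names are assumptions.

```python
KG_PER_KT = 1_000_000  # 1 kt = 1e6 kg


def per_capita_emission(emission_kt, inhabitants):
    """Per capita emission in kg/inhabitant/y."""
    return emission_kt * KG_PER_KT / inhabitants


def emission_density(emission_kt, area_km2):
    """Emission density in kg/km2 per year."""
    return emission_kt * KG_PER_KT / area_km2


# Rounded Po Basin figures quoted in the text.
bc_2019_kt = 8.0
inhabitants = 26_000_000
area_km2 = 115_000

bc_per_capita = per_capita_emission(bc_2019_kt, inhabitants)  # about 0.31
bc_density = emission_density(bc_2019_kt, area_km2)           # about 69.6
```

Normalizing by population or area is what makes domains of very different size, such as the Po Basin, Italy and the EU-27, comparable.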
The spatial distribution of BC and OC emissions, as depicted in Figure 3 and Figure 4, is calculated using 2019 emission data from all municipalities, normalized by their respective areas. Separate distributions are also created for the road transport (Figure 3c and Figure 4c) and non-industrial combustion plant (Figure 3b and Figure 4b) sectors. The road network of the Po Basin is overlaid onto the road transport distribution, while the urban land use map is superimposed onto the non-industrial combustion plants’ distribution, largely linked to domestic heating.

Extension of the Emission Inventory for the Po Basin by Machine Learning to Italy

The very detailed results of the Emission Inventory for the Po Basin have been investigated by a machine learning approach. The aim of this activity is to develop a methodology to extend bottom-up inventories to larger domains. A random split of the municipal emission estimates for the Po Basin defines the training and testing subsets. The performance of Random Forest has been evaluated by comparing its training performance with its performance in predicting the test data after each training step. The very large number of input parameters suggested the adoption of different approaches to reduce the possible effects of model overfitting. In a preliminary step before training, variable selection based on the minimal depth of Random Forest defines the top ten input variables of the model for each pollutant considered. This step is the variable reduction. The training of Random Forest is then performed by variable importance.
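The random split of municipal estimates into training and testing subsets can be illustrated with a minimal sketch. The record layout, the 80/20 ratio and the fixed seed are assumptions made only for this example, since the split fraction is not stated here.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Randomly split municipal records into (train, test) subsets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical municipal records: (municipality id, emission in t/y)
municipalities = [(f"M{i:03d}", 10.0 + i) for i in range(50)]
train, test = split_train_test(municipalities)
```

Fixing the seed keeps the split reproducible, which helps when training and testing metrics are compared across repeated runs.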
To further reduce potential overfitting, the training and testing of RF have also been performed after removing values above the 95th percentile. Both options have been investigated considering, as model output variables, either the total emission rates or the annual emission density. The former can be directly obtained from the emission inventory, and the latter is calculated as the ratio of the total emissions to the total area of the municipality. RF has therefore been tested in four different cases:
  • Train RF with minimal depth, and variable reduction, considering emission density
  • Train RF with minimal depth, and variable reduction, considering 95-percentile emission density
  • Train RF with minimal depth, and variable reduction, considering emission rates
  • Train RF with minimal depth, and variable reduction, considering 95-percentile emission rates
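The four output-variable cases listed above can be sketched as follows. This is only an illustration under stated assumptions: the sample values are hypothetical, and the 95th percentile is computed with the default method of Python's `statistics.quantiles`, which may differ from the percentile definition actually used in the study.

```python
import statistics


def emission_density(total_emission_t, area_km2):
    """Annual emission density (t/km2): total emission over municipal area."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return total_emission_t / area_km2


def drop_above_p95(values):
    """Keep only the values at or below the 95th percentile of the sample."""
    p95 = statistics.quantiles(values, n=100)[94]  # 95th percentile cut point
    return [v for v in values if v <= p95]


# Hypothetical municipalities: (total emission in t/y, area in km2)
sample = [(float(i), 10.0) for i in range(1, 21)] + [(1000.0, 10.0)]

rates = [e for e, _ in sample]
densities = [emission_density(e, a) for e, a in sample]

# One candidate output variable per case
targets = {
    "density": densities,
    "density_p95": drop_above_p95(densities),
    "rates": rates,
    "rates_p95": drop_above_p95(rates),
}
```

In a real workflow the filter would drop the whole municipal record, inputs included, rather than only the output value.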
All the results for training and testing of the model in these four cases are reported in Supplementary Materials (Tables S2–S5). For each pollutant, the tables report the top ten input variables identified by the model, the Pearson correlation coefficient, the Normalized Mean Bias, the Normalized Mean Standard Deviation and the Root Mean Square Error. The main performance metrics are defined in Table S6 of the Supplementary Materials.
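For orientation, the four metrics can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption here; Table S6 of the Supplementary Materials remains the reference for the definitions actually applied. The observed and predicted values are hypothetical.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / len(obs)
    return cov / (_std(obs) * _std(pred))


def normalized_mean_bias(obs, pred):
    """Sum of (predicted - observed) divided by the sum of observed values."""
    return sum(p - o for o, p in zip(obs, pred)) / sum(obs)


def normalized_mean_std_dev(obs, pred):
    """Relative difference between predicted and observed standard deviations."""
    return (_std(pred) - _std(obs)) / _std(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


# Hypothetical observed (inventory) and predicted (RF) emissions, t/y
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {
    "pearson": pearson(observed, predicted),
    "nmb": normalized_mean_bias(observed, predicted),
    "nmsd": normalized_mean_std_dev(observed, predicted),
    "rmse": rmse(observed, predicted),
}
```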
A relevant validation of the extension of the model predictions to the whole of Italy consists of comparing the results with the available data from the top-down disaggregation of the Italian National Emission Inventory at Province level [45] and with the emission rates available from CAMS [46]. Figure 5 shows the comparison of the model estimates (_RF) with the emissions reported for each Italian Province from the disaggregation of the top-down emission inventory (_NIR). It must be underlined that the compared data are obtained from totally independent calculation processes.
Data labelled _NIR are obtained from the disaggregation of the emissions of pollutants and greenhouse gases estimated in the national inventory of emissions prepared by ISPRA. The estimates are based on the creation of a database of over 1,600,000 records, involving the collection and processing of a considerable amount of statistical data of various kinds: demographic, economic and industrial production indicators (such as population, vehicle registration, air traffic, product consumption, fuel consumption, etc.) and other territorial indicators relating to land use (e.g., agricultural land, land covered by forests or vegetation, etc.).
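A comparison of this kind requires the municipal predictions to be summed to Province level before they are set against the provincial top-down figures. The sketch below illustrates the idea with hypothetical province codes and values; taking R2 as the squared Pearson correlation is an assumption, since the exact definition used for Figure 5 is not stated here.

```python
from collections import defaultdict


def aggregate_by_province(municipal_rows):
    """Sum municipal emissions (t/y) into provincial totals."""
    totals = defaultdict(float)
    for province, emission_t in municipal_rows:
        totals[province] += emission_t
    return dict(totals)


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical RF predictions per municipality: (province code, emission t/y)
rf_rows = [("MI", 120.0), ("MI", 80.0), ("BG", 60.0), ("BG", 40.0), ("RM", 150.0)]
# Hypothetical top-down provincial totals (t/y)
nir_totals = {"MI": 210.0, "BG": 95.0, "RM": 160.0}

rf_totals = aggregate_by_province(rf_rows)
provinces = sorted(nir_totals)  # same province order for both series
r2 = r_squared([rf_totals[p] for p in provinces],
               [nir_totals[p] for p in provinces])
```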
Very similar results are obtained when comparing RF predictions of this study with CAMS emission estimates for BC [52,53] and PM [54,55] (_CAMS). CAMS data are downloaded from ECCAD and combined with the administrative boundaries of Italian Provinces to estimate annual emission rates. The comparisons in Figure 5 span from an R2 of 0.76 to 0.87. The CAMS emission inventories generally start from the reported emissions by European countries to UNFCCC (for greenhouse gases) and to EMEP/CEIP (for air pollutants), adding harmonization for sectors between all pollutants and countries, and completing or combining information from other databases, such as the Emissions Database for Global Atmospheric Research (EDGARv5), the emissions provided by the Community Emissions Data System, and the emission datasets available from the IIASA GAINS model.
The municipal details of the emission estimates obtained by ML propagation of the LIFE PREPAIR emission inventories to the whole of Italy are reported in Figure 6. The volumes of the 3D maps are proportional to the total annual emissions calculated by RF and are reported on the same scale. The total annual emissions of PM2.5 are larger than those of its carbonaceous fractions, BC and OC. Similarly, the estimated LG emissions are consistent with a subfraction of the total OC emissions.
As reported in Section 2, the bottom-up emission inventories consider a huge number of parameters and proxies, as does the national top-down inventory. In the machine learning approach, we implement a process using input data similar or nearly equivalent to the common proxies used for Italian emission inventories.
The agreement between RF and top-down estimates is confirmed by an R2 in the order of about 0.9, fully comparable with the values obtained in the test and training phases of machine learning for the Po Basin local emission inventories.
As is reasonable, the higher emission rates for PM, OC, BC and LG are found in proximity to the most urbanized and populated areas of Italy. Besides the total area of the municipality, which enters the calculation of emission density, the three most important input variables selected by RF are the level of urbanization, the total number of workers moving out of the city, and the municipal extension of non-irrigated arable land and natural grasslands. Along with these variables, the number of residential buildings with 2 and 3 floors also seems to play a relevant role in the distribution of emissions.

4. Discussion

It can be very useful to combine the emission rates calculated for the Po Basin with measurements conducted at different sites of the Po Basin within the framework of LIFE PREPAIR [56]. Comparing the emission values from this work with the average measured concentrations shows good agreement, with a comparable decrease of BC and EC from 2013 to 2019 [56]. In fact, the average concentration of EC shows a 39% reduction over this time range (from 1.8 to 1.1 µg/m3), and approximately the same reasoning can be applied to BC. If we consider the BC and EC emission values obtained in Table 1, the reduction observed from 2013 to 2019 is 32% for BC, while the reduction observed for EC is 34%. Moreover, the average concentration of BC, when considering the time range 2019–2023, decreases from 4.6 to 2.8 µg/m3, a reduction of 39%.
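The percentage reductions quoted above follow from the reported concentrations as (initial − final) / initial. A short check, using only the figures given in the text:

```python
def percent_reduction(initial, final):
    """Relative decrease from `initial` to `final`, expressed in percent."""
    if initial == 0:
        raise ValueError("initial value must be non-zero")
    return (initial - final) / initial * 100.0


# Average concentrations in ug/m3, as reported in the text
ec_reduction = percent_reduction(1.8, 1.1)  # about 39%
bc_reduction = percent_reduction(4.6, 2.8)  # about 39%
```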
The importance of monitoring emissions of PM2.5, OC, EC, LG, NH3, NOx and SO2 is confirmed by studies on source apportionment of PM2.5 in the Po Basin based on Positive Matrix Factorization (PMF) [57]. These pollutants are directly connected to the most abundant chemical species measured in the PM2.5 in the Po Basin. The PMF methodology is a factorial decomposition technique based on a weighted least squares fit approach. To reduce the space of potential solutions, non-negativity constraints are imposed on chemical profiles and the contributions of identified factors. Uncertainty values are used to weigh the concentration data. In terms of primary components, the application of PMF on chemical composition in the Po Basin confirms that traffic and biomass burning are the most significant contributors to PM2.5 [57]. The relevance of secondary aerosols further suggests that it is challenging to lower concentration values due to unfavorable climatic conditions that promote air mass stagnation. Based on emission data and ammonium concentrations, an approximate calculation suggests that agriculture and livestock contribute to at least 10% of the total. This amount must also be increased by the exhaust emissions from off-road vehicles and pruning burnings, which are factored into the biomass burning factor [57].
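The weighted least squares fit at the core of PMF can be made concrete with a small sketch of its objective function. The symbols are the conventional ones and are not taken from this paper: X holds measured concentrations (samples by species), U the corresponding uncertainties, G the factor contributions and F the factor profiles. The numbers are hypothetical, and the sketch only evaluates the objective for a given solution; it does not perform the factorization.

```python
def pmf_objective(X, U, G, F):
    """Q = sum over i, j of ((x_ij - sum_k g_ik * f_kj) / u_ij) ** 2."""
    # Non-negativity constraints on contributions (G) and profiles (F)
    for matrix in (G, F):
        if any(value < 0 for row in matrix for value in row):
            raise ValueError("G and F must be non-negative")
    q = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            modelled = sum(G[i][k] * F[k][j] for k in range(len(F)))
            q += ((x_ij - modelled) / U[i][j]) ** 2  # residual weighted by uncertainty
    return q


# Hypothetical example: 2 samples, 2 species, 2 factors
X = [[3.0, 1.0], [2.0, 4.0]]   # measured concentrations
U = [[0.5, 0.5], [0.5, 0.5]]   # measurement uncertainties
G = [[1.0, 0.0], [0.0, 1.0]]   # factor contributions per sample
F = [[3.0, 1.0], [2.0, 3.0]]   # factor chemical profiles
Q = pmf_objective(X, U, G, F)
```

A PMF solver searches for the non-negative G and F that minimize Q, so measurements with larger uncertainty weigh less in the fit.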
As shown in the previous section, the propagation of the results by RF to Italy is in very good agreement with the estimates obtained from the top-down National Emission Inventory. The resulting emission maps are mainly affected by the level of urbanization or non-urbanization (e.g., non-irrigated arable land and natural grasslands), by building characteristics (smaller buildings are technically better suited to the installation of small domestic biomass burners) and by the potential demand for inter-municipal mobility due to the number of workers moving in/out of the city. Considering all these aspects, it is interesting to compare the PM10 emission map, calculated by ML, with the nighttime lights observed by NASA Worldview [58]. Figure 7 shows this comparison and demonstrates how the higher emission areas are comparable with the most densely lit zones observed from space at night.
NASA’s ESDIS (Earth Science Data and Information System Project) service allows over 1000 global, full-resolution satellite imagery layers to be examined interactively before the underlying data are downloaded. Many of the imaging layers are updated daily and are available within three hours of observation, practically depicting the whole Earth as it appears “right now”.
The NASA picture (Figure 7) refers to a night in 2019 and is compared with emission estimates for the same reference year. The areas of higher night light intensity can be assumed, to a good approximation, as a proxy for larger urbanized areas. The color band intensity can be combined with the administrative boundaries of the Italian municipalities. For each municipality, an indicator defined as the average light intensity weighted by surface area can be calculated. The comparison of this indicator with the municipal emission rate of PM10 predicted by ML shows a very high correlation, with an estimated R2 higher than 0.8.
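The municipal light indicator can be sketched as an area-weighted average. The patch structure (a light intensity value paired with the area it covers inside the municipality) and the sample values are assumptions made for illustration; the actual overlay of the image with administrative boundaries is a GIS operation not shown here.

```python
def area_weighted_intensity(patches):
    """Average light intensity of a municipality, weighted by patch area.

    `patches` is an iterable of (light_intensity, area_km2) pairs.
    """
    patches = list(patches)  # allow generators, since we iterate twice
    total_area = sum(area for _, area in patches)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(intensity * area for intensity, area in patches) / total_area


# Hypothetical municipality: a bright urban core and a darker rural remainder
indicator = area_weighted_intensity([(200.0, 2.0), (50.0, 6.0)])
```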

5. Conclusions

In this paper, we present the methodology and the results of estimating emissions of EC, BC, OC and LG in the Po Basin, starting from the emission dataset on particulate matter collected in the framework of the LIFE PREPAIR Project. The understanding of emission sources of BC is relevant for its impact on air quality, human health and climate effect. The emission estimates of these carbonaceous fractions of particulate matter also play a relevant role as the basis for source apportionment study. Non-industrial combustion and road transport are confirmed as the main emission sources of OC. In the case of BC and EC, these sources are still relevant but with an inversion in the order of importance. The emission of LG can be calculated by speciation of OC as reported in other relevant studies. This method of estimating LG’s emissions is intended only as a useful means to identify the sources from biomass combustion within the territory, given the importance of these in the total emission of carbonaceous components of atmospheric aerosols. The emission reduction of BC for 2013–2019 is confirmed by national emission inventories in Italy and in the EU-27. In the Po Basin, it is also confirmed by the variation in measured mean concentration reported in the analysis program of the PREPAIR Project.
The emission estimates for the Po Basin reported in the LIFE PREPAIR follow the general bottom-up approach, which in some cases uses the same input parameters as the top-down for the National Emission Inventory in Italy. In this paper, we present a methodology for extending the results of a local emission inventory to a larger domain by machine learning. The training of Random Forest requires a very deep knowledge of the data on emission sources and estimates from local emission inventories, which remain relevant inputs for the implemented methodology.
The areas with higher emission rates of PM and its carbonaceous fractions are mainly characterized by high urbanization and high mobility demand. This is confirmed by emission inventory estimates, by the variable relative importance in machine learning and by the qualitative comparison of the emission maps with urbanization level observed by remote sensing.
This study provides a benchmark for source apportionment and air quality modelling studies, aimed both at the identification of areas with specific emission source patterns and at the improvement of emission estimates. The development of bottom-up local emission inventories, as with top-down processes, involves a huge number of variables, which makes the work time-consuming and subject to a certain degree of discretion in each inventory edition. The proposed ML approach can allow a drastic reduction in processing time and an increase in the reproducibility and comparability of different emission inventories and their annual updates. Above all, the systematic development of ML can assist local emission compilers in defining the missing inputs necessary for the implementation of detailed emission inventories. A further development of this methodology will be the ML-based study of specific input parameters for highly detailed emission estimation algorithms, within the framework of activities related to missing data completion.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/air2040022/s1, Table S1. Source of data and variable definition for the Input to Random Forest; Figure S1. Data on Taxations; Figure S2. Data on Municipalities; Figure S3. Data on Heating Degree Days HDD; Figure S4. Data on Dwellings and Population; Figure S5. Data on Livestock; Figure S6. Data on Landcover; Figure S7. Data on Occupation for Sector; Table S2. Train RF with minimal depth, variable reduction, considering emission density; Table S3. Train RF with minimal depth, variable reduction, considering 95-perc emission density; Table S4. Train RF with minimal depth, variable reduction, considering emission rates; Table S5. Train RF with minimal depth, variable reduction, considering 95-perc emission rates; Table S6. Performance metrics definitions.

Author Contributions

Conceptualization, A.M. and E.A.; methodology, A.M.; software, A.M.; validation, A.M. and G.G.D.; formal analysis, A.M.; investigation, A.M. and G.G.D.; resources, A.M., A.G.C., F.P. and G.G.D.; data curation, A.M., G.G.D., A.G.C., F.P., M.M. and G.F.; writing—original draft preparation, A.M., G.G.D., A.G.C., M.M., F.P. and G.F.; writing—review and editing, E.A. and A.M.; visualization, A.M., G.G.D. and A.G.C.; supervision, A.M.; project administration, E.A.; funding acquisition, E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by LIFE-IP PREPAIR (Po Regions Engaged to Policies of AIR) project, Grant Number LIFE15 IPE/IT/000013 and by REGIONE LOMBARDIA with Proposta di proseguimento del progetto di monitoraggio dell’ammoniaca dal comparto agricolo 2024–2026. identification code n. 0043296—14 March 2024.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The updated emission estimates for the Po Basin can be visualized by the public: https://emitool.arpalombardia.it/index?lingua=en (accessed on 28 June 2024).

Acknowledgments

Acknowledgments are given to all the Beneficiaries of LIFE-IP PREPAIR: Emilia-Romagna Region (Project Coordinator), Veneto Region, Lombardy Region, Piedmont Region, Friuli Venezia Giulia Region, Autonomous Province of Trento, Regional Agency for Environment of Emilia-Romagna (ARPAE), Regional Agency for Environment of Veneto, Regional Agency for Environment of Piedmont, Regional Agency for Environmental Protection of Lombardy, Environmental Protection Agency of Valle d’Aosta, Environmental Protection Agency of Friuli Venezia Giulia, Slovenian Environment Agency, Municipality of Bologna, Municipality of Milan, City of Turin, ART-ER, Lombardy Foundation for Environment. We acknowledge the use of imagery from the NASA Worldview application (https://worldview.earthdata.nasa.gov), part of the NASA Earth Science Data and Information System (ESDIS). This publication has been prepared using the European Union’s Copernicus Land Monitoring Service information; https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac. Acknowledgments are also given to ECCAD for the archiving and distribution of the data on emissions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lavanchy, V.M.H.; Gäggeler, H.W.; Schotterer, U.; Schwikowski, M.; Baltensperger, U. Historical record of carbonaceous particle concentrations from a European high-alpine glacier (Colle Gnifetti, Switzerland). J. Geophys. Res. 1999, 104, 21227–21236.
  2. Caserini, S.; Galante, S.; Ozgen, S.; Cucco, S.; de Gregorio, K.; Moretti, M. A methodology for elemental and organic carbon emission inventory and results for Lombardy region, Italy. Sci. Total Environ. 2013, 450–451, 22–30.
  3. Salma, I.; Chi, X.; Maenhaut, W. Elemental and organic carbon in urban canyon and background environments in Budapest, Hungary. Atmos. Environ. 2004, 38, 27–36.
  4. Ramanathan, V.; Carmichael, G. Global and regional climate changes due to black carbon. Nat. Geosci. 2008, 1, 221–227.
  5. Bond, T.C.; Doherty, S.J.; Fahey, D.W.; Forster, P.M.; Berntsen, T.; De Angelo, B.J.; Flanner, M.G.; Ghan, S.; Karcher, B.; Koch, D.; et al. Bounding the role of black carbon in the climate system: A scientific assessment. J. Geophys. Res. Atmos. 2013, 118, 5380–5552.
  6. Rönkkö, T.; Saarikoski, S.; Kuittinen, N.; Karjalainen, P.; Keskinen, H.; Järvinen, A.; Mylläri, F.; Aakko-Saksa, P.; Timonen, H. Review of black carbon emission factors from different anthropogenic sources. Environ. Res. Lett. 2023, 18, 033004.
  7. WHO. Improving the Capacity of Countries to Report on Air Quality in Cities: Users’ Guide to the Repository of United Nations Tools; Global Report; World Health Organization: Geneva, Switzerland, 2023.
  8. Gidhagen, L.; Krecl, P.; Créso Targino, A.; Polezer, G.; Godoi, R.H.M.; Felix, E.P.; Cipoli, Y.; Charres, I.; Malucelli, F.; Wolf, A.; et al. An integrated assessment of the impacts of PM2.5 and black carbon particles on the air quality of a large Brazilian city. Air Qual. Atmos. Health 2021, 14, 1455–1473.
  9. Xiong, R.; Li, J.; Zhang, Y.; Zhang, L.; Jiang, K.; Zheng, H.; Kong, S.; Shen, H.; Cheng, H.; Shen, G.; et al. Global brown carbon emission from combustion sources. Environ. Sci. Ecotechnol. 2022, 12, 100201.
  10. Simoneit, B.R.T. A review of biomarker compounds as source indicators and tracers for air pollution. Environ. Sci. Pollut. 1999, 6, 159–169.
  11. Simpson, D.; Kuenen, J.; Fagerli, H.; Heinesen, D.; Benedictow, A.; van der Gon, H.D.; Visschedijk, A.; Klimont, Z.; Aas, W.; Lin, Y.; et al. Revising PM2.5 Emissions from Residential Combustion, 2005–2019 Implications for Air Quality Concentrations and Trends; Nordic Council of Ministers: Copenhagen, Denmark, 2022; ISBN 978-92-893-7357-9.
  12. Leithead, A.; Li, S.-M.; Hoff, R.; Cheng, Y.; Brook, J. Levoglucosan and dehydroabietic acid: Evidence of biomass burning impact on aerosols in the Lower Fraser Valley. Atmos. Environ. 2006, 40, 2721–2734.
  13. Bhattarai, H.; Saikawa, E.; Wan, X.; Zhu, H.; Ram, K.; Gao, S.; Kang, S.; Zhang, Q.; Zhang, Y.; Wu, G.; et al. Levoglucosan as a tracer of biomass burning: Recent progress and perspectives. Atmos. Res. 2019, 220, 20–33.
  14. Piazzalunga, A.; Belis, C.; Bernardoni, V.; Cazzuli, O.; Fermoa, P.; Valli, G.; Vecchi, R. Estimates of wood burning contribution to PM by the macro-tracer method using tailored emission factors. Atmos. Environ. 2011, 45, 6642–6649.
  15. Galindo, N.; Clemente, A.; Yubero, E.; Nicolàs, J.F.; Crespo, J. PM10 chemical composition at a residential site in the western mediterranean: Estimation of the contribution of biomass burning from levoglucosan and its isomers. Environ. Res. 2020, 196, 110394.
  16. Kaskaoutis, D.G.; Grivas, G.; Oikonomou, K.; Tavernaraki, P.; Papoutsidaki, K.; Tsagkaraki, M.; Stavroulas, I.; Zarmpas, P.; Paraskevopoulou, D.; Bougiatioti, A.; et al. Impacts of severe residential wood burning on atmospheric processing, water-soluble organic aerosol and light absorption, in an inland city of Southeastern Europe. Atmos. Environ. 2022, 280, 119139.
  17. Hoffmann, D.; Tilgner, A.; Iinuma, Y.; Herrmann, H. Atmospheric stability of levoglucosan: A detailed laboratory study. Environ. Sci. Technol. 2010, 44, 694–699.
  18. Zotter, P.; Ciobanu, V.G.; Zhang, Y.L.; El-Haddad, I.; Macchia, M.C.; Daellenbach, K.R.; Salazar, G.; Huang, R.-J.; Wacker, L.; Hueglin, C.; et al. Radiocarbon analysis of elemental and organic carbon in Switzerland during winter-smog episodes from 2008 to 2012 – Part 1: Source apportionment and spatial variability. Atmos. Chem. Phys. 2014, 14, 13551–13570.
  19. Hedberg, E.; Johansson, C.; Johansson, L.; Swietlicki, E.; Brorström-Lundèn, E. Is levoglucosan a suitable quantitative tracer for wood burning? Comparison with receptor modeling on trace elements in Lycksele, Sweden. J. Air Waste Manag. Assoc. 2006, 56, 1669–1678.
  20. Gelencsér, A.; May, B.; Simpson, D.; Sànchez-Ochoa, A.; Kasper-Giebl, A.; Puxbaum, H.; Caseiro, A.; Pio, C.; Legrand, M. Source apportionment of PM2.5 organic aerosol over Europe: Primary/secondary, natural/anthropogenic, fossil/biogenic origin. J. Geophys. Res. 2007, 112, D23S04.
  21. Szidat, S.; Ruff, M.; Perron, N.; Wacker, L.; Synal, H.-A.; Hallquist, M.; Shannigrahi, A.S.; Yttri, K.E.; Dye, C.; Simpson, D. Fossil and non-fossil sources of organic carbon (OC) and elemental carbon (EC) in Göteborg, Sweden. Atmos. Chem. Phys. 2009, 9, 1521–1535.
  22. Szidat, S.; Jenk, T.M.; Gäggler, H.W.; Synal, H.-A.; Fisseha, R.; Baltensperger, U.; Kalberer, M.; Samburova, V.; Reimann, S.; Kasper-Giebl, A.; et al. Radiocarbon (14C)-deduced biogenic and anthropogenic contributions to organic carbon (OC) of urban aerosols from Zürich, Switzerland. Atmos. Environ. 2004, 38, 4035–4044.
  23. Yttri, K.E.; Simpson, D.; Stenström, K.; Puxbaum, H.; Svendby, T. Source apportionment of the carbonaceous aerosol in Norway – Quantitative estimates based on 14C, thermal-optical and organic tracer analysis. Atmos. Chem. Phys. 2011, 11, 9375–9394.
  24. Yttri, K.E.; Dye, C.; Braathen, O.-A.; Simpson, D.; Steinnes, E. Carbonaceous aerosols at urban influenced sites in Norway. Atmos. Chem. Phys. 2009, 9, 2007–2020.
  25. Yttri, K.E.; Schnelle-Kreis, J.; Maenhaut, W.; Abbaszade, G.; Alves, C.; Bjerke, A.; Bonnier, N.; Bossi, R.; Claeys, M.; Dye, C.; et al. An intercomparison study of analytical methods used for quantification of levoglucosan in ambient aerosol filter samples. Atmos. Meas. Tech. 2015, 8, 125–147.
  26. Veratti, G.; Stortini, M.; Amorati, R.; Bressan, L.; Giovannini, G.; Bande, S.; Bissardella, F.; Ghigo, S.; Angelino, E.; Colombo, L.; et al. Impact of NOx and NH3 Emission Reduction on Particulate Matter across Po Valley: A LIFE-IP-PREPAIR Study. Atmosphere 2023, 14, 762.
  27. Colombo, L.; Marongiu, A.; Fossati, G.; Malvestiti, G.; Angelino, E. PM2.5 wintertime sensitivity to changes in NOx, SO2, and NH3 emissions in Lombardy Region. Air Qual. Atmos. Health 2024, 17, 1451–1466.
  28. Marongiu, A.; Collalto, A.G.; Distefano, G.G.; Angelino, E. Application of Machine Learning to Estimate Ammonia Atmospheric Emissions and Concentrations. Air 2024, 2, 38–60.
  29. Park, J.; Kim, E.; Oh, S.; Kim, H.; Kim, S.; Kim, Y.P.; Song, M. Contributions of Ammonia to High Concentrations of PM2.5 in an Urban Area. Atmosphere 2021, 12, 1676.
  30. Kabelitz, T.; Ammon, C.; Funk, R.; Münch, S.; Biniasch, O.; Nübel, U.; Thiel, N.; Rösler, U.; Siller, P.; Amon, B.; et al. Functional relationship of particulate matter (PM) emissions, animal species and moisture content during manure application 2020. Env. Int. 2020, 143, 105577.
  31. Marongiu, A.; Angelino, E.; Moretti, M.; Malvestiti, G.; Fossati, G. Atmospheric Emission Sources in the Po-Basin from the LIFE-IP PREPAIR Project 2022, Open Journal of Air Pollution, Vol.11 No.3, September 2022; Environmental Protection Agency of Lombardia Region, Air Quality Modeling and Inventory Unit, Monitoring Sector ARPA: Milano, Italy, 2022.
  32. Raffaelli, K.; Deserti, M.; Stortini, M.; Amorati, R.; Vasconi, M.; Giovannini, G. Improving Air Quality in the Po Valley, Italy: Some Results by the LIFE-IP-PREPAIR Project. Atmosphere 2020, 11, 429.
  33. Directive (EU) 2008/50 on Ambient Air Quality and Cleaner Air for Europe. Available online: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2008:152:0001:0044:EN:PDF (accessed on 28 June 2024).
  34. D.lgs. 155/10. Attuazione Della Direttiva 2008/50/CE Relativa Alla Qualità Dell’aria Ambiente e per Un’aria Più Pulita in Europa. Available online: https://web.camera.it/parlam/leggi/deleghe/10155dl.htm (accessed on 28 June 2024).
  35. EMEP/EEA Air Pollutant Emission Inventory Guidebook 2016. Available online: https://www.eea.europa.eu/publications/emep-eea-guidebook-2016 (accessed on 2 September 2024).
  36. EMEP/EEA Air Pollutant Emission Inventory Guidebook 2019. Available online: https://www.eea.europa.eu/publications/emep-eea-guidebook-2019 (accessed on 2 September 2024).
  37. CTN_ACE. Linee Guida agli Inventari Locali di Emissioni in Atmosfera. 2001. Available online: https://www.isprambiente.gov.it/files/aria/lineeguidainventariemissioniatmosfera.pdf (accessed on 28 June 2024).
  38. SNPA. Inventari regionali delle emissioni in atmosfera e loro articolazione a livello locale. 2016. Available online: https://www.snpambiente.it/wp-content/uploads/2018/11/DOC-78_CF-Inventari-emisisoni-in-atm-con-allegati.pdf (accessed on 28 June 2024).
  39. ARPA Lombardia. IN.EM.AR. Official Site. 2021. Available online: www.inemar.eu (accessed on 2 September 2024).
  40. Kupiainen, K.; Klimont, Z. Primary emissions of fine carbonaceous particles in Europe. Atmos. Environ. 2007, 41, 2156–2170.
  41. Winther, M.; Nielsen, O.K. Technology dependent BC and OC emissions for Denmark, Greenland and the Faroe Islands calculated for the time period 1990–2030. Atmos. Environ. 2011, 45, 5880–5895.
  42. Hu, Y.; Kong, S.; Cheng, Y.; Shen, G.; Liu, D.; Wang, S.; Guo, L.; Fu, P. Identification and Parametrization of Key Factors Affecting Levoglucosan Emission During Solid Fuel Burning. Environ. Sci. Technol. 2023, 57, 20043–20052.
  43. Jimenez, J.; Farias, O.; Quiroz, R.; Yañez, J. Emission factors of particulate matter, polycyclic aromatic hydrocarbons, and levoglucosan from wood combustion in south-central Chile. J. Air Waste Manag. Assoc. 2017, 67, 806–813.
  44. De Lotto, R.; Bellati, R.; Moretti, M. Correlation Methodologies between Land Use and Greenhouse Gas emissions: The Case of Pavia Province (Italy). Air 2024, 2, 86–108.
  45. ISPRA. La disaggregazione a livello provinciale dell’inventario nazionale delle emissioni, ISPRA Rapporti 369/2022. Tech. Rep. 2009, 92, 2009.
  46. Emissions of Atmospheric Compounds and Compilation of Ancillary Data (ECCAD). Available online: https://eccad.sedoo.fr/#/catalogue (accessed on 25 July 2024).
  47. Ishwaran, H.; Kogalur, U.B. Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). 2023. Available online: https://ishwaran.org/ (accessed on 1 April 2023).
  48. Ishwaran, H.; Kogalur, U.B. Random Survival Forests for R. R News 2007, 7, 25–31.
  49. Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random Survival Forests 1. Ann. Appl. Stat. 2008, 2, 841–860.
  50. Ishwaran, H.; Lu, M.; Lauer, M.S.; Blackstone, E.H.; Kogalur, U.B. randomForestSRC: Getting Started with randomForestSRC Vignette. 2021. Available online: http://randomforestsrc.org/articles/survival.html (accessed on 22 June 2023).
  51. EEA 2023, Air pollutant emissions data viewer of the data contained in the EU emission inventory report 1990-2021 under the UNECE Convention on Long-range Transboundary Air Pollution (LRTAP). Available online: https://www.eea.europa.eu/data-and-maps/dashboards/air-pollutant-emissions-data-viewer-5 (accessed on 2 July 2024).
  52. Granier, C.; Darras, S.; van der Gon, H.D.; Doubalova, J.; Elguindi, N.; Galle, B.; Gauss, M.; Guevara, M.; Jalkanen, J.-P.; Kuenen, J.; et al. The Copernicus Atmosphere Monitoring Service Global and Regional Emissions 2019 (April 2019 Version) Report April 2019 Version. Ph.D. Dissertation, Copernicus Atmosphere Monitoring Service, Reading, UK.
  53. Soulie, A.; Granier, C.; Darras, S.; Zilbermann, N.; Doumbia, T.; Guevara, M.; Jalkanen, J.-P.; Keita, S.; Liousse, C.; Crippa, M.; et al. Global Anthropogenic Emissions (CAMS-GLOB-ANT) for the Copernicus Atmosphere Monitoring Service 2023, Simulations of Air Quality Forecasts and Reanalyses. Earth Syst. Sci. Data 2023, 2023, 1–45.
  54. Kuenen, J.; Dellaert, S.; Visschedijk, A.; Jalkanen, J.-P.; Super, I.; van der Gon, H.D. Copernicus Atmosphere Monitoring Service Regional Emissions Version 4.2 (CAMS-REG-v4.2) 2021, Copernicus Atmosphere Monitoring Service; ECCAD: Abu Dhabi, United Arab Emirates, 2021.
  55. Kuenen, J.; Dellaert, S.; Visschedijk, A.; Jalkanen, J.-P.; Super, I.; van der Gon, H.D. CAMS-REG-v4: A state-of-the-art high-resolution European emission inventory for air quality modelling. Earth Syst. Sci. Data 2022, 14, 491–515.
  56. Life Prepair 2023, Monitoring the Environmental Effects of Pollutants Reduction Measures Implemented by Air Quality Improvement Plans. Report Action D6, PM10 Chemical Composition and Source Apportionment on Special Stations. Available online: https://www.lifeprepair.eu/?smd_process_download=1&download_id=9494 (accessed on 2 July 2024).
  57. Scotto, F.; Bacco, D.; Lasagni, S.; Trentini, A.; Poluzzi, V.; Vecchi, R. A multi-year source apportionment of PM2.5 at multiple sites in the southern Po Valley (Italy). Atmos. Pollut. Res. 2021, 12, 101192.
  58. NASA Worldview. 2019. Available online: https://worldview.earthdata.nasa.gov/?v=-14.44003408250736,31.04465552181413,37.46215471434639,56.80652319025247&l=Reference_Labels_15m(hidden),Reference_Features_15m(hidden),Coastlines_15m(hidden),VIIRS_SNPP_DayNightBand_ENCC,VIIRS_SNPP_CorrectedReflectance_TrueColor&lg=false&t=2019-06-26-T03%3A25%3A30Z (accessed on 2 September 2024).
Figure 1. Characterization of the Po Basin study area, with emphasis on urbanized area, road networks, and topographical features.
Figure 2. Processes for local and national emission inventories in Italy (adapted from SNPA 2016 [38]).
Figure 3. Spatial distribution of total OC emissions (a), non-industrial combustion (b) and road transport (c).
Figure 4. Spatial distribution of total BC emission (a), non-industrial combustion (b) and road transport (c).
Figure 5. Comparison of Emissions in Italian Provinces. ML calculation (_RF), Top-down of the National Emission Inventory (_NIR) and CAMS Emissions (_CAMS).
Figure 6. Emission density maps for PM2.5, OC, BC and LG estimated for Italy by ML propagation of the Po Basin inventories (t/km2).
Figure 7. NASA Worldview (left) and PM10 emission map calculated by ML (right).
Table 1. Average relative abundance of EC, OC and BC in PM calculated for different fuels.
| Fuel | EC/TSP | OC/TSP | BC/PM2.5 |
|---|---|---|---|
| Gasoline | 29% | 40% | 17% |
| Coal | 0.1% | 0.1% | 6% |
| Diesel | 62% | 21% | 54% |
| Refinery Gas | 7% | 75% | 15% |
| Gasoil | 8% | 2% | 39% |
| LPG | 7% | 75% | 7% |
| Kerosene | 70% | 30% | 15% |
| Biomass | 12% | 36% | 14% |
| Natural Gas | 7% | 75% | 6% |
| Fuel Oil | 5% | 1% | 34% |
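One way to read Table 1 is as a set of speciation factors: multiplying a fuel's PM2.5 emission by its average BC/PM2.5 fraction gives a first-order BC estimate. The sketch below assumes this simple multiplicative use. The helper name `estimate_bc` and the 100 t/year input are hypothetical and chosen only for illustration; they are not taken from the inventory.

```python
# Average BC/PM2.5 fractions copied from Table 1, expressed as fractions of 1.
BC_PER_PM25 = {
    "Gasoline": 0.17,
    "Coal": 0.06,
    "Diesel": 0.54,
    "Refinery Gas": 0.15,
    "Gasoil": 0.39,
    "LPG": 0.07,
    "Kerosene": 0.15,
    "Biomass": 0.14,
    "Natural Gas": 0.06,
    "Fuel Oil": 0.34,
}


def estimate_bc(fuel: str, pm25_tonnes: float) -> float:
    """Hypothetical helper: BC (t) = PM2.5 (t) x average BC/PM2.5 fraction."""
    if pm25_tonnes < 0:
        raise ValueError("PM2.5 emission must be non-negative")
    return pm25_tonnes * BC_PER_PM25[fuel]


# Illustrative input only: 100 t/year of PM2.5 from diesel combustion.
diesel_bc = estimate_bc("Diesel", 100.0)
print(f"Diesel BC estimate: {diesel_bc:.1f} t/year")
```

These are fuel-level averages, so a source-specific profile can differ: Table 5, for example, reports a BC/PM10 ratio of 58% for diesel passenger cars rather than the 54% fuel average.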
Table 2. Definition of the dataset used for the implementation of Random Forest.
| Indicators | Variables | Data Source | Reference |
|---|---|---|---|
| Emissions | Annual emissions in t/year: BC, EC, OC, LG, NH3, PM10, PM2.5 | LIFE PREPAIR EMITOOL | https://emitool.arpalombardia.it/home (accessed on 28 June 2024) |
| Taxations (5×) | Number of taxpayers and income values (in euros) | Ministry of Economy and Finance | https://www1.finanze.gov.it/finanze/analisi_stat/public/index.php?tree=2020 (accessed on 2 July 2024) |
| Municipality (8×) | Geographical and demographic characteristics of a municipality | Italian National Statistical Institute | https://www.istat.it/it/archivio/156224 (accessed on 25 July 2024) |
| Heating Degree Days, HDD (1×) | Sum of daily differences between 20 °C and the municipal average temperatures | Italian Institute for Environmental Protection and Research | https://scia.isprambiente.it/ (accessed on 28 June 2024) |
| Dwellings and Population (23×) | Residential building types and labor force employed in various economic activities | Italian National Statistical Institute | https://esploradati.censimentopopolazione.istat.it/databrowser/#/it/censtest/categories/SUB_MUN_DATA (accessed on 28 June 2024) |
| Livestock (7×) | Number of heads: dairy and non-dairy cows, sows and other pigs, broilers, laying hens and other poultry | National Veterinary Informative System | https://www.vetinfo.it/j6_statistiche/#/ (accessed on 25 July 2024) |
| LANDCOVER (44×) | Land coverage and land use spatial data | European Environment Agency | https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac (accessed on 28 June 2024) |
| Occupation for Sector (78×) | Occupations employed within a specific sector of the economy | Italian National Statistical Institute | https://www.istat.it/it/archivio/16777 (accessed on 25 July 2024) |
Table 3. Share and total emission estimates of Particulate Matter Precursors in the Po Basin.
| Year | Macrosector | SO2 | NOx | NMVOC | NH3 |
|---|---|---|---|---|---|
| 2013 | 1 - Combustion in energy and transformation industries | 17.99% | 6.32% | 0.33% | 0.02% |
| 2013 | 2 - Non-industrial combustion plants | 6.69% | 9.33% | 10.59% | 0.37% |
| 2013 | 3 - Combustion in manufacturing industry | 41.17% | 15.45% | 1.79% | 0.19% |
| 2013 | 4 - Production processes | 20.67% | 2.60% | 8.59% | 0.10% |
| 2013 | 5 - Fuel extraction and distribution | 0.00% | 0.05% | 4.46% | 0.00% |
| 2013 | 6 - Solvent and other product use | 0.07% | 0.23% | 58.03% | 0.08% |
| 2013 | 7 - Road transport | 0.52% | 50.25% | 13.56% | 1.12% |
| 2013 | 8 - Other mobile sources and machinery | 9.83% | 13.50% | 1.65% | 0.00% |
| 2013 | 9 - Waste treatment and disposal | 2.46% | 1.38% | 0.33% | 0.73% |
| 2013 | 10 - Agriculture | 0.31% | 0.72% | 0.25% | 97.32% |
| 2013 | 11 - Other sources and sinks | 0.28% | 0.18% | 0.42% | 0.08% |
| 2013 | Total for the Po Basin (t/y) | 49,930 | 388,766 | 422,931 | 256,238 |
| 2017 | 1 - Combustion in energy and transformation industries | 15.46% | 6.95% | 0.41% | 0.04% |
| 2017 | 2 - Non-industrial combustion plants | 6.67% | 10.45% | 10.57% | 0.54% |
| 2017 | 3 - Combustion in manufacturing industry | 48.35% | 15.54% | 2.28% | 0.22% |
| 2017 | 4 - Production processes | 23.18% | 2.56% | 10.25% | 0.13% |
| 2017 | 5 - Fuel extraction and distribution | 0.00% | 0.00% | 5.97% | 0.00% |
| 2017 | 6 - Solvent and other product use | 0.06% | 0.25% | 54.30% | 0.02% |
| 2017 | 7 - Road transport | 0.64% | 47.93% | 13.24% | 0.98% |
| 2017 | 8 - Other mobile sources and machinery | 2.83% | 14.01% | 1.52% | 0.00% |
| 2017 | 9 - Waste treatment and disposal | 1.85% | 1.20% | 0.31% | 0.73% |
| 2017 | 10 - Agriculture | 0.36% | 0.74% | 0.30% | 97.21% |
| 2017 | 11 - Other sources and sinks | 0.60% | 0.35% | 0.87% | 0.12% |
| 2017 | Total for the Po Basin (t/y) | 39,666 | 340,615 | 353,477 | 254,661 |
| 2019 | 1 - Combustion in energy and transformation industries | 14.47% | 6.44% | 0.63% | 0.03% |
| 2019 | 2 - Non-industrial combustion plants | 7.03% | 9.71% | 8.37% | 1.75% |
| 2019 | 3 - Combustion in manufacturing industry | 47.89% | 15.51% | 1.99% | 0.33% |
| 2019 | 4 - Production processes | 22.51% | 2.30% | 10.58% | 0.12% |
| 2019 | 5 - Fuel extraction and distribution | 0.00% | 0.00% | 6.57% | 0.00% |
| 2019 | 6 - Solvent and other product use | 0.03% | 0.07% | 59.83% | 0.01% |
| 2019 | 7 - Road transport | 0.69% | 49.23% | 9.34% | 0.93% |
| 2019 | 8 - Other mobile sources and machinery | 2.56% | 13.90% | 1.33% | 0.00% |
| 2019 | 9 - Waste treatment and disposal | 3.68% | 1.60% | 0.28% | 0.40% |
| 2019 | 10 - Agriculture | 0.48% | 0.92% | 0.30% | 96.29% |
| 2019 | 11 - Other sources and sinks | 0.65% | 0.34% | 0.78% | 0.13% |
| 2019 | Total for the Po Basin (t/y) | 32,535 | 307,462 | 351,610 | 240,918 |
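Because Table 3 reports sector contributions as shares of a changing annual total, a near-constant share can hide a real change in tonnes. As a minimal sketch, the snippet below converts the road transport NOx share back into absolute emissions for the three inventory years. The function name `absolute_emission` is hypothetical, and the results inherit the rounding of the published percentages.

```python
# Po Basin NOx totals (t/y) and road transport shares (%) copied from Table 3.
NOX_TOTAL_T = {2013: 388_766, 2017: 340_615, 2019: 307_462}
ROAD_SHARE_PCT = {2013: 50.25, 2017: 47.93, 2019: 49.23}


def absolute_emission(total_t: float, share_pct: float) -> float:
    """Hypothetical helper: sector emission (t/y) = total (t/y) x share (%) / 100."""
    return total_t * share_pct / 100.0


road_nox = {
    year: absolute_emission(NOX_TOTAL_T[year], ROAD_SHARE_PCT[year])
    for year in NOX_TOTAL_T
}

for year, tonnes in road_nox.items():
    print(f"{year}: about {tonnes:,.0f} t/y of NOx from road transport")
```

Under these figures the road transport share stays close to half of total NOx in all three years, while the implied absolute emission falls from 2013 to 2019.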
Table 4. Share and total emission estimates of Primary Particulate Matter and Carbonaceous fractions in the Po Basin.
| Year | Macrosector | BC | EC | LG | OC | PM10 | PM2.5 |
|---|---|---|---|---|---|---|---|
| 2013 | 1 - Combustion in energy and transformation industries | 0.38% | 0.42% | 0.31% | 0.87% | 0.62% | 0.60% |
| 2013 | 2 - Non-industrial combustion plants | 43.68% | 45.26% | 93.35% | 79.87% | 56.59% | 63.90% |
| 2013 | 3 - Combustion in manufacturing industry | 4.19% | 2.89% | 0.94% | 2.99% | 4.15% | 3.74% |
| 2013 | 4 - Production processes | 0.02% | 0.01% | 0.00% | 0.02% | 2.87% | 1.97% |
| 2013 | 6 - Solvent and other product use | 0.00% | 0.00% | 0.00% | 0.00% | 4.88% | 5.06% |
| 2013 | 7 - Road transport | 34.73% | 40.04% | 0.00% | 8.54% | 19.97% | 14.71% |
| 2013 | 8 - Other mobile sources and machinery | 14.54% | 7.69% | 0.00% | 2.09% | 3.72% | 4.17% |
| 2013 | 9 - Waste treatment and disposal | 0.09% | 0.05% | 0.00% | 0.03% | 0.08% | 0.09% |
| 2013 | 10 - Agriculture | 0.66% | 1.49% | 2.19% | 1.86% | 3.99% | 2.53% |
| 2013 | 11 - Other sources and sinks | 1.71% | 2.16% | 3.21% | 3.72% | 3.12% | 3.23% |
| 2013 | Total for the Po Basin (t/y) | 11,445 | 12,257 | 2501 | 29,365 | 79,121 | 69,119 |
| 2017 | 1 - Combustion in energy and transformation industries | 0.50% | 0.57% | 0.45% | 0.98% | 0.69% | 0.72% |
| 2017 | 2 - Non-industrial combustion plants | 46.28% | 46.78% | 90.71% | 78.08% | 56.53% | 64.76% |
| 2017 | 3 - Combustion in manufacturing industry | 4.60% | 3.53% | 0.97% | 3.39% | 4.14% | 3.94% |
| 2017 | 4 - Production processes | 0.04% | 0.00% | 0.00% | 0.02% | 3.13% | 2.24% |
| 2017 | 6 - Solvent and other product use | 0.00% | 0.00% | 0.00% | 0.00% | 3.34% | 3.32% |
| 2017 | 7 - Road transport | 29.91% | 35.45% | 0.00% | 7.66% | 18.87% | 13.03% |
| 2017 | 8 - Other mobile sources and machinery | 14.88% | 7.85% | 0.00% | 2.01% | 3.49% | 3.72% |
| 2017 | 9 - Waste treatment and disposal | 0.10% | 0.05% | 0.01% | 0.03% | 0.08% | 0.09% |
| 2017 | 10 - Agriculture | 0.78% | 1.71% | 2.34% | 2.00% | 4.73% | 3.05% |
| 2017 | 11 - Other sources and sinks | 2.90% | 4.04% | 5.52% | 5.84% | 5.00% | 5.12% |
| 2017 | Total for the Po Basin (t/y) | 9257 | 10,115 | 2224 | 25,977 | 68,622 | 59,096 |
| 2019 | 1 - Combustion in energy and transformation industries | 0.53% | 0.71% | 0.40% | 1.22% | 0.70% | 0.81% |
| 2019 | 2 - Non-industrial combustion plants | 47.53% | 51.47% | 90.83% | 78.77% | 54.98% | 64.90% |
| 2019 | 3 - Combustion in manufacturing industry | 3.99% | 3.23% | 0.75% | 3.03% | 3.49% | 3.51% |
| 2019 | 4 - Production processes | 0.02% | 0.00% | 0.00% | 0.02% | 2.93% | 2.24% |
| 2019 | 6 - Solvent and other product use | 0.00% | 0.00% | 0.00% | 0.00% | 2.78% | 2.94% |
| 2019 | 7 - Road transport | 26.83% | 26.94% | 0.00% | 6.43% | 21.87% | 13.10% |
| 2019 | 8 - Other mobile sources and machinery | 17.19% | 10.25% | 0.00% | 2.23% | 3.53% | 4.27% |
| 2019 | 9 - Waste treatment and disposal | 0.70% | 0.73% | 0.74% | 0.69% | 0.86% | 1.00% |
| 2019 | 10 - Agriculture | 0.96% | 2.24% | 2.60% | 2.23% | 4.93% | 3.42% |
| 2019 | 11 - Other sources and sinks | 2.24% | 4.42% | 4.67% | 5.38% | 3.93% | 3.82% |
| 2019 | Total for the Po Basin (t/y) | 7797 | 8038 | 2076 | 24,192 | 65,987 | 53,726 |
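Table 5. Summary of BC, EC, OC, PM10 and PM2.5 emissions in the Po Basin in 2019 [t/year].
| Category | Type | PM10 (t/year) | PM2.5 (t/year) | BC (t/year) | EC (t/year) | OC (t/year) | %BC/PM10 | %EC/PM10 | %OC/PM10 |
|---|---|---|---|---|---|---|---|---|---|
| Diesel | Passenger cars - Highways | 232 | 232 | 135 | 156 | 44 | 58% | 68% | 19% |
| Diesel | Passenger cars - Extra urban | 329 | 329 | 191 | 222 | 63 | 58% | 68% | 19% |
| Diesel | Passenger cars - Urban | 644 | 644 | 374 | 435 | 123 | 58% | 68% | 19% |
| Diesel | Light duty vehicles < 3.5 t - Highways | 199 | 199 | 116 | 134 | 38 | 58% | 68% | 19% |
| Diesel | Light duty vehicles < 3.5 t - Extra urban | 114 | 114 | 66 | 77 | 22 | 58% | 68% | 19% |
| Diesel | Light duty vehicles < 3.5 t - Urban | 319 | 319 | 186 | 216 | 61 | 58% | 68% | 19% |
| Diesel | Heavy duty vehicles > 3.5 t and buses - Highways | 402 | 402 | 204 | 242 | 88 | 51% | 60% | 22% |
| Diesel | Heavy duty vehicles > 3.5 t and buses - Extra urban | 333 | 333 | 170 | 201 | 73 | 51% | 60% | 22% |
| Diesel | Heavy duty vehicles > 3.5 t and buses - Urban | 361 | 361 | 184 | 218 | 79 | 51% | 60% | 22% |
| Diesel | Mopeds (<50 cm3) - Urban | 2.8 | 2.8 | 1.6 | 1.8 | 0.6 | 57% | 62% | 21% |
| Diesel | Railways, airports, shipping, other transports | 322 | 315 | 182 | 153 | 81 | 57% | 47% | 25% |
| Diesel | Total | 3257 | 3250 | 1809 | 2057 | 672 | 56% | 63% | 21% |
| Biomass combustion | Stove burning wood | 18,467 | 17,794 | 1843 | 2389 | 9684 | 10% | 13% | 52% |
| Biomass combustion | Stove burning pellet | 1092 | 1085 | 165 | 116 | 304 | 15% | 11% | 28% |
| Biomass combustion | Closed fireplace burning wood | 5590 | 5388 | 871 | 588 | 2942 | 16% | 11% | 53% |
| Biomass combustion | Closed fireplace burning pellet | 69 | 68 | 10 | 7.1 | 18 | 15% | 10% | 26% |
| Biomass combustion | Traditional open fireplace | 4060 | 3905 | 277 | 340 | 2339 | 7% | 8% | 58% |
| Biomass combustion | Boiler < 50 MW - Industrial combustion | 843 | 826 | 235 | 198 | 151 | 28% | 23% | 18% |
| Biomass combustion | Residential boiler burning wood | 802 | 785 | 114 | 143 | 393 | 14% | 18% | 49% |
| Biomass combustion | Residential boiler burning pellet | 118 | 117 | 18 | 12 | 31 | 15% | 10% | 26% |
| Biomass combustion | Boiler < 50 MW - District heating | 169 | 164 | 25 | 35 | 79 | 15% | 21% | 47% |
| Biomass combustion | Boiler ≥ 50 and < 300 MW - Industrial combustion | 5.4 | 5.3 | 1.5 | 1.3 | 2.3 | 27% | 24% | 42% |
| Biomass combustion | Other (pizza ovens, kitchens, etc.) | 5864 | 5510 | 392 | 505 | 3239 | 7% | 9% | 55% |
| Biomass combustion | Total | 37,080 | 35,649 | 3951 | 4336 | 19,182 | 11% | 12% | 52% |
| Other combustions | Gasoline and diesel - Off road | 1712 | 1712 | 995 | 633 | 446 | 58% | 37% | 26% |
| Other combustions | On field burning of stubble, straw and other agricultural wastes | 1390 | 1301 | 117 | 232 | 695 | 8% | 17% | 50% |
| Other combustions | Coal | 26 | 12 | 0.4 | 0.01 | 0.01 | 2% | 0% | 0% |
| Other combustions | Fuel oil | 35 | 31 | 17 | 2.2 | 0.4 | 48% | 6% | 1% |
| Other combustions | LPG | 21 | 21 | 1.4 | 1.5 | 16 | 7% | 7% | 75% |
| Other combustions | Forest fires | 1256 | 1028 | 93 | 194 | 581 | 7% | 15% | 46% |
| Other combustions | Combustions with contact (cement, foundries, etc.) | 861 | 486 | 14 | 15 | 139 | 2% | 2% | 16% |
| Other combustions | Fireworks | 689 | 376 | 79 | 158 | 332 | 11% | 23% | 48% |
| Other combustions | Gasoline - Road transport | 351 | 351 | 71 | 93 | 150 | 20% | 27% | 43% |
| Other combustions | Natural gas | 815 | 806 | 34 | 65 | 692 | 4% | 8% | 85% |
| Other combustions | Tobacco | 648 | 648 | 3.2 | 3.2 | 389 | 1% | 0% | 60% |
| Other combustions | Gas and naphtha in oil refineries | 57 | 57 | 8.6 | 4.0 | 43 | 15% | 7% | 75% |
| Other combustions | Residential and industrial gas oil | 121 | 121 | 30 | 29 | 8.8 | 25% | 24% | 7% |
| Other combustions | Waste incineration | 28 | 27 | 2.7 | 2.0 | 6.1 | 10% | 7% | 22% |
| Other combustions | Minor combustion processes | 50 | 33 | 4.7 | 4.1 | 16 | 9% | 8% | 31% |
| Other combustions | Total | 8060 | 7009 | 1471 | 1436 | 3513 | 18% | 18% | 44% |
| Other sources | Non-exhaust emissions from road vehicles | 6733 | 3648 | 363 | 135 | 779 | 5% | 2% | 12% |
| Other sources | Industrial processes | 1935 | 1205 | 1.8 | 0.1 | 4.0 | 0% | 0% | 0% |
| Other sources | Total | 8668 | 4852 | 365 | 135 | 783 | 4% | 2% | 9% |
| Other TSP emissions | | 8921 | 2965 | 200 | 74 | 41 | 2% | 1% | 0% |
| Total | | 65,987 | 53,726 | 7797 | 8038 | 24,192 | 12% | 12% | 37% |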
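The ratio columns of Table 5 follow directly from the tonnage columns, and the category totals should add up to the grand total. The sketch below is a small consistency check under that reading, using only the category-level rows. The helper names `column_sums` and `shares_of_pm10` are hypothetical, and differences of about 1 t/year are expected because the published values are rounded.

```python
# Category totals from Table 5 (t/year, 2019), ordered as PM10, PM2.5, BC, EC, OC.
CATEGORY_TOTALS = {
    "Diesel": (3257, 3250, 1809, 2057, 672),
    "Biomass combustion": (37080, 35649, 3951, 4336, 19182),
    "Other combustions": (8060, 7009, 1471, 1436, 3513),
    "Other sources": (8668, 4852, 365, 135, 783),
    "Other TSP emissions": (8921, 2965, 200, 74, 41),
}
GRAND_TOTAL = (65987, 53726, 7797, 8038, 24192)


def column_sums(rows):
    """Sum each pollutant column across all categories."""
    return tuple(sum(column) for column in zip(*rows.values()))


def shares_of_pm10(row):
    """Return (%BC/PM10, %EC/PM10, %OC/PM10) rounded to whole percent."""
    pm10, _pm25, bc, ec, oc = row
    return tuple(round(100 * value / pm10) for value in (bc, ec, oc))


# Difference between summed categories and the published grand total.
differences = tuple(s - g for s, g in zip(column_sums(CATEGORY_TOTALS), GRAND_TOTAL))
print("Sum minus published total (t/year):", differences)

for name, row in CATEGORY_TOTALS.items():
    print(name, shares_of_pm10(row))
print("Total", shares_of_pm10(GRAND_TOTAL))
```

The same check can be repeated for individual rows inside each category, but rows published with few significant digits (for example coal or tobacco) will not reproduce the printed percentages exactly.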
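Table 6. Total emission estimates for the Po Basin compared to Italy and the EU-27 by considering per capita and overall emission density.
Per capita emissions (kg/inhabitant/y):
| Year | Area | BC | EC | LG | OC | PM10 | PM2.5 |
|---|---|---|---|---|---|---|---|
| 2013 | EU-27 | 0.51 | - | - | - | 4.66 | 3.23 |
| 2013 | Italy | 0.44 | - | - | - | 3.99 | 2.98 |
| 2013 | Po Basin | 0.44 | 0.47 | 0.096 | 1.14 | 3.06 | 2.68 |
| 2017 | EU-27 | 0.42 | - | - | - | 4.2 | 2.8 |
| 2017 | Italy | 0.36 | - | - | - | 3.91 | 2.81 |
| 2017 | Po Basin | 0.35 | 0.39 | 0.085 | 0.99 | 2.62 | 2.26 |
| 2019 | EU-27 | 0.44 | - | - | - | 4.64 | 3.07 |
| 2019 | Italy | 0.33 | - | - | - | 3.59 | 2.52 |
| 2019 | Po Basin | 0.3 | 0.31 | 0.08 | 0.92 | 2.52 | 2.05 |
Emission density (kg/km2):
| Year | Area | BC | EC | LG | OC | PM10 | PM2.5 |
|---|---|---|---|---|---|---|---|
| 2013 | EU-27 | 60.78 | - | - | - | 556.9 | 385.8 |
| 2013 | Italy | 88.8 | - | - | - | 797.4 | 594.9 |
| 2013 | Po Basin | 99.8 | 106.9 | 21.8 | 256.1 | 690 | 602.7 |
| 2017 | EU-27 | 50.72 | - | - | - | 507.9 | 338.4 |
| 2017 | Italy | 73.3 | - | - | - | 783.2 | 562.6 |
| 2017 | Po Basin | 80.7 | 88.2 | 19.4 | 226.5 | 598.4 | 515.3 |
| 2019 | EU-27 | 46.84 | - | - | - | 491.2 | 324.9 |
| 2019 | Italy | 66 | - | - | - | 710.1 | 498.9 |
| 2019 | Po Basin | 68 | 70.1 | 18.1 | 210.9 | 575.4 | 468.5 |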
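The two indicators in Table 6 are linked: dividing the emission density (kg/km2) by the per capita emission (kg/inhabitant) leaves inhabitants per km2. The sketch below uses that identity on the 2019 PM10 values to show why an area can rank low per capita and high per unit area at the same time. The implied population densities are derived here from rounded table entries and are not values reported in the article; the function name is hypothetical.

```python
# 2019 PM10 values copied from Table 6:
# (per capita emission in kg/inhabitant/y, emission density in kg/km2).
PM10_2019 = {
    "EU-27": (4.64, 491.2),
    "Italy": (3.59, 710.1),
    "Po Basin": (2.52, 575.4),
}


def implied_population_density(per_capita_kg: float, density_kg_km2: float) -> float:
    """Hypothetical helper: (kg/km2) / (kg/inhabitant) = inhabitants/km2."""
    return density_kg_km2 / per_capita_kg


implied = {
    area: implied_population_density(per_capita, density)
    for area, (per_capita, density) in PM10_2019.items()
}

for area, inhabitants_per_km2 in implied.items():
    print(f"{area}: about {inhabitants_per_km2:.0f} inhabitants/km2 implied")
```

With these inputs the Po Basin has the lowest per capita PM10 emission of the three areas but a higher emission density than the EU-27, which is consistent with a population density roughly twice the EU-27 value implied by the same table.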
