Article

Short-Term Water Demand Forecasting from Univariate Time Series of Water Reservoir Stations

by Georgios Myllis 1, Alkiviadis Tsimpiris 1,* and Vasiliki Vrana 2,*
1 Department of Computer Informatics and Telecommunications Engineering, International Hellenic University, 624 24 Serres, Greece
2 Department of Business Administration, International Hellenic University, 624 24 Serres, Greece
* Authors to whom correspondence should be addressed.
Information 2024, 15(10), 605; https://doi.org/10.3390/info15100605
Submission received: 7 August 2024 / Revised: 16 September 2024 / Accepted: 27 September 2024 / Published: 3 October 2024
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)

Abstract:
This study presents an improved data-centric approach to short-term water demand forecasting using univariate time series from water reservoir levels. The dataset comprises water level recordings from 21 reservoirs in Eastern Thessaloniki collected over 15 months via a SCADA system provided by the water company EYATH S.A. The methodology involves data preprocessing, anomaly detection, data imputation, and the application of predictive models. Techniques such as the Interquartile Range method and moving standard deviation are employed to identify and handle anomalies. Missing values are imputed using LSTM networks optimized through the Optuna framework. This study emphasizes a data-centric approach in deep learning, focusing on improving data quality before model application, which has proven to enhance prediction accuracy. This strategy is crucial, especially in regions where reservoirs are the primary water source, and demand distribution cannot be solely determined by flow meter readings. LSTM, Random Forest Regressor, ARIMA, and SARIMA models are utilized to extract and analyze water level trends, enabling more accurate future water demand predictions. Results indicate that combining deep learning techniques with traditional statistical models significantly improves the accuracy and reliability of water demand predictions, providing a robust framework for optimizing water resource management.

1. Introduction

Water demand forecasting is an essential component of effective water resource management, particularly in urban areas where the balance between supply and demand is crucial for maintaining sustainable infrastructure [1,2]. State programs and regulatory frameworks are increasingly being implemented to govern the use of water resources, particularly in light of climate change and increasing urbanization. In the European Union, for example, the Water Framework Directive (2000/60/EC) emphasizes the sustainable use of water resources, urging member states to ensure the long-term availability of clean water by promoting efficient use and preventing waste [3]. Similarly, the EU’s Blueprint to Safeguard Europe’s Water Resources (2012) highlights the need for optimizing water usage, particularly through technological innovations that improve efficiency [4].
These regulatory frameworks place a responsibility on utility managers to make informed decisions regarding water distribution, ensuring both the efficient use of resources and compliance with sustainability targets. Accurate short-term forecasts, driven by advanced technologies, enable managers to optimize water distribution in line with regulatory requirements, minimizing waste while ensuring that demand is met [5]. Traditionally, time series models such as ARIMA (AutoRegressive Integrated Moving Average) and SARIMA (Seasonal AutoRegressive Integrated Moving Average) have been widely used in water demand forecasting due to their ability to model linear trends and seasonality [1,6]. However, these models often struggle with non-linearities and sudden fluctuations in water consumption caused by factors such as weather variations and sensor malfunctions [7].
Recent advances in machine learning, particularly deep learning models like Long Short-Term Memory (LSTM) networks, have shown significant promise in addressing the limitations of traditional methods [5,8]. LSTM networks are well suited for capturing long-term dependencies and complex temporal relationships in sequential data, which makes them particularly effective for modeling non-linear water demand patterns influenced by irregular and seasonal factors [5,9]. For instance, Smith et al. (2021) demonstrated that LSTM models significantly improved prediction accuracy over traditional ARIMA models when applied to water demand forecasting [10].
In addition to LSTM, Random Forest Regressors (RFRs) have gained attention as a robust alternative for forecasting tasks, especially in scenarios where data quality may be compromised by missing values or anomalies [11,12]. The RFR’s ability to handle noisy and imprecise data has made it an attractive choice for short-term water demand forecasting, as evidenced by its success in outperforming other regression models in recent studies [11]. Nevertheless, optimizing model performance remains a challenge, with recent research emphasizing the importance of a data-centric approach as a key innovation in enhancing the accuracy of predictive models, particularly in domains where data quality plays a critical role [7,13].
This data-centric approach prioritizes the preprocessing, cleaning, and augmentation of data to ensure that machine learning models are trained on high-quality inputs. In the context of water demand forecasting, a data-centric methodology is particularly valuable because real-world data often contain anomalies, such as sudden spikes or drops in water levels that may result from sensor malfunctions rather than genuine changes in water usage [7,14]. Models trained on well-preprocessed data—where anomalies are treated as outliers or imputed appropriately—consistently outperform those trained on raw data [12,15].
The proposed methodology builds on prior research in water demand forecasting while addressing key gaps in the literature, particularly regarding the handling of non-linear and anomalous data. By expanding the feature set and ensuring data quality, this study contributes to the growing body of literature on data-centric machine learning applications in water resource management [3,13].
Data imputation techniques play a critical role in the accuracy of water demand forecasting, especially when working with time series data prone to missing values due to sensor failures or data transmission errors. This study evaluates the impact of various imputation methods, such as bidirectional Long Short-Term Memory (bi-LSTM) networks, linear interpolation, polynomial interpolation, mean imputation, and K-Nearest Neighbors (KNN) imputation, on the forecasting performance of the models [8,12,16,17]. The goal is to determine the most reliable approach for improving data quality in water demand prediction.
The primary aim of this study is to integrate deep learning and machine learning forecasting techniques with traditional statistical models to improve the accuracy and reliability of water demand predictions. By analyzing tank levels rather than flow measurements, this research seeks to address a gap in the current methodology, particularly in areas where reservoirs are the main water source.
Advanced data preprocessing techniques, including anomaly detection, imputation, and feature selection, are employed to address the challenges posed by data anomalies and missing values [7,15]. By incorporating these techniques, the study aims to enhance the accuracy and reliability of short-term water demand forecasts and provide a robust framework for more effective management and distribution of water resources.

2. Materials and Methods

2.1. Dataset

The dataset used for this study consists of water level recordings from 21 reservoirs located in Eastern Thessaloniki, covering a total area of approximately 85 km². These recordings were provided by the water company EYATH S.A. and are univariate time series obtained from a SCADA (Supervisory Control and Data Acquisition) system. The SCADA system continuously receives water level signals from sensors installed in each reservoir at a high frequency of one measurement per minute. The data span a period of 15 months, from 1 November 2022 to 30 March 2024, resulting in 21 univariate time series. This period and frequency of data collection yield approximately 13.9 million individual measurements, representing a rich dataset with a high temporal resolution. All the original data series from all the reservoirs for the corresponding period can be viewed in Figure A1 in Appendix A.
While the dataset is geographically focused on a specific region, it captures a diverse range of reservoirs characterized by different operational and urban contexts. The study area includes various suburban zones with varying population densities, land use patterns, and water consumption behaviors. The dataset encompasses two types of reservoirs:
Storage reservoirs: these directly supply water to residential, commercial, and industrial areas, playing a critical role in meeting regional water demands.
Intermediary reservoirs: these function as transfer stations, moving water to other reservoirs without directly supplying end consumers.
To further analyze the dataset, we employed a clustering approach to categorize the reservoirs based on their water level dynamics. Using the k-means clustering algorithm, the reservoirs were grouped into four distinct clusters according to patterns in their water level trends, seasonality, and overall behavior. This clustering revealed that reservoirs in each cluster exhibit unique water level dynamics, consistent with findings in similar studies on clustering for water management [18].
Cluster 1: reservoirs with stable water levels and minimal seasonal variations.
Cluster 2: reservoirs with moderate seasonal variations, reflecting varying consumption patterns.
Cluster 3: reservoirs with high fluctuations in water levels, indicating significant variability in water usage.
Cluster 4: reservoirs with pronounced trends and seasonal variations, primarily serving as intermediary stations.
This clustering analysis provides a more granular understanding of the water distribution and consumption patterns across the studied area, enabling a deeper exploration of operational strategies and potential areas for optimization.
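As an illustration of this clustering step, the following Python sketch groups the reservoirs with scikit-learn's k-means, assuming each series is first summarized by simple descriptive features (mean level, variability, and trend slope). The exact feature set used in the study is not specified, so these features are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def summarize(series: pd.Series) -> list:
    """Illustrative per-reservoir features: mean level, fluctuation, trend slope."""
    t = np.arange(len(series))
    slope = np.polyfit(t, series.to_numpy(), 1)[0]   # linear trend slope
    return [series.mean(), series.std(), slope]

def cluster_reservoirs(reservoir_series: dict, k: int = 4) -> pd.Series:
    """reservoir_series: dict mapping reservoir name -> pd.Series of water levels."""
    features = pd.DataFrame(
        {name: summarize(s) for name, s in reservoir_series.items()}
    ).T
    scaled = StandardScaler().fit_transform(features)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")
```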

2.2. Methodology

The study focused on the following research areas, as seen in Figure 1:
Handling univariate time series datasets for anomaly detection, correction, and forecasting involves several crucial steps:
A. Data preprocessing is the initial phase, crucial for ensuring data accuracy through cleaning and correction. Key tasks include removing duplicates, verifying the value column’s data type for compatibility with numerical operations, and addressing NaN (Not A Number) values, zeros, and negatives. Additionally, missing time stamps are filled, and the chronological order of the data is checked. These steps, illustrated in the sketch at the end of this subsection, provide a reliable foundation for accurate predictions.
B. Recognizing and understanding the types of anomalies (systematic, point, and collective) is essential [19]. The Interquartile Range (IQR) method identifies outliers, while the moving standard deviation detects collective anomalies [20].
C. To address missing values, the study evaluates the effectiveness of various imputation methods, including linear interpolation, polynomial interpolation, mean imputation, and K-Nearest Neighbors (KNN) imputation; the bi-LSTM framework is employed for time series data [16,17].
D. For water demand forecasting, several models are employed: a forecasting LSTM (f-LSTM) model, a Random Forest Regressor (RFR) model [9,21], an ARIMA model, and a SARIMA model.
E. Estimating water demand from tank levels involves defining the tank’s geometry and volume relationships, calculating the cross-sectional area, and determining water demand by summing volume changes over specified intervals [22,23,24]. This comprehensive process leverages advanced models and thorough preprocessing techniques to ensure an accurate understanding and prediction of water demand from tank level data.
F. To evaluate imputation accuracy and forecasting performance, the following metrics are calculated: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination, R2 [25].
Figure 2 presents the workflow of the data formatting at each step of the applied methodology.
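A minimal pandas sketch of the preprocessing tasks in step A might look as follows; the column names timestamp and value are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Step A: basic cleaning of a raw SCADA level series (column names assumed)."""
    df = df.drop_duplicates(subset="timestamp")                # remove duplicates
    df["value"] = pd.to_numeric(df["value"], errors="coerce")  # enforce numeric dtype
    df.loc[df["value"] <= 0, "value"] = np.nan                 # zeros/negatives -> missing
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.sort_values("timestamp").set_index("timestamp")    # chronological order
    return df.asfreq("min")                                    # insert missing minutes as NaN
```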

2.3. Anomaly Detection

Understanding and correctly identifying the types of anomalies present in time series data are crucial for implementing effective correction and forecasting techniques. Anomalies can manifest in various forms, and a comprehensive understanding of these is essential for accurate data interpretation and improved model performance. The primary types of anomalies are described as follows:
  • Contextual anomalies: Contextual anomalies, also known as conditional or seasonal anomalies, occur when data points appear normal in a broader context but are abnormal in a specific setting or time period. For example, increased water consumption may be expected during daytime hours but considered anomalous during the night [19,20]. Detecting such anomalies typically requires domain knowledge and a nuanced understanding of environmental or temporal conditions. However, the methods employed in this study are designed primarily to identify point and collective anomalies and do not directly address contextual anomalies. Any removal or adjustment of contextual anomalies has been conducted by the authors based on their expertise and prior knowledge of the data’s specific context.
  • Point anomalies: Point anomalies refer to individual data points that significantly deviate from the norm. These anomalies may represent outliers that are either much higher or lower than the mean value of the dataset and do not conform to the typical pattern or trend [12,16]. The detection of point anomalies is crucial in maintaining the reliability of forecasting models, as such deviations can skew results and lead to inaccurate predictions. The methods used in this study effectively recognize point anomalies through advanced techniques such as bidirectional LSTM networks and statistical outlier detection.
  • Collective anomalies: Collective anomalies occur when a group of data points collectively deviates from the expected pattern, even if individual points within the group do not appear anomalous on their own. For instance, a continuous sequence of elevated water levels may suggest a leak in a reservoir system, despite each individual reading falling within normal operational limits [17]. The study’s methodology is well suited for detecting such collective anomalies by analyzing patterns within sequences of data points rather than focusing on isolated values, making it particularly relevant for time series analysis.
The methods utilized in this study are specifically tailored to identify point and collective anomalies due to their direct impact on data integrity and predictive accuracy. Contextual anomalies, which require a deeper understanding of the specific temporal or environmental context, have been preprocessed and adjusted by the authors based on their domain expertise. A comprehensive understanding of these different types of anomalies, and the strategies for their management, is essential for accurate data interpretation and the development of robust forecasting models [22,23].

2.3.1. Outliers Detection

The Interquartile Range (IQR) method is a statistical measure used to identify extreme values in a dataset. It assesses the spread of the middle 50% of the data by calculating the difference between the third quartile (Q3) and the first quartile (Q1). Specifically, the IQR is defined as IQR = Q3 − Q1, where Q3 is the value below which 75% of the data falls, and Q1 is the value below which 25% of the data falls. This method helps identify outliers as values significantly higher than Q3 + 1.5 × IQR or lower than Q1 − 1.5 × IQR [26]. These outliers are values that deviate considerably from the central data distribution, indicating potential anomalies. By applying these criteria, we can effectively detect and manage outliers, ensuring a more accurate analysis and reliable data for further processing.
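As a compact illustration, the classical IQR rule can be written as follows (a sketch; NumPy's percentile function stands in for the quartile computation):

```python
import numpy as np

def iqr_outliers(x: np.ndarray, k: float = 1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and return the bounds."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper), (lower, upper)
```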
To enhance the IQR method, the Optuna framework is utilized for automated parameter optimization, specifically targeting the detection of outliers in a dataset. Optuna aims to find the optimal bounds for identifying outliers by proposing random values within specified ranges for the lower and upper bounds of the IQR [17,22]. The objective function evaluates the performance of different parameter combinations by suggesting random values for these bounds: the lower bound ranges from the 0.01 quantile to the 0.25 quantile, and the upper bound ranges from the 0.75 quantile to the 0.99 quantile. The function uses these proposed bounds to identify data points that fall outside these limits as outliers [12,16].
The performance of the objective function is evaluated based on the number of outliers detected. The goal is to maximize the negative sum of the detected outliers, thereby minimizing the number of values that lie outside the bounds. The optimization process seeks to maximize the objective function, as indicated by the creation of a study object with a direction of “maximize” [25].
The optimization is performed using the Tree-structured Parzen Estimator (TPE) algorithm, which is a Bayesian optimization method commonly employed to explore parameter space effectively and find parameter combinations that offer the best performance according to the objective function [18,21]. The TPE algorithm models the distribution of good and bad parameter values based on the history of evaluations.
In the TPE method, two probability densities are computed:
l(x): The probability density function for the “good” set of parameters. This function represents the likelihood of parameter values that result in objective function values better than a certain threshold.
g(x): The probability density function for the “bad” set of parameters. This function represents the likelihood of parameter values that do not meet the threshold.
The TPE algorithm selects new parameter values to test by maximizing the ratio, as seen in Equation (1):

$$\frac{l(x)}{g(x)} \tag{1}$$
This ratio helps the algorithm focus on regions of the parameter space more likely to improve the objective function’s value based on past trials [17,27].
The probability densities l(x) and g(x) can be defined as follows:
l(x): The density for parameters resulting in better-than-threshold performance is computed using a kernel density estimator (KDE), as seen in Equation (2):

$$l(x) = \sum_{i=1}^{n_b} \frac{w_i}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - x_i)^2}{2\sigma^2}\right) \tag{2}$$

where
  • $n_b$ is the number of “good” samples (samples with performance better than the threshold).
  • $w_i$ is the weight for the i-th “good” sample.
  • $x_i$ represents the i-th “good” sample.
  • $\sigma$ is the bandwidth parameter controlling the smoothness of the kernel density estimator.
g(x): The density for parameters that do not meet the threshold performance is similarly computed using a KDE, as seen in Equation (3):

$$g(x) = \sum_{i=1}^{n_w} \frac{w_i}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - x_i)^2}{2\sigma^2}\right) \tag{3}$$

  • $n_w$ is the number of “bad” samples (samples with performance worse than or equal to the threshold).
  • $w_i$ is the weight for the i-th “bad” sample.
  • $x_i$ represents the i-th “bad” sample.
  • $\sigma$ is the bandwidth parameter controlling the smoothness of the kernel density estimator.
Through this iterative process, the TPE algorithm identifies the optimal lower and upper bounds l and u that maximize the negative sum of outliers, effectively minimizing the number of outliers detected in the given dataset. The “process” refers to this continuous optimization loop where parameter suggestions and evaluations are made, converging towards the best-performing bounds.
The optimization is executed over multiple trials to determine the optimal bounds for outlier detection automatically, providing a robust statistical and algorithmic basis for identifying deviations in a data series.
Let x be an element of the dataset. The objective function seeks to optimize the bounds l (lower bound) and u (upper bound), minimizing the number of outliers detected by these limits. Outliers are defined as those values that are either smaller than l or greater than u .
The Outlier Detection Equation is shown in Equation (4):

$$f(x, l, u) = \begin{cases} 1 & \text{if } x < l \text{ or } x > u \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

The overall performance of the bounds for a dataset D is computed as the negative sum of the detected outliers, as seen in Equation (5):

$$P(l, u) = -\sum_{x \in D} f(x, l, u) \tag{5}$$
Here, P is the evaluation function used in the objective function to measure the performance of a trial within the study. It evaluates how well the bounds l and u work to detect outliers in the dataset D. The optimization seeks to maximize P, which corresponds to minimizing the number of outliers detected.
  • l: It is the lower bound for detecting the outliers. Any value x of the dataset D that is less than l is considered as an outlier.
  • u: It is the upper bound for detecting the outliers. Any value x of dataset D that is greater than u is also considered as an outlier.
  • P(l, u): The performance evaluation function that calculates the negative sum of outliers detected based on the bounds l and u.
  • f(x, l, u): The indicator function that returns 1 if a value x is an outlier (i.e., x < l or x > u), and 0 otherwise.
Equation (6) defines the optimization goal:

$$\max_{l,\,u} P(l, u) \tag{6}$$

This equation seeks to find the bounds l and u that maximize the performance function P(l, u), thereby minimizing the number of detected outliers and improving the dataset’s quality for further analysis.
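A minimal Optuna sketch of Equations (4)–(6), using the quantile ranges stated above and the 500-trial budget described in Section 2.7.1, could look as follows; the seeded TPE sampler is an illustrative choice:

```python
import numpy as np
import optuna

def optimize_bounds(x: np.ndarray, n_trials: int = 500):
    """Search quantile-based bounds (l, u) that maximize P(l, u) = -sum of outliers."""
    def objective(trial: optuna.Trial) -> float:
        l = trial.suggest_float("lower", np.quantile(x, 0.01), np.quantile(x, 0.25))
        u = trial.suggest_float("upper", np.quantile(x, 0.75), np.quantile(x, 0.99))
        n_outliers = int(np.sum((x < l) | (x > u)))   # Equation (5): P = -sum f(x, l, u)
        return -n_outliers
    study = optuna.create_study(direction="maximize",
                                sampler=optuna.samplers.TPESampler(seed=0))
    study.optimize(objective, n_trials=n_trials)
    return study.best_params["lower"], study.best_params["upper"]
```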

2.3.2. Collective Anomalies

To identify collective anomalies in time series data, the Moving Standard Deviation Method is employed. This approach involves calculating the standard deviation of data points within a fixed-size window as it moves across the dataset. By doing so, it measures the dispersion of values within each window, offering insights into the local variability of the data. For a time series dataset {x1, x2, …, xn} with a window size w, the moving standard deviation at time t (for t ≥ w) is defined as seen in Equation (7):

$$\mathrm{MSD}_t = \sqrt{\frac{1}{w} \sum_{i=t-w+1}^{t} (x_i - \bar{x})^2} \tag{7}$$

As shown in Equation (8), $\bar{x}$ is the mean of the values within the window:

$$\bar{x} = \frac{1}{w} \sum_{i=t-w+1}^{t} x_i \tag{8}$$
By analyzing the moving standard deviation, we can detect periods of increased variability, which may indicate collective anomalies. These anomalies are characterized by significant deviations from the expected data patterns, helping to identify potential issues in the system being monitored. This method provides a dynamic and localized measure of data variability, crucial for accurate anomaly detection in time series analysis [16].
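A rolling-window sketch of Equations (7) and (8) in pandas follows; the window size and the flagging threshold are illustrative choices, since the study does not fix them here:

```python
import pandas as pd

def collective_anomalies(series: pd.Series, window: int = 24, k: float = 3.0) -> pd.Series:
    """Flag periods whose moving standard deviation (Equation (7)) is unusually high."""
    msd = series.rolling(window).std()
    # Threshold (mean + k standard deviations of the MSD itself) is illustrative.
    return msd > msd.mean() + k * msd.std()
```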

2.4. Data Imputation

The combination of the improved Interquartile Range (IQR) method with the Optuna framework and the moving standard deviation approach ensures accurate preprocessing techniques to effectively manage and interpret anomalies [9]. All anomalous values identified by these methods are replaced with NaN values. All time series data, ready for imputation for all the tanks, are presented in Figure A2 in Appendix A.
For the imputation of these NaN values, several imputation methods were employed to handle missing data, each varying in complexity and ability to capture underlying patterns in the data. The imputation methods include the following:
Linear and polynomial interpolation: Traditional methods that assume a linear or polynomial relationship between data points to fill in missing values. These methods are easy to implement but may not capture complex, non-linear patterns in the data.
Mean imputation: A simple approach that replaces missing values with the mean of the observed data. This method is less effective when the missing data pattern is not random.
K-Nearest Neighbors (KNN) imputation: A technique that fills missing values based on the values of the nearest neighbors in the feature space. KNN considers the similarity between data points, but its performance can degrade with higher percentages of missing data.
Bi-LSTM imputation: This advanced method utilizes a bidirectional Long Short-Term Memory network to learn from data sequences in both forward and backward directions, providing a comprehensive understanding of the temporal dynamics. It is particularly effective for time series data where capturing temporal dependencies is crucial. Figure 3 illustrates the process followed to impute data for NaN values.
The effectiveness of these imputation methods was evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R2) to assess their impact on the forecasting performance of f-LSTM, Random Forest, and ARIMA models.
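For reference, the baseline methods can be sketched in a few lines of Python; the lagged feature matrix for KNN and the polynomial order are illustrative assumptions:

```python
import pandas as pd
from sklearn.impute import KNNImputer

def impute_baselines(series: pd.Series) -> dict:
    """Baseline imputations to compare against bi-LSTM (parameters illustrative)."""
    out = {
        "linear": series.interpolate(method="linear"),
        "polynomial": series.interpolate(method="polynomial", order=2),
        "mean": series.fillna(series.mean()),
    }
    # KNN needs a feature matrix; lagged copies of the series serve as features here.
    lags = pd.concat({f"lag{k}": series.shift(k) for k in range(3)}, axis=1)
    imputed = KNNImputer(n_neighbors=5).fit_transform(lags)
    out["knn"] = pd.Series(imputed[:, 0], index=series.index)
    return out
```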

2.4.1. Data Resampling

First, the data are downsampled, a process that reduces the volume of data and eliminates noise, facilitating analysis and visualization. Since subsequent predictions are meaningful at hourly resolution at the finest, resampling groups the time series data into hourly (“H”) intervals. For each interval, the average is calculated using the arithmetic mean, as seen in Equation (9):

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{9}$$

2.4.2. Data Normalization

Next, the data are separated into a valid set and an anomalous set, with the anomalous set represented by NaN values. The valid data are then normalized as shown in Equation (10) by scaling the univariate time series to a similar range, ensuring each feature contributes proportionally [12]:

$$x_{\text{normalized}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{10}$$
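Sections 2.4.1 and 2.4.2 together amount to the following pandas sketch, assuming a minute-level frame df indexed by timestamp with a value column:

```python
import pandas as pd

# Hourly downsampling by the arithmetic mean (Equation (9)), then min-max
# scaling of the valid points (Equation (10)).
hourly = df["value"].resample("H").mean()
x_min, x_max = hourly.min(), hourly.max()
normalized = (hourly - x_min) / (x_max - x_min)   # NaNs (the anomalous set) pass through
```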

2.4.3. Data Filter

The Savitzky–Golay filter is a smoothing technique used to enhance a dataset by improving the signal-to-noise ratio without losing important details. It works by applying a local polynomial fit to a moving window of the data, which helps reduce noise and preserve significant patterns and trends [27]. For some datasets, this filter was used to further improve the model’s performance by smoothing the data before feeding it into the bi-LSTM.
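Applying the filter is a one-liner with SciPy; the window length and polynomial order below are illustrative and must be tuned to the sampling rate:

```python
from scipy.signal import savgol_filter

# Optional pre-smoothing of the valid data before the bi-LSTM.
values = normalized.dropna().to_numpy()
smoothed = savgol_filter(values, window_length=11, polyorder=3)  # window must be odd
```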

2.4.4. Sequences Creation

Normalized valid data are converted into sequences for training a bi-LSTM model. By generating these sequences, the model can capture the dependencies between values at different time points. If $X_t$ is the value at time t, the sequences for the bi-LSTM are as seen in Equation (11):

$$S_t = [X_t, X_{t+1}, \ldots, X_{t+N-1}] \tag{11}$$

where N is the length of the sequence.
The output at each time t in the bi-LSTM model is based not only on the input $X_t$ but also on the previous hidden state $h_{t-1}$ and cell state $C_{t-1}$ of the model.
By creating sequences and normalizing the data, noise and outliers are reduced, improving the model’s stability and performance. Normalization maintains consistency in the data, while sequences allow the bi-LSTM to learn from smaller datasets, leading to faster convergence and better performance. Sequences also function as data augmentation [13].
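A sketch of the sequence construction of Equation (11), shaped for a Keras LSTM input of (samples, N, 1):

```python
import numpy as np

def make_sequences(values: np.ndarray, n: int):
    """Slide a window of length N over the series; each sequence
    S_t = [X_t, ..., X_{t+N-1}] is paired with the value that follows it."""
    xs, ys = [], []
    for t in range(len(values) - n):
        xs.append(values[t:t + n])
        ys.append(values[t + n])
    return np.array(xs)[..., np.newaxis], np.array(ys)   # shapes: (samples, N, 1), (samples,)
```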

2.4.5. Data Reconstruction

Due to the extensive gaps observed in the data resulting from missing values, traditional imputation methods such as linear or polynomial interpolation, as well as forward and backward methods, demonstrate low performance accuracy. Consequently, an enhanced approach is employed, which interpolates normalized sequence values trained on a bi-LSTM model into the anomaly set. This data reconstruction method ensures that every NaN value is interpolated with a sequence of adequate length, thereby avoiding the generation of insufficient or shorter sequences. By not selecting time-dependent indices as values for the sequences, this method circumvents the issue of long-term missing data leading to suboptimal sequence lengths.
Furthermore, this approach ensures that predictions for missing values (NaN) are carefully calibrated to avoid overlapping with existing valid values, thereby preserving the integrity of the temporal relationships within the time series. By doing so, the model can accurately capture and maintain the sequence of events, which is critical in time series forecasting. This meticulous handling of missing data allows the model to learn effectively from the available valid data and generalize well when predicting missing points, thereby significantly improving both the accuracy and robustness of the model.
The innovative imputation strategy employed in this study represents a considerable advancement over traditional methods, such as linear interpolation or forward fill, which often disrupt temporal coherence by introducing artificial trends that do not align with the underlying patterns of the data. Traditional methods typically fail to account for complex, non-linear dependencies and temporal dynamics, resulting in less reliable predictions, particularly in datasets with significant anomalies or irregular patterns [1,2].
By contrast, the bi-LSTM model-based imputation excels in preserving the natural flow of time series data. This is achieved by allowing the model to learn from the full temporal context—both forward and backward—while imputing missing values in a way that reflects the intrinsic behavior of the time series. Empirical evidence from recent studies suggests that this approach yields a 15–20% improvement in forecasting accuracy when compared to traditional imputation methods in water demand forecasting scenarios [8,10]. These results demonstrate the superior ability of machine learning techniques to maintain temporal coherence and effectively handle missing data, resulting in more accurate and reliable predictions.
Figure 4 illustrates the data reconstruction method, showcasing how the model reconstructs missing values while preserving the underlying temporal relationships. The use of this method enhances the model’s ability to generalize from incomplete datasets and produces more accurate forecasts, especially in complex, real-world datasets characterized by anomalies and irregular consumption patterns.
To predict the NaN values, which constitute the anomaly set, sequences of data of the same length as those used for training the model are interpolated before each NaN point. These sequences are derived from the valid data subset and are used to generate predictions for the anomalous NaN points.
In Figure 5, a subset of the data prepared for imputation consists only of valid data used for training and validating the model. This subset was normalized to the range of 0 to 1 [22]. The data from tank df20 were selected as a representative dataset due to its numerous anomalies, as illustrated in Figure A1.
In Figure 6, a segment of valid data starting from the beginning is chosen. The length of this segment is calculated to ensure it covers the total NaN values of the anomaly set with the appropriate number of sequences. These sequences are then used to generate predictions for the NaN points.
Figure 7 shows the sequences interpolated with the NaN values. The method for each NaN value involves applying the trained LSTM model to predict the missing data. This is achieved by feeding the model with the sequence mapped to each NaN value.
Figure 8 shows the predicted values of each NaN data value with no overlap with existing valid values, ensuring that the temporal relationship within the time series is maintained.

2.4.6. Hyperparameter Optimization

The Optuna framework is employed to select the optimal hyperparameters, i.e., parameters that define the model’s structure and training procedure and are set before training rather than learned from the data. These hyperparameters include sequence length, LSTM units, activation function, optimizer, learning rate, additional dense layers, dropout rate, batch size, and the objective function that best fits training the bi-LSTM model. Sampling techniques such as random sampling and the Tree-structured Parzen Estimator (TPE) were also used. The ranges of hyperparameters considered are as follows, and the optimal hyperparameters are presented in Table 1:
  • Sequence length: sampled as an integer between 3 and 20;
  • LSTM units: sampled as an integer between 20 and 100;
  • Activation function: sampled from [“relu”, “tanh”, “sigmoid”];
  • Optimizer: sampled from [“adam”, “rmsprop”, “sgd”];
  • Learning rate: logarithmically sampled between 0.0001 and 0.01;
  • Additional dense layers: sampled as a Boolean;
  • Dropout rate: sampled between 0.0009 and 0.4274;
  • Batch size: sampled from a predefined list of values [16, 32, 64].
The cost function used for training the bi-LSTM model is the Mean Squared Error (MSE). Various optimizers are employed based on the sampled hyperparameters, including Adam [28], RMSProp [29], and SGD [30], which combine the beneficial characteristics of different optimization algorithms. The activation functions utilized include ReLU [31], tanh, and sigmoid [12], facilitating non-linear learning and helping to avoid vanishing gradients.
To prevent overfitting during model training, both EarlyStopping and ModelCheckpoint callbacks are employed. EarlyStopping halts training when performance does not improve for a predefined number of epochs, while ModelCheckpoint saves the model whenever there is an improvement in validation loss, ensuring that the best-performing version of the model is retained.
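The search loop can be sketched as follows; make_sequences() from Section 2.4.4 and train_values (the normalized valid data) are assumed available, and the width of the optional dense layer is an illustrative choice:

```python
import optuna
from tensorflow import keras

def objective(trial: optuna.Trial) -> float:
    """Sample the ranges listed above and return the best validation MSE."""
    seq_len = trial.suggest_int("sequence_length", 3, 20)
    units = trial.suggest_int("lstm_units", 20, 100)
    activation = trial.suggest_categorical("activation", ["relu", "tanh", "sigmoid"])
    opt_name = trial.suggest_categorical("optimizer", ["adam", "rmsprop", "sgd"])
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    dropout = trial.suggest_float("dropout_rate", 0.0009, 0.4274)
    batch = trial.suggest_categorical("batch_size", [16, 32, 64])
    extra_dense = trial.suggest_categorical("extra_dense", [True, False])

    layers = [keras.layers.Input(shape=(seq_len, 1)),
              keras.layers.Bidirectional(keras.layers.LSTM(units, activation=activation)),
              keras.layers.Dropout(dropout)]
    if extra_dense:
        layers.append(keras.layers.Dense(16, activation=activation))  # width illustrative
    layers.append(keras.layers.Dense(1))
    model = keras.Sequential(layers)

    opt_cls = {"adam": keras.optimizers.Adam,
               "rmsprop": keras.optimizers.RMSprop,
               "sgd": keras.optimizers.SGD}[opt_name]
    model.compile(optimizer=opt_cls(learning_rate=lr), loss="mse")  # MSE cost function

    x, y = make_sequences(train_values, seq_len)
    history = model.fit(
        x, y, validation_split=0.2, epochs=20, batch_size=batch, verbose=0,
        callbacks=[keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)])
    return min(history.history["val_loss"])

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
```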

2.4.7. bi-LSTM

In this study, a bidirectional LSTM (bi-LSTM) is employed, which processes the data in two directions: forward and backward [25,27].
Forward LSTM: The forward LSTM processes the input data from the beginning to the end of the sequence. At each time step, the forward LSTM updates its internal states by taking into account the current input and the hidden state from the previous time step. This allows the model to learn patterns based on past information [26].
Backward LSTM: The backward LSTM works in the reverse direction, processing the input data from the end of the sequence back to the beginning. It updates its states by considering future information, thus allowing the model to capture dependencies that occur later in the sequence but are still relevant to earlier time steps [28].
The final output of the bi-LSTM at each time step is a combination of the information learned from both the forward and backward passes. This bidirectional approach enables the model to capture more comprehensive patterns in the data by leveraging both past and future context, making it particularly useful for tasks such as water demand forecasting, where temporal relationships are critical [18,32].
The bi-LSTM model is defined with a bidirectional wrapper to process data both forwards and backwards. For the forward LSTM, the process is as follows:
  • Input: $x_t$ at time t.
  • Hidden state: $h_t^f$ and cell state $c_t^f$ are computed as in a classic LSTM.
The following equations describe the operations of a forward LSTM cell (here with ReLU as the candidate activation) at time t, which compute the hidden state $h_t^f$ and cell state $c_t^f$ from the current input $x_t$, the previous hidden state $h_{t-1}^f$, and the previous cell state $c_{t-1}^f$:
As shown in Equation (12), the Input Gate controls the extent to which the new input will influence the cell state.

$$i_t^f = \sigma(W_i^f x_t + U_i^f h_{t-1}^f + b_i^f) \tag{12}$$

As shown in Equation (13), the Forget Gate determines how much of the previous cell state should be carried forward.

$$f_t^f = \sigma(W_f^f x_t + U_f^f h_{t-1}^f + b_f^f) \tag{13}$$

As shown in Equation (14), the Output Gate modulates the amount of the cell state that will influence the hidden state.

$$o_t^f = \sigma(W_o^f x_t + U_o^f h_{t-1}^f + b_o^f) \tag{14}$$

As shown in Equation (15), the candidate cell state is the proposed new value that could potentially update the LSTM’s cell state.

$$\tilde{c}_t^f = \mathrm{ReLU}(W_c^f x_t + U_c^f h_{t-1}^f + b_c^f) \tag{15}$$

As shown in Equation (16), the cell state update incorporates both the previous cell state and the new input.

$$c_t^f = f_t^f \odot c_{t-1}^f + i_t^f \odot \tilde{c}_t^f \tag{16}$$

As shown in Equation (17), the hidden state is computed from the updated cell state.

$$h_t^f = o_t^f \odot \mathrm{ReLU}(c_t^f) \tag{17}$$
For the backward LSTM, the process is as follows:
  • Input: $x_t$ at time t.
  • Hidden state: $h_t^b$ and cell state $c_t^b$ are computed in reverse, i.e., from future to past.
Equations (18)–(23) for the backward LSTM are as follows, with the recursion running from future to past:

$$i_t^b = \sigma(W_i^b x_t + U_i^b h_{t+1}^b + b_i^b) \tag{18}$$

$$f_t^b = \sigma(W_f^b x_t + U_f^b h_{t+1}^b + b_f^b) \tag{19}$$

$$o_t^b = \sigma(W_o^b x_t + U_o^b h_{t+1}^b + b_o^b) \tag{20}$$

$$\tilde{c}_t^b = \mathrm{ReLU}(W_c^b x_t + U_c^b h_{t+1}^b + b_c^b) \tag{21}$$

$$c_t^b = f_t^b \odot c_{t+1}^b + i_t^b \odot \tilde{c}_t^b \tag{22}$$

$$h_t^b = o_t^b \odot \mathrm{ReLU}(c_t^b) \tag{23}$$
This approach enhances the ability of the bi-LSTM to learn and predict patterns in sequence data. The final output of the bi-LSTM is a combination of the outputs from both the forward and backward LSTM, as seen in Equation (24):

$$h_t = \mathrm{concat}(h_t^f,\, h_t^b) \tag{24}$$

The final output $h_t$ is the concatenation of the two hidden states $h_t^f$ and $h_t^b$.
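In Keras, the structure of Equations (12)–(24) corresponds to a Bidirectional wrapper with concatenation as the merge mode; the unit count below is illustrative:

```python
from tensorflow import keras

# One forward and one backward LSTM whose hidden states are concatenated
# (Equation (24)); ReLU matches the candidate activation in Equations (15), (21).
inputs = keras.layers.Input(shape=(None, 1))          # variable-length sequences
h = keras.layers.Bidirectional(
    keras.layers.LSTM(64, activation="relu"),
    merge_mode="concat",                              # h_t = concat(h_t^f, h_t^b)
)(inputs)
outputs = keras.layers.Dense(1)(h)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```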

2.4.8. Piecewise Polynomial Method

The “piecewise_polynomial” method is used for interpolating the remaining missing values, particularly at the beginning and end of the sequence, after predicting the anomalous set NaN values with the bi-LSTM model. This ensures a continuous curve that is smooth and fits the data [17].
To ensure that the values do not exceed the tank limits, the clip function is used, which restricts the values within predefined bounds as determined by the IQR method (with l as the lower bound and u as the upper bound), ensuring the accuracy of the data [17], as seen in Equation (25):

$$y = \mathrm{clip}(x, l, u) = \begin{cases} l & \text{if } x < l \\ x & \text{if } l \le x \le u \\ u & \text{if } x > u \end{cases} \tag{25}$$
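Both steps map directly onto pandas operations; this is a sketch, with series, l, and u assumed to come from the preceding stages:

```python
# Interpolate remaining gaps at the series edges, then clip to the IQR bounds
# l and u (Equation (25)).
series = series.interpolate(method="piecewise_polynomial", limit_direction="both")
series = series.clip(lower=l, upper=u)
```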

2.5. Forecasting Models

Four models were used for water demand forecasting: RFR, f-LSTM, ARIMA, and SARIMA.

2.5.1. Water Demand Prediction Method with Random Forest Regressor

The prediction method is based on random forests, a technique that involves multiple decision trees and belongs to ensemble methods [9]. Each tree operates independently, generating a prediction for the target value based on the inputs (features), by dividing the feature space into regions. The parameters n_estimators and random_state are crucial for the RFR algorithm. The n_estimators parameter determines the number of trees in the model, while the random_state ensures the reproducibility of the results. Each split in a tree is made based on the criterion that maximizes the “purity” of the node, using the MSE [9]. Each tree is trained on a random subset of the data (bootstrap sampling), enhancing accuracy and independence. The final prediction is derived from the average of the predictions of all trees, reducing variance and overfitting [9].
In the data, the column of dates and times is converted into time labels, allowing for more effective time series analysis. Lag features of the value column are created, adding three columns representing the values at previous time points. Lags allow the model to train on the temporal interdependencies of the data. The data are split into training (80%), validation (20%), and test sets (20%). The process involves initializing and training the RFR using the training data, predicting the values for the validation set, and evaluating the model’s performance on unseen data, ensuring accurate and reliable predictions. The prediction $\hat{y}(X, D)$ for a given input point $X$ after training on a dataset $D$ is calculated as seen in Equation (26):

$$\hat{y}(X, D) = \frac{1}{B} \sum_{b=1}^{B} T_b(X, D_b) \tag{26}$$

where B is the number of trees in the forest, $T_b$ is the b-th decision tree, and $D_b$ is the bootstrap sample from D that was used to train $T_b$ [9].
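A sketch of the procedure with scikit-learn; hourly_demand is an assumed input series, and the hyperparameter values shown are illustrative:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def lag_frame(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build the three lag features described above (names illustrative)."""
    df = pd.DataFrame({"y": series})
    for k in range(1, n_lags + 1):
        df[f"lag_{k}"] = series.shift(k)
    return df.dropna()

df = lag_frame(hourly_demand)
split = int(len(df) * 0.8)                       # chronological split
X_train, y_train = df.drop(columns="y")[:split], df["y"][:split]
X_test, y_test = df.drop(columns="y")[split:], df["y"][split:]

model = RandomForestRegressor(n_estimators=100, random_state=42)  # Equation (26)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```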

2.5.2. Water Demand Prediction Method with f-LSTM

In the data preparation phase, temporal features such as year, month, day, and hour are added, which help capture seasonal and temporal trends [16]. These features are added as new columns, providing different information for the timestamp of each entry: year for analyzing long-term trends, month for detecting seasonal patterns, day for monthly or weekly patterns, and hour for analyzing intra-day fluctuations. This process allows the model to incorporate temporal changes in data behavior, leading to more accurate predictions. The input and output data are normalized to the range (0, 1), optimizing the training process. The data are split into training (70%), validation (15%), and test sets (15%).
A prediction model is defined that includes an LSTM layer with 44 units and ReLU activation function, a dropout layer with a rate of 0.2 to prevent overfitting, and a dense layer that outputs a single value for the prediction [17]. The input shape is (1, 5), specifying 1 time step and 5 features. The model is trained on the training data using the Adam optimizer and the MSE as the loss function, while performance is evaluated on the test set. This approach ensures model optimization for accurate predictions, taking into account temporal trends and seasonal variations [17].
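The described architecture corresponds to the following Keras sketch, with input shape (1, 5) for one time step and five features (the value plus year, month, day, and hour):

```python
from tensorflow import keras

# f-LSTM as described: 44 LSTM units with ReLU, dropout 0.2, single-output dense layer.
model = keras.Sequential([
    keras.layers.Input(shape=(1, 5)),
    keras.layers.LSTM(44, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")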

2.5.3. Water Demand Prediction Method with ARIMA

The water demand prediction method utilizes the ARIMA model to automatically select optimal parameters for forecasting based on time series data. The ARIMA algorithm systematically evaluates different combinations of ARIMA model parameters, p (autoregressive terms), d (differencing order), and q (moving average terms) to identify the best-fitting model that minimizes a predefined information criterion, such as the Akaike Information Criterion (AIC) [1,2]. By automating the model selection process, ARIMA eliminates the need for manual parameter tuning, ensuring that the model adapts to the unique patterns present in the water demand data, such as trends, seasonality, and random noise [6].
In this approach, the water demand time series data are first decomposed into their trend, seasonal, and residual components using seasonal decomposition techniques. The ARIMA model is then fitted specifically to the residual component, which represents the stationary part of the data after the removal of trend and seasonal effects. The ARIMA function performs a grid search over possible values of p, d, and q, applying statistical tests such as the Augmented Dickey–Fuller (ADF) test to determine the optimal level of differencing, d, and iteratively selects the model parameters that minimize the forecasting error [5,16]. The ARIMA model for forecasting future values is expressed as seen in Equation (27):

$$\hat{y}_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2} - \cdots - \theta_q \epsilon_{t-q} \tag{27}$$

where $\hat{y}_t$ is the predicted value, c is a constant, $\phi_i$ are the coefficients of the autoregressive terms, $\theta_j$ are the coefficients of the moving average terms, and $\epsilon_t$ represents the error term.
Once the optimal ARIMA model is identified through this automated search, it forecasts future residuals, which are then combined with the previously extracted trend and seasonal components to reconstruct the final water demand predictions. This method leverages the strengths of ARIMA in modeling linear, stationary data while enhancing it through automated parameter optimization, providing a robust framework for accurate short-term water demand forecasting [7,21].
To forecast water demand over a future period, such as the next 7 days, the residual component is forecasted using ARIMA. The forecasted residuals are then combined with the trend and seasonal components extracted earlier. The seasonal component is replicated to cover the 7-day period, while the trend component is extended as needed using forward fill techniques. The final combined forecast is obtained by summing these three components, which allows for capturing the overall pattern in the water demand data, including both short-term fluctuations and longer-term trends [18,21].
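A statsmodels sketch of this decompose-fit-recombine scheme; hourly_demand is an assumed input, and the order (2, 0, 2) is illustrative, standing in for the grid-searched parameters:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(hourly_demand, period=24)      # daily cycle on hourly data
resid = decomp.resid.dropna()
fitted = ARIMA(resid, order=(2, 0, 2)).fit()               # order illustrative

horizon = 24 * 7                                           # 7-day forecast
resid_fc = fitted.forecast(steps=horizon)
seasonal_fc = pd.Series(list(decomp.seasonal[-24:]) * 7,   # replicate the daily cycle
                        index=resid_fc.index)
trend_fc = pd.Series(decomp.trend.ffill().iloc[-1],        # extend trend by forward fill
                     index=resid_fc.index)
forecast = resid_fc + seasonal_fc + trend_fc               # recombine the components
```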

2.5.4. Water Demand Prediction Method with SARIMA

The water demand prediction method using the SARIMA model extends the ARIMA framework by incorporating seasonality into the model, making it particularly suitable for time series data with clear seasonal patterns, such as hourly water demand data. The SARIMA model is characterized by both non-seasonal parameters (p, d, q) and seasonal parameters (P, D, Q, S), where (p, d, q) represent the autoregressive terms, differencing order, and moving average terms, respectively, and (P, D, Q, S) denote the seasonal counterparts, with S representing the seasonal period (e.g., 24 h for daily seasonality on hourly data) [2,3].
In this approach, the water demand time series is first decomposed into trend, seasonal, and residual components using seasonal decomposition techniques. The SARIMA model is then fitted to the residual component, which represents the stationary part of the data after removing trend and seasonal effects. For the non-seasonal component, the SARIMA model uses the parameters (p, d, q), while for the seasonal component, the parameters (P, D, Q, S) are used. For example, the SARIMA model may be defined with parameters (4, 0, 5) × (1, 1, 1, 24), indicating both non-seasonal and seasonal orders [21].
The SARIMA model equation for forecasting is expressed as seen in Equation (28):
$$\Phi(B^s)\,\phi(B)\, y_t = \Theta(B^s)\,\theta(B)\, \epsilon_t \tag{28}$$

where $\phi(B)$ and $\theta(B)$ are polynomials for the non-seasonal autoregressive and moving average components, $\Phi(B^s)$ and $\Theta(B^s)$ are polynomials for the seasonal components, and B is the backshift operator. The seasonal differencing is applied D times, and non-seasonal differencing d times, to make the series stationary.
For a 7-day forecast, the SARIMA model forecasts the residual component of the time series for the specified period. These forecasted residuals are then combined with the trend and seasonal components extracted earlier. The seasonal component is repeated to cover the 7-day forecast horizon, while the trend component is extended using forward fill techniques. The final forecast is obtained by summing these components, capturing both seasonal variations and irregular patterns in the water demand data [1,7].
This approach enables a comprehensive assessment of the model’s capability to predict future water demand, considering both seasonal patterns and non-seasonal variations [16,18]. The SARIMA model’s ability to handle complex seasonal structures makes it particularly effective for water demand forecasting in environments with distinct seasonal patterns, such as urban areas influenced by daily and weekly cycles [3,11].
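Using statsmodels, the seasonal counterpart differs only in the model class and orders; resid is the residual component from the same decomposition as above, and the orders are the example values from the text:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Example orders from the text: non-seasonal (4, 0, 5), seasonal (1, 1, 1, 24).
sarima = SARIMAX(resid, order=(4, 0, 5), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
resid_fc = sarima.forecast(steps=24 * 7)   # recombined with trend and seasonal parts as before
```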

2.5.5. Conversion of Water Level Values to Water Volume Values

The total volume and maximum height of these tanks are known, allowing for the calculation of their cross-sectional area. With the cross-sectional area of the tank known, changes in the water level are used to calculate the change in volume by the difference in the water level height from one time point to the next. By summing these volume changes over specified time intervals (hourly, daily), the water demand can be estimated. A decrease in volume indicates water usage, while an increase suggests water inflow and storage in the tank [24].
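In code, the conversion reduces to a few lines; total_volume, max_height, and the hourly level series are assumed inputs, and a constant cross-section (prismatic tank) is assumed:

```python
# Convert level readings (m) to demand estimates (m^3) for one tank.
area = total_volume / max_height           # cross-sectional area in m^2
volume_change = level.diff() * area        # volume change per interval (m^3)
demand = (-volume_change).clip(lower=0)    # decreases indicate consumption; increases are inflow
daily_demand = demand.resample("D").sum()  # aggregate hourly changes to daily demand
```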

2.6. Model Performance Evaluation

The imputation accuracy of the bi-LSTM and the forecasting performance of the models were evaluated using metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2) [16]. These metrics provide a quantitative assessment of how well the models’ predictions align with the actual data. In this context, “more accurate” refers to achieving lower values for MSE and RMSE, which indicate smaller deviations between the predicted and actual values, and higher R2 values, which demonstrate a better fit of the model’s predictions to the observed data.

2.7. Computational Complexity and Practical Implications

The proposed framework optimizes the imputation and forecasting processes for water demand by running only the best-performing methods based on initial evaluations. This approach ensures the efficient use of computational resources while maintaining high accuracy in predictions. The computational strategy is organized into two distinct phases: an initial processing phase and a weekly update phase.

2.7.1. Initial Processing Phase

During the initial processing phase, a comprehensive analysis is conducted on the entire dataset, which includes one year of historical data. This phase requires substantial computational resources to determine the most suitable imputation and forecasting methods. The methods used during this phase include the following:
  • Imputation Methods Evaluation: Various imputation techniques (linear, polynomial, mean, K-Nearest Neighbors (KNN), and bidirectional Long Short-Term Memory (bi-LSTM) models) are evaluated to handle missing values. The best-performing method, determined by validation metrics such as Mean Squared Error (MSE) or Root Mean Squared Error (RMSE), is selected for use in subsequent processing.
  • IQR Method with Optuna Optimization: The Interquartile Range (IQR) method, enhanced with the Optuna framework, is employed to detect outliers. This method runs 500 optimization trials on the entire dataset to find the optimal bounds for outlier detection. This process, which is computationally intensive, is executed only once during the initial phase.
  • Moving Standard Deviation Method: This method is used to detect collective anomalies across the entire dataset. Similar to the IQR method, it is only run once during the initial phase.
  • Hyperparameter Optimization for bi-LSTM: The bi-LSTM model, if chosen as the best imputation method, undergoes hyperparameter tuning using the Optuna framework, running 20 trials over 20 epochs for each reservoir (tank). The optimized hyperparameters are saved for future use, eliminating the need for repeated optimization.
  • ARIMA Model Parameter Optimization: If ARIMA is identified as the best forecasting method, the optimal p, d, q parameters are determined once through a grid search or other optimization methods. This initial determination is computationally intensive, but the parameters are reused in all future runs, significantly reducing subsequent computational demands.
  • Forecasting Model Training: The best-performing forecasting model, such as the forecasting LSTM (f-LSTM), Random Forest Regressor (RFR), ARIMA, or SARIMA, is trained using one year of data. For instance, if f-LSTM is chosen, it is trained for 20 epochs to ensure robust performance. This model is saved for use in future forecasting tasks, reducing the need for frequent re-training.

2.7.2. Weekly Update Processing

The weekly update phase is computationally lighter and focuses on integrating new data and updating models based on the previously selected best methods.
  • Imputation update: The best imputation method, identified during the initial phase (e.g., bi-LSTM or linear interpolation), is applied to new data segments containing missing values on a weekly basis. The saved parameters from the initial optimization are reused, minimizing the computational cost of imputation updates.
  • Forecasting update: The best forecasting model (e.g., f-LSTM or ARIMA) is updated weekly using a rolling window of one year of the most recent data. If f-LSTM is the chosen model, it runs for 20 epochs to forecast new data efficiently. If ARIMA is selected, it uses the pre-determined p, d, q parameters, avoiding the need for recalibration.

2.8. Data Availability

Restrictions apply to the availability of these data. Data were obtained from EYATH S.A. and are available from the authors with the permission of EYATH S.A. for scientific purposes.

2.9. Ethical Approval

This study did not involve human or animal subjects, and therefore, no specific ethical approval was required from an institutional review board for these aspects. The dataset utilized in this research consists exclusively of water level measurements from central reservoirs, expressed in water volume units, to forecast regional water demand. It is important to note that the data do not include any personal or sensitive information, such as residential water usage statistics or identifiable consumer data.

3. Results

3.1. Clustering Analysis of Reservoir Dynamics

The initial analysis of the water level data revealed varying patterns across the 21 reservoirs in Eastern Thessaloniki. The reservoirs displayed a wide range of water level behaviors, from relatively stable trends to highly fluctuating levels. These patterns were influenced by several factors, including reservoir type (storage vs. intermediary), local consumption behaviors, and varying operational practices across different zones.
To gain deeper insights into these patterns, a clustering analysis was conducted using the k-means clustering algorithm to group reservoirs with similar water level dynamics. The optimal number of clusters was determined through an elbow analysis, which suggested four distinct clusters, each characterized by unique water level trends and seasonal variations, as shown in Figure 9.
The clusters and their reservoir assignments are detailed in Table 2, below.
Cluster 1: Comprised of reservoirs with stable water levels and minimal seasonal variations. These reservoirs are primarily storage reservoirs, with consistent inflows and outflows, serving residential and commercial areas with stable demand.
Cluster 2: Included reservoirs with moderate seasonal variations, indicating varying consumption patterns in suburban areas with a medium population density. These reservoirs show water level fluctuations corresponding to different seasons or periods of varying demand.
Cluster 3: Characterized by reservoirs with high fluctuations in water levels, suggesting significant variability in water usage. This cluster likely reflects reservoirs that are either subject to intermittent supply or experience high consumption during specific periods.
Cluster 4: Consisted of reservoirs with pronounced trends and seasonal variations, predominantly serving as intermediary stations that manage water transfer between different parts of the distribution network. These reservoirs demonstrate more dynamic water level changes due to their role in balancing supply across multiple zones.
Figure 10 illustrates the average water level trends for each cluster, highlighting the differences in behavior among the reservoirs grouped within each cluster.

3.2. Comparative Analysis of Imputation Methods

The performance of different imputation methods was analyzed to understand their impact on subsequent forecasting accuracy. Table 3 presents the performance metrics (MSE, RMSE, MAE, R2) for each imputation method across different datasets (tanks). The results indicate that the bi-LSTM imputation method consistently outperforms traditional methods, showing lower MSE and higher R2 values across all datasets.
The datasets analyzed in this study exhibit varying percentages of missing data before imputation, ranging from approximately 1.54% to 16.66%. This range allowed us to assess the sensitivity of different imputation methods across low, moderate, and high levels of missing data. For datasets with a low percentage of missing data (e.g., <5%), traditional methods such as mean imputation and linear interpolation generally provided reasonably accurate forecasts. However, even in these cases, the bi-LSTM method often outperformed simpler methods, though the margin of improvement was not as significant due to the limited amount of missing data. For instance, in the dataset df14 (1.54% missing), the difference in Mean Squared Error (MSE) between bi-LSTM and linear interpolation was relatively small.
As the percentage of missing data increased to a moderate level (e.g., 5–10%), the performance gap between advanced methods like bi-LSTM and traditional methods widened. Traditional methods, such as mean imputation or linear interpolation, tended to introduce more bias or variance in the reconstructed data, which adversely affected the accuracy of the forecasting models. For example, the dataset df18 (4.05% missing) demonstrated a clear improvement in forecast accuracy when using bi-LSTM compared to other methods, as evidenced by lower validation MSE and higher R2 values for both Random Forest and f-LSTM models.
At higher levels of missing data (e.g., >10%), the bi-LSTM method exhibited significantly better performance in maintaining forecast accuracy. This improvement is largely attributable to bi-LSTM’s ability to capture temporal dependencies more effectively, thereby reducing the error introduced during the imputation process. In datasets such as df31 (11.91% missing) and df32 (14.77% missing), the difference in MSE and R2 between bi-LSTM and traditional methods was more pronounced. These findings highlight that while simpler imputation methods may be sufficient for datasets with minimal missing data, advanced methods like bi-LSTM are essential for ensuring stable and reliable forecasting outcomes, particularly in datasets with significant data gaps.
Figure 11 illustrates the correlation between imputation quality and forecasting performance, highlighting that datasets with lower imputation errors (using bi-LSTM) also achieve better forecasting results with the f-LSTM, RFR, and ARIMA models. This underscores the importance of choosing a robust imputation method to improve overall forecasting reliability.
Table 4 presents the Mean Squared Error (MSE) for three different forecasting models (f-LSTM, RFR, and ARIMA) using various imputation methods. The bi-LSTM imputation method consistently results in lower MSE across all models, particularly benefiting the f-LSTM model, which relies heavily on accurate temporal pattern recognition. Traditional imputation methods, such as mean imputation and linear interpolation, exhibit significantly higher errors, especially for the ARIMA model, which does not inherently capture complex dependencies.
Figure 12 compares the Mean Squared Error (MSE) of various forecasting models (RFR, f-LSTM, ARIMA) across different imputation methods, illustrating the impact of imputation on model performance.
The results indicate that advanced imputation techniques, such as bi-LSTM, significantly enhance model robustness and reduce forecasting errors. The f-LSTM model, in particular, benefits the most from the bi-LSTM imputation method, achieving the lowest MSE. Traditional methods like mean imputation show limited effectiveness, especially in models that depend on accurate temporal pattern recognition.
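To make the comparison concrete, the sketch below shows one way the baseline imputation methods can be scored: known values are blanked out, re-imputed with each method, and scored against the held-out truth. The `impute_and_score` helper and its lag-feature KNN variant are illustrative, not the study's exact pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error

def impute_and_score(series: pd.Series, mask: np.ndarray) -> dict:
    """Blank out the positions in `mask`, impute them with each baseline
    method, and score the reconstructions against the held-out truth."""
    truth = series[mask]
    gapped = series.copy()
    gapped[mask] = np.nan

    # KNN needs some temporal context; lagged copies are one simple choice.
    lagged = pd.DataFrame(
        {"y": gapped, "lag1": gapped.shift(1), "lead1": gapped.shift(-1)}
    )
    knn = pd.Series(
        KNNImputer(n_neighbors=5).fit_transform(lagged)[:, 0], index=series.index
    )

    candidates = {
        "mean": gapped.fillna(gapped.mean()),
        "linear": gapped.interpolate(method="linear", limit_direction="both"),
        "polynomial": gapped.interpolate(method="polynomial", order=2,
                                         limit_direction="both"),
        "knn": knn,
    }
    return {name: mean_squared_error(truth, s[mask])
            for name, s in candidates.items()}
```

Applied to a tank series with an artificial gap mask, the helper returns one reconstruction MSE per method, which is the kind of per-method comparison summarized in Table 3.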

Impact of Water Level Dynamics on Imputation Accuracy

The datasets analyzed in this study were grouped into four clusters based on their water level dynamics. This clustering revealed that reservoirs in each cluster exhibit distinct water level behaviors, which can affect the performance of imputation methods.
  • Cluster 1 (Stable Water Levels with Minimal Seasonal Variations)
Reservoirs in Cluster 1 (e.g., df14 with 1.54% missing data) generally showed smaller differences in Mean Squared Error (MSE) between bi-LSTM and traditional imputation methods. The stable and predictable nature of the data in these reservoirs meant that simpler methods, such as linear interpolation, provided reasonably accurate imputations. However, the bi-LSTM method still demonstrated slightly lower MSE values, indicating a marginal improvement due to its ability to model subtle patterns in the data.
  • Cluster 2 (Moderate to High Seasonal Variations)
Reservoirs in Cluster 2, such as df18 (4.05% missing data) and df31 (11.91% missing data), benefited significantly from advanced imputation methods like bi-LSTM, particularly as the percentage of missing data increased. The bi-LSTM method provided a clear improvement in forecast accuracy compared to other methods due to its capability to capture temporal dependencies and seasonal patterns. This cluster’s moderate to high variability and seasonality made it suitable for methods that can adapt to changes over time.
  • Cluster 3 (Significant Variability in Water Usage)
Reservoirs in Cluster 3, such as df11 (2.61% missing data), showed mixed results. Although the bi-LSTM method achieved lower MSE values than the traditional methods, the relatively low complexity and moderate variability of these reservoirs meant that it offered only modest gains over simpler imputation approaches. Nonetheless, bi-LSTM handled the variability more effectively than methods such as mean imputation, which tended to introduce bias or higher variance.
  • Cluster 4 (Distinct Water Level Patterns)
Reservoir df20, the sole member of Cluster 4, exhibits distinct water level patterns that required specialized handling. Although only one reservoir was assigned to this cluster, the bi-LSTM method substantially outperformed the other methods in maintaining forecast accuracy. The dynamic and complex nature of the water level trends in this cluster necessitated robust imputation techniques to handle the variability in the data effectively.
Table 5 details the performance metrics for different imputation methods, highlighting how the cluster characteristics influence the effectiveness of each method. The results indicate that while simpler imputation methods may suffice for reservoirs with stable water levels, advanced methods like bi-LSTM are particularly beneficial for reservoirs with more complex and variable dynamics.

3.3. bi-LSTM Imputation Accuracy Analysis

Figure 13 presents the bi-LSTM imputation accuracy performance metrics (MSE, RMSE, MAE, and R2) for each tank.
Tanks like df12, df20, and df61 demonstrate very successful imputation results, with low MSE, RMSE, MAE, and high R2 values, indicating the bi-LSTM method effectively captured the underlying data patterns. Tanks such as df11, df14, and df19 show moderate performance, where most metrics are relatively good, but there might be slight room for improvement. Tanks like df17 and df32 indicate areas where the method struggled, as reflected by higher MSE, RMSE, and MAE and lower R2 values. The bi-LSTM imputation method was employed to fill the gaps in the data, but the success of this method depends heavily on the patterns present in the valid portions of the time series. If the valid data differ significantly across these tanks in terms of trends or seasonality, the imputed values may vary, leading to different forecast patterns.
Table 6 presents the detailed bi-LSTM performance metrics after imputation for all tanks.
Overall, the bi-LSTM method shows good performance across most tanks, indicating that it is generally successful for this imputation task. However, specific cases (e.g., df17 and df32) suggest potential limitations or areas for further refinement. All the time series of data imputed for all the tanks can be viewed in Figure A3 in Appendix A.
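A minimal sketch of a bi-LSTM imputation network of the kind evaluated here is given below, using Keras. The hyperparameters follow the Table 1 setting for tank df20 (sequence length 7, 47 units, adam at learning rate 0.006, dropout 0.3); the gap-mapping and training steps (cf. Figures 3 and 4) are omitted, so this is a sketch of the network only, not the full pipeline.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm(seq_len: int, units: int, lr: float, dropout: float) -> keras.Model:
    """Bidirectional LSTM mapping a window of valid readings to the
    value at a gap position."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, 1)),
        layers.Bidirectional(layers.LSTM(units, activation="tanh")),
        layers.Dropout(dropout),
        layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

# Table 1 setting for tank df20: sequence length 7, 47 units,
# adam with learning rate 0.006, dropout 0.3 (batch size 16 at fit time).
model = build_bilstm(seq_len=7, units=47, lr=0.006, dropout=0.3)
model.summary()
```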

3.3.1. Impact of Hyperparameters on bi-LSTM Imputation Accuracy

The effectiveness of the bi-LSTM method for imputing missing data across different tanks varies significantly, and much of this variation can be attributed to the choice of hyperparameters. A closer examination of the tanks with the best and worst performance, such as df12, df20, and df61 (high accuracy) versus df17 and df32 (low accuracy), reveals distinct patterns in the hyperparameter settings.
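The search that produced the Table 1 settings can be expressed as an Optuna objective. The sketch below mirrors the ranges implied by the tuned values (sequence lengths of roughly 3 to 20, 20 to 100 LSTM units, learning rates up to 0.01); `train_and_validate` is a hypothetical stand-in for the per-tank training routine, not the study's code.

```python
import optuna

def train_and_validate(sequence_length, lstm_units, learning_rate,
                       dropout_rate, batch_size):
    """Hypothetical stand-in: train the bi-LSTM for one tank with these
    settings and return its validation MSE. A dummy value keeps the
    sketch runnable."""
    return 0.0

def objective(trial: optuna.Trial) -> float:
    # Search space mirroring the ranges seen in Table 1.
    params = {
        "sequence_length": trial.suggest_int("sequence_length", 3, 20),
        "lstm_units": trial.suggest_int("lstm_units", 20, 100),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "dropout_rate": trial.suggest_float("dropout_rate", 0.0, 0.4),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64]),
    }
    return train_and_validate(**params)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```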
The scatter plots in Figure 14 provide a visual representation of the relationships between the hyperparameters (sequence length, LSTM units, learning rate) and the performance metrics (MSE, R2). The main observations from these plots are summarized below.
The analysis of the relationship between Mean Squared Error (MSE) and individual hyperparameters, such as sequence length, LSTM units, and learning rate, reveals no clear linear trends. MSE values are distributed across various sequence lengths, indicating that sequence length alone does not directly determine MSE performance. Similarly, while some clusters of lower MSE values appear at specific counts of LSTM units, a higher number of units does not consistently correspond to a reduced MSE. A slight pattern emerges where extremely low and very high learning rates are associated with higher MSE values, while moderate learning rates (approximately 0.004 to 0.006) tend to result in a lower MSE.
The relationship between R2 and these hyperparameters is similarly complex. No strong linear correlation exists between R2 and sequence length, as high and low R2 values are observed across a range of sequence lengths. Similarly, different LSTM unit counts show both high and low R2 values, indicating no clear pattern between these variables. However, moderate learning rates (around 0.004 to 0.006) appear to correlate with higher R2 values, suggesting better model performance, whereas extremely low or high learning rates are often linked with lower R2 values.
These findings suggest that the Optuna optimization method does not reveal a simple, direct impact of any single hyperparameter on imputation accuracy; rather, model performance is likely influenced by the specific combination and interaction of multiple hyperparameters. Models tend to perform better with moderate learning rates, reasonable sequence lengths, and appropriate LSTM units, although other factors also play a critical role. In Figure 15, a correlation matrix is presented for the hyperparameters and performance metrics. The heatmap visualization further illustrates the strength and direction of the correlations between these variables.
The correlation analysis reveals that the hyperparameters studied—sequence length, LSTM units, and learning rate—exhibit varying degrees of association with model performance metrics, such as Mean Squared Error (MSE) and R2. Sequence length shows a weak negative correlation with MSE (−0.04) and a weak positive correlation with R2 (0.09), suggesting that it has minimal direct impact on both error and model fit. LSTM units demonstrate a moderate negative correlation with MSE (−0.29) and a weak positive correlation with R2 (0.02), implying a slight trend where increasing the number of units may reduce error but has little effect on model fit. In contrast, the learning rate exhibits a moderate positive correlation with MSE (0.41) and a moderate negative correlation with R2 (−0.48), indicating that higher learning rates tend to be associated with increased error and reduced model performance.
To further investigate the impact of hyperparameters on prediction errors, we conducted a regression analysis using an Ordinary Least Squares (OLS) model. The analysis focused on key parameters such as sequence length, LSTM units, and learning rate to determine their contribution to the variability in model performance metrics (MSE and R2). The regression results indicate that these hyperparameters collectively explained only a modest proportion of the variance in MSE (27.4%) and R2 (22.0%).
Among the hyperparameters, the sequence length exhibited a marginal influence on MSE (p-value = 0.058), suggesting that reservoirs with different temporal dependencies might experience varying error magnitudes. However, the effects of other hyperparameters, such as the number of LSTM units and learning rate, were not statistically significant (p-values of 0.492 and 0.207, respectively). This suggests that, while hyperparameter tuning can optimize certain aspects of model performance, it does not fully explain the larger prediction errors observed in certain reservoirs.
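The OLS analysis can be reproduced with statsmodels along the following lines. The `runs` frame here is illustrative, populated with a handful of values taken from Tables 1 and 6, whereas the study regresses over all 21 tanks.

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative rows drawn from Tables 1 and 6 (df10, df11, df12, df13, df14, df20);
# in the study, all 21 tanks enter the regression.
runs = pd.DataFrame({
    "sequence_length": [15, 10, 12, 15, 3, 7],
    "lstm_units": [51, 33, 79, 70, 99, 47],
    "learning_rate": [0.010, 0.005, 0.005, 0.009, 0.003, 0.006],
    "mse": [0.0043, 0.0308, 0.0045, 0.0275, 0.0019, 0.0286],
})

X = sm.add_constant(runs[["sequence_length", "lstm_units", "learning_rate"]])
ols = sm.OLS(runs["mse"], X).fit()
print(ols.summary())  # reports R-squared and per-coefficient p-values
```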
Overall, these results suggest that no single hyperparameter is strongly correlated with performance metrics, highlighting that each hyperparameter’s effect is relatively independent and that combinations may have a more significant impact on model accuracy. While moderate learning rates appear to correlate with better performance (lower MSE and higher R2), the impact of each hyperparameter is likely to depend on specific model configurations and contexts, underscoring the complexity of optimizing model performance through hyperparameter tuning.

3.3.2. Cluster Analysis of Hyperparameters and Imputation Accuracy

The analysis focused on how hyperparameters optimized by Optuna aligned with the unique characteristics of different reservoir clusters, revealing both strengths and limitations in the optimization process. The clustering of reservoirs based on water level dynamics showed diverse characteristics, such as stability, variability, and seasonal patterns. These clusters required tailored hyperparameter settings to achieve effective imputation. Optuna aimed to identify the most suitable set of hyperparameters for each cluster, reflecting their distinct water dynamics.
For reservoirs in the first cluster, characterized by stable water levels, Optuna’s choices of moderate sequence lengths and a balanced number of LSTM units were well aligned with the data’s predictable nature. This cluster exhibited relatively high imputation accuracy, suggesting that the chosen parameters effectively captured the subtle and stable patterns within the datasets. The stability of these reservoirs meant that a simpler model architecture with moderate learning parameters was sufficient to achieve high performance.
In contrast, the reservoirs in the second cluster exhibited moderate to high seasonal variations, requiring a more nuanced approach to capturing complex temporal dependencies. Optuna’s selections in this cluster involved higher LSTM units and learning rates, which seemed appropriate given the increased complexity. The performance metrics supported this choice, with most reservoirs achieving relatively good imputation accuracy. However, there were exceptions, such as reservoir df17, where the performance indicated room for improvement. This suggests that while the optimization process captured the broader requirements for this cluster, certain reservoirs might still benefit from more fine-tuned adjustments or different parameter combinations.
Reservoirs in the third cluster demonstrated significant variability in water usage, presenting a challenge for hyperparameter optimization. Optuna's strategy here included a varied range of LSTM units and higher dropout rates to mitigate overfitting, an approach aimed at handling the high fluctuations observed in the water levels. The imputation results for some tanks, such as df11, showed mixed success, indicating that the parameters were reasonable but that further refinement might be necessary. The variation in performance within this cluster highlights the complexity of optimizing hyperparameters for highly variable datasets.
The fourth cluster, comprising reservoirs with distinct water level patterns, showed a more targeted optimization approach. Reservoirs like df20 received tailored hyperparameters that generally performed well, with low MSE and high R2 values indicating that Optuna successfully captured the unique dynamics of these reservoirs. The customization in parameter selection for this cluster suggests that Optuna’s optimization was adept at recognizing unique patterns and adjusting accordingly.
The scatter plots of hyperparameters by cluster and performance in Figure 16 show that Optuna effectively identified some of the underlying dynamics specific to each cluster. Reservoirs with similar water level behaviors tended to receive comparable parameter settings, leading to consistent performance within clusters. However, there was no clear linear relationship between individual hyperparameters, such as sequence length or LSTM units, and performance metrics across all clusters. This indicates that the optimization process was more nuanced and context-dependent, considering the complex interplay between multiple hyperparameters and the unique characteristics of each reservoir.
The strengths of Optuna’s optimization are evident in its ability to improve imputation performance across clusters, particularly those with moderate to high variability where the benefits of tuning were more pronounced. The algorithm was capable of differentiating between different reservoir dynamics and selecting parameters that generally led to good performance. However, limitations were also apparent, especially in highly variable reservoirs, such as df17 in Cluster 2, where the selected hyperparameters did not always yield optimal results. This may reflect the difficulty in fully capturing the complexity of certain dynamics with the chosen parameters, suggesting potential for further refinement or alternative strategies.

3.4. Water Demand Models’ Forecasting Performance

After applying the bi-LSTM imputation method to the data, the f-LSTM model's performance was compared with that of the RFR, ARIMA, and SARIMA models for time series forecasting. Figure 17 presents this comparison using the same evaluation metrics (MSE, RMSE, MAE, and R2) on the test sets.
Table 7 presents the forecasting results, demonstrating strong predictive capabilities.
The comparative analysis reveals that the f-LSTM model outperforms the RFR, ARIMA, and SARIMA models across all performance metrics (MSE, MAE, RMSE, R2). This superior performance can be attributed to the f-LSTM model’s inherent ability to capture long-term dependencies and non-linear patterns in time series data, which is crucial for accurately predicting water demand in complex and dynamic environments. The RFR model, although competitive in certain cases, fails to effectively capture intricate temporal dependencies, as indicated by its relatively higher MSE and RMSE values for several tanks.
In contrast, the ARIMA and SARIMA models show significant limitations, particularly when dealing with the non-stationary and non-linear characteristics of the data. Their consistently higher MSE, MAE, and RMSE values, along with negative R2 scores in some instances, suggest that these models are less suitable for this specific forecasting task. The performance decline of the ARIMA and SARIMA models for certain tanks (e.g., df11 and df17) further illustrates their difficulty in adapting to the seasonal and trend components present in the data, leading to suboptimal forecasts. The time series predictions for all tanks over a seven-day forecast horizon are visualized in Figure A4 for the f-LSTM model, in Figure A5 for the RFR model, in Figure A6 for the ARIMA model, and in Figure A7 for the SARIMA model in Appendix A.
These findings highlight the critical importance of selecting models that can effectively capture the complex temporal relationships inherent in water demand data. The superior performance of the f-LSTM model indicates that it is the most appropriate choice for this forecasting task, particularly in environments characterized by complex, non-linear dynamics and long-term dependencies. The model’s ability to consistently achieve lower error rates and higher R2 values demonstrates its robustness and reliability in delivering accurate forecasts.
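For context, the sketch below shows how the RFR and (S)ARIMA baselines can be fitted and scored on a held-out horizon. The (1, 1, 1)(1, 1, 1, 24) orders, the 24-lag feature window, and the assumption of hourly readings with a 7-day test horizon are illustrative choices, not the configurations tuned in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.tsa.statespace.sarimax import SARIMAX

def eval_sarima(train, test, seasonal_order=(1, 1, 1, 24)):
    """Fit a (S)ARIMA baseline and score a multi-step forecast.
    The orders here are illustrative, not those tuned in the study."""
    fit = SARIMAX(train, order=(1, 1, 1),
                  seasonal_order=seasonal_order).fit(disp=False)
    pred = fit.forecast(steps=len(test))
    return mean_squared_error(test, pred), r2_score(test, pred)

def eval_rfr(train, test, lags=24):
    """Random Forest on lagged features, with recursive one-step forecasts
    rolled forward over the test horizon."""
    X = np.column_stack([train[i:len(train) - lags + i] for i in range(lags)])
    y = train[lags:]
    rfr = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    window, preds = list(train[-lags:]), []
    for _ in range(len(test)):
        preds.append(rfr.predict(np.array(window[-lags:]).reshape(1, -1))[0])
        window.append(preds[-1])
    return mean_squared_error(test, preds), r2_score(test, preds)

# Demo on synthetic data: hold out the last 7 days of assumed hourly readings.
rng = np.random.default_rng(0)
series = rng.random(500)
train, test = series[:-168], series[-168:]
print(eval_rfr(train, test))  # eval_sarima(train, test) works the same way
```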

Impact of Water Level Dynamics on Forecasting Accuracy

The forecasting accuracy of the different models—f-LSTM, RFR, ARIMA, and SARIMA—was notably influenced by the water level dynamics observed in the reservoirs, which were categorized into four distinct clusters. The impact of these dynamics on model performance provides valuable insights into the suitability of different forecasting methods for varying reservoir conditions.
The reservoirs in Cluster 1, such as df10, df12, and df19, demonstrated relatively stable water levels with minimal seasonal variations. This stability contributed to consistent forecasting performance across all models. The f-LSTM model, in particular, showed the lowest normalized Mean Squared Error (MSE) values, highlighting its ability to capture the steady patterns typical of these reservoirs. The ARIMA and SARIMA models also performed reasonably well in this cluster, benefiting from the predictable nature of the data. The lower forecast errors across all models in this cluster suggest that stable water level dynamics enhance model accuracy, particularly for methods that rely on detecting consistent trends with minor seasonal adjustments.
In Cluster 2, which included reservoirs such as df17, df18, df21, and df31, moderate to high seasonal variations were observed. The f-LSTM model again outperformed the other models, maintaining lower normalized MSE values despite the increased complexity of the water level dynamics. However, the RFR model showed greater difficulty in this cluster, with comparatively higher forecast errors. The ARIMA and SARIMA models, known for their sensitivity to seasonal fluctuations, exhibited inconsistent performance, reflecting their limitations in modeling non-linear dynamics and adapting to moderate to high variability. The forecast errors in this cluster were generally larger, indicating that increased variability and seasonal changes pose challenges for models less equipped to handle dynamic patterns over time.
Cluster 3, featuring reservoirs like df11 and df15, was characterized by significant variability in water usage. The f-LSTM model continued to show superior performance, but the disparity in normalized MSE between the f-LSTM and traditional methods such as ARIMA and SARIMA was more marked in this cluster. The high degree of variability resulted in substantial forecast errors for the ARIMA and SARIMA models, which struggled with the rapid changes and irregular patterns of water levels. Similarly, the RFR model recorded higher normalized errors compared to the f-LSTM model, highlighting its limitations in managing such significant fluctuations.
The distinct water level patterns of Cluster 4, represented solely by df20, presented unique challenges. The f-LSTM model achieved the best performance in this cluster, with significantly lower normalized MSE values compared to other models. Although ARIMA and SARIMA showed some capacity to adapt, their errors remained higher, reflecting their limitations in capturing the complex dynamics of reservoirs with unique operational characteristics. This outcome reinforces the effectiveness of the f-LSTM model in handling complex and non-linear data patterns, even in cases with unique dynamics.
The forecast performance by cluster, visualized in Figure 18, illustrates how different water level dynamics impact model accuracy. The f-LSTM model consistently achieved lower forecast errors across all clusters, demonstrating its robustness in managing diverse water level patterns, from stable to highly variable. In contrast, traditional models such as ARIMA and SARIMA exhibited greater sensitivity to clusters with high variability and complexity, resulting in more significant forecast errors. The RFR model, while competitive in some contexts, also faced challenges in clusters marked by high variability and intricate dynamics.
These findings highlight the critical role of model selection in effectively capturing the underlying water level dynamics. The consistently strong performance of the f-LSTM model across all clusters suggests it is well suited for forecasting in environments with diverse and complex water dynamics. Conversely, the higher forecast errors observed in the ARIMA and SARIMA models, particularly in clusters with significant variability, underscore the limitations of traditional forecasting methods when applied to dynamic and non-linear water level data. The differences in normalized forecast errors across models emphasize the need for robust and adaptable methods like f-LSTM to achieve accurate predictions in varied reservoir conditions.

3.5. Error Analysis

Error analysis aimed to identify patterns in prediction performance issues and potential areas for model improvement. Figure 19 presents a comparison of imputation and forecasting MSE by tank, leading to the following observations.
The analysis reveals a consistent trend where tanks with higher imputation MSE (bi-LSTM), indicated by the black line, also exhibit higher forecasting MSE across all models (RFR in red, f-LSTM in blue, ARIMA in green, and SARIMA in purple). This suggests that the errors introduced during the imputation phase propagate into the forecasting stage, negatively affecting the overall prediction accuracy. For example, tank df17 has a high imputation MSE of 0.08896, which corresponds to elevated forecasting MSEs of 0.12961 for RFR, 0.00716 for f-LSTM, 0.55538 for ARIMA, and 0.55291 for SARIMA. This indicates that substantial imputation errors contribute significantly to degraded forecasting performance, highlighting a strong correlation between high imputation errors and high forecasting errors.
On the other hand, tanks such as df10, df11, df12, and df14 exhibit lower imputation MSEs and, correspondingly, relatively low forecasting MSEs across all models. For instance, tank df10 has an imputation MSE of 0.00429 and forecasting MSEs of 0.00295 for RFR, 0.00261 for f-LSTM, 0.06454 for ARIMA, and 0.06747 for SARIMA. This observation reinforces the importance of high-quality imputation in achieving better forecasting performance.
Therefore, improving the accuracy of the imputation process is crucial for enhancing overall forecasting accuracy. Potential strategies for achieving more accurate forecasts include implementing more sophisticated imputation techniques, incorporating additional contextual data, or improving the quality of the initial dataset. By addressing these imputation errors, it is possible to reduce their propagation to the forecasting stage and enhance the model’s overall prediction capabilities.

3.5.1. Correlation Analysis

Correlation analysis reveals varying degrees of correlation between the imputation MSE (bi-LSTM) and the forecasting MSE for different models. These findings provide insights into how errors from the imputation stage affect the subsequent forecasting performance for each model.
Figure 20 presents a scatter plot of imputation versus forecasting MSE values, demonstrating the relationship between the imputation errors (bi-LSTM) and the forecasting errors of the different models (RFR, f-LSTM, ARIMA, and SARIMA). The scatter points indicate a trend where higher imputation MSE values often correspond to higher forecasting MSE values, particularly for models like ARIMA and SARIMA. This visual representation reinforces the importance of imputation quality in determining the forecasting accuracy of these models.
Table 8 provides the correlation coefficients and p-values for the relationship between imputation MSE and forecasting MSE across different models. A moderate positive correlation coefficient of 0.47 was observed between the imputation MSE (bi-LSTM) and the forecasting MSE for the RFR model, with a p-value of 0.031. This p-value is below the common threshold of 0.05, indicating statistical significance. The result suggests that errors in the imputation phase significantly affect the RFR model’s forecasting performance. Therefore, reducing imputation errors can lead to notable improvements in the accuracy of RFR-based forecasts.
For the f-LSTM model, the correlation coefficient between the imputation MSE (bi-LSTM) and forecasting MSE is 0.28, with a p-value of 0.227. This p-value is higher than 0.05, suggesting that the correlation is not statistically significant. This indicates that the f-LSTM model’s forecasting accuracy is relatively insensitive to errors in the imputation phase. The f-LSTM model appears to be more robust in handling data quality issues, making it a reliable choice in scenarios where perfect data imputation may not be feasible.
A strong positive correlation coefficient of 0.74 was identified between the imputation MSE (bi-LSTM) and the forecasting MSE for the ARIMA model, with a very low p-value of 0.0001. This indicates a highly statistically significant relationship. The strong correlation suggests that ARIMA’s forecasting performance is highly sensitive to the quality of the imputation process. High imputation errors result in substantial increases in forecasting errors, demonstrating the critical need for precise data imputations when using ARIMA for time series forecasting. Similarly, the correlation coefficient between the imputation MSE (bi-LSTM) and forecasting MSE for the SARIMA model is 0.72, with a p-value of 0.0003, also indicating a statistically significant correlation. Like ARIMA, SARIMA’s performance is strongly influenced by imputation quality. The results suggest that the SARIMA model’s accuracy significantly deteriorates with higher imputation errors, underscoring the importance of high-quality imputation methods for this model as well.
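The coefficients and p-values in Table 8 correspond to Pearson correlations between per-tank error vectors, which can be computed as sketched below; the random arrays are stand-ins for the 21 per-tank MSE values from the imputation and forecasting stages.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Stand-ins for the 21 per-tank error values; in the study these come
# from the bi-LSTM imputation MSEs and each model's forecasting MSEs.
imputation_mse = rng.random(21)
forecasting_mse = {"RFR": rng.random(21), "f-LSTM": rng.random(21),
                   "ARIMA": rng.random(21), "SARIMA": rng.random(21)}

for model, errors in forecasting_mse.items():
    r, p = pearsonr(imputation_mse, errors)
    print(f"{model}: r = {r:.2f}, p = {p:.4f}")
```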
These findings highlight the significance of imputation accuracy in forecasting models, particularly for RFR, ARIMA, and SARIMA, which show statistically significant correlations between imputation and forecasting errors. For these models, improving the imputation process is crucial to enhancing forecasting performance. On the other hand, the f-LSTM model’s lack of significant correlation suggests its robustness against imputation errors, making it a suitable choice in situations where high-quality imputation is challenging.
To improve forecasting accuracy, particularly for models like ARIMA and SARIMA, it is essential to adopt advanced imputation techniques that can effectively handle missing or poor-quality data. Utilizing sophisticated methods, such as ensemble learning or machine learning-based imputation, and incorporating additional contextual data can enhance the imputation process, resulting in a more reliable dataset for forecasting. By tailoring the approach to the specific sensitivity of each model to imputation errors, it is possible to optimize forecasting outcomes effectively.
A positive correlation was confirmed by plotting imputation and forecasting MSE values for each tank, reinforcing the idea that better imputation results lead to improved forecasting accuracy. Thus, improving the imputation process is crucial for enhancing forecasting performance. Prioritizing advanced ensemble methods and incorporating additional contextual data can reduce errors in the imputation step, resulting in a more reliable dataset and more accurate forecasts.

3.5.2. Residuals Analysis

To further evaluate the performance and reliability of the forecasting models, an analysis of the residuals was conducted. Residuals, defined as the differences between the observed and predicted values, offer critical insights into model accuracy and bias. Figure 21 displays the residual plots for each model.
The Shapiro–Wilk test assesses whether the residuals follow a normal distribution, which is a key assumption for many statistical models. For the RFR model, the test yielded a Shapiro–Wilk statistic of 0.9676 and a p-value of 0.6803, which is well above the 0.05 threshold. This result suggests that the residuals of the RFR model do not significantly deviate from normality, indicating that the model’s errors are randomly distributed around zero without any apparent bias. Similarly, the f-LSTM model demonstrated a Shapiro–Wilk statistic of 0.9748 and a p-value of 0.8354, also well above the 0.05 threshold. This confirms that the residuals for the f-LSTM model are approximately normally distributed, supporting the conclusion that this model effectively captures the underlying data patterns with no systematic error or bias.
In contrast, the ARIMA and SARIMA models showed different results in the residual analysis. The Shapiro–Wilk statistic for the ARIMA model was 0.9584, with a p-value of 0.4836. Although this p-value is still above the 0.05 significance level, it is lower compared to the p-values of the RFR and f-LSTM models, suggesting a weaker, but not statistically significant, deviation from normality. This indicates that while the ARIMA model does not show a significant deviation from a normal distribution, its residuals are less normally distributed than those of the RFR and f-LSTM models. For the SARIMA model, the Shapiro–Wilk statistic was 0.9342 with a p-value of 0.1668, which is closer to the 0.05 threshold. While the residuals of the SARIMA model are also not significantly different from normality, the lower p-value suggests a stronger indication of non-normality compared to the other models. This may imply the presence of some patterns or biases in the SARIMA model’s residuals that are not fully accounted for by the model.
The findings from the residuals analysis suggest that the RFR and f-LSTM models provide more reliable predictions, as indicated by their residuals’ closer adherence to a normal distribution. The f-LSTM model, in particular, shows a high p-value, indicating strong evidence that its residuals are normally distributed, which reinforces its robustness and reliability in capturing the complex patterns of the underlying data. The RFR model also demonstrates reliable performance, with normally distributed residuals and no significant bias.
However, the slightly lower p-values for the ARIMA and SARIMA models indicate that these models may not capture the data patterns as effectively as the RFR and f-LSTM models. While their residuals do not significantly deviate from normality, the closer proximity of their p-values to the 0.05 threshold suggests that there may be underlying issues in these models’ handling of the data, such as unmodeled trends or seasonality that the models are not fully capturing.
Residual normality is a critical aspect that directly influences the reliability of a model’s predictions. Residuals, or the differences between observed and predicted values, reflect the errors a model makes when forecasting. If the residuals are normally distributed, it suggests that the model’s errors are random, unbiased, and evenly spread around zero, which is an indication that the model has successfully captured the underlying patterns in the data.
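The Shapiro–Wilk statistics quoted above come directly from a test of this kind, sketched below with SciPy; `residuals` is a stand-in for a model's observed-minus-predicted series on the test horizon.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
residuals = rng.normal(size=30)  # stand-in for observed minus predicted values

stat, p = shapiro(residuals)
# p > 0.05 means no significant evidence against normality of the residuals.
print(f"Shapiro-Wilk W = {stat:.4f}, p = {p:.4f}")
```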

4. Conclusions

This study aimed to enhance the accuracy and reliability of short-term water demand forecasting by integrating deep learning and machine learning models with traditional statistical approaches. The results demonstrate that using advanced imputation methods, specifically the bidirectional Long Short-Term Memory (bi-LSTM) network, significantly improves the forecasting performance of models such as the forecasting LSTM (f-LSTM) and Random Forest Regressor (RFR), particularly in cases with higher percentages of missing data. However, these findings also reveal substantial variations in model effectiveness across different scenarios, warranting further examination.
The cluster analysis provided additional insights into how water level dynamics impact both imputation accuracy and forecasting performance. The clusters, defined by distinct water level behaviors, directly influenced the effectiveness of the models and the hyperparameters optimized by Optuna. For clusters with stable water levels, such as Cluster 1, the simpler models and moderate hyperparameters chosen by Optuna were sufficient to achieve high accuracy, with the f-LSTM model particularly excelling due to predictable water dynamics. In clusters with more complex patterns, such as those with moderate to high seasonal variations (Cluster 2) or significant variability (Cluster 3), Optuna’s selection of more advanced hyperparameters, including higher LSTM units and varied learning rates, improved imputation accuracy and forecasting performance, especially for bi-LSTM and f-LSTM models. In Cluster 4, characterized by distinct and dynamic water levels, tailored hyperparameters enabled the effective capture of unique patterns, though only deep learning models like f-LSTM consistently handled these complexities. While Optuna effectively adjusted hyperparameters based on cluster characteristics, achieving good overall performance, further improvements are still possible in highly variable environments.
The comparative analysis shows that the bi-LSTM model consistently outperformed traditional imputation methods, including linear, polynomial, mean, and K-Nearest Neighbors (KNN) imputation, across all datasets. The bi-LSTM method resulted in lower Mean Squared Error (MSE) and higher R2 values, demonstrating its superior capability in handling complex temporal dependencies and accurately imputing missing data. For datasets with more than 10% missing data, the bi-LSTM method reduced forecasting errors by 15–20% compared to simpler imputation techniques. This highlights the effectiveness of the bi-LSTM model in maintaining high accuracy, even in challenging scenarios where data irregularities are common. However, the error analysis reveals that the bi-LSTM model still experienced higher prediction errors in datasets with extreme data irregularities or sudden fluctuations. For example, in datasets characterized by sporadic water consumption spikes or irregular patterns, the bi-LSTM model showed increased errors, with MSE values up to 0.045, compared to more stable datasets, where the MSE was as low as 0.0025. This suggests that while the bi-LSTM is generally robust, its performance can be affected by abrupt changes that are not well represented in the training data.
The forecasting performance analysis further indicates that the choice of model is highly context-dependent. The f-LSTM model demonstrated the highest robustness and accuracy, particularly in scenarios with non-linear dependencies and significant data irregularities. The f-LSTM achieved a test MSE as low as 0.0026 in certain datasets, significantly outperforming both the ARIMA and SARIMA models, which exhibited much higher errors. The ARIMA model, for instance, had MSE values ranging from 0.25 to 0.5554 in datasets with irregular patterns, highlighting its limitations in handling non-linear and complex data. The RFR model, while generally more effective than ARIMA and SARIMA, still exhibited moderate errors in certain scenarios. For example, in datasets with high variability and noise, the RFR model showed MSE values between 0.015 and 0.050, indicating its effectiveness in handling noisy data but also revealing its limitations in capturing longer-term temporal dependencies compared to the f-LSTM model. This suggests that while the RFR model can handle data complexities to some extent, it may not always match the performance of deep learning models like the f-LSTM when non-linear and long-term dependencies are critical.
The ARIMA and SARIMA models were particularly prone to significant errors in datasets with non-linear patterns or irregular intervals. Their reliance on linear assumptions made them less capable of accurately predicting water demand in datasets with complex usage patterns, resulting in substantial deviations from actual values. These models performed better in scenarios with clear linear trends and seasonality but still fell short compared to the f-LSTM and RFR models, which were more capable of managing data complexities and providing reliable forecasts.
Overall, the variations in model performance can be attributed to their different mechanisms for capturing data patterns. The f-LSTM model, with its deep learning architecture, effectively captures non-linear dependencies and long-term temporal relationships, while the RFR model utilizes ensemble learning to handle noisy and imprecise data. In contrast, the ARIMA and SARIMA models rely on linear assumptions, which limits their ability to manage complex, irregular data. Therefore, for optimizing water demand forecasting systems, the f-LSTM and RFR models are more effective in capturing real-world water usage patterns, providing a more reliable and accurate framework for decision-making in water resource management.
Future research should explore the stability and dependability of these models under various operational conditions to ensure consistent performance across diverse scenarios. This could involve testing the models in different real-world environments, such as varying climatic conditions or regions with unique water consumption patterns. Additional efforts should focus on further refinements in hyperparameter optimization to enhance model performance, particularly in clusters with high variability and unique patterns. The development of more sophisticated optimization techniques, potentially incorporating additional contextual data, could provide more precise parameter settings that better capture the intricate dynamics of reservoir water levels.
Moreover, scaling these models to larger datasets and diverse contexts is critical for validating their robustness and adaptability in different settings. Future research should consider applying these models to datasets from various regions to provide more comprehensive insights into water demand management. By addressing these areas, future studies can bridge the gap between theoretical model performance and practical applicability, maximizing the utility of these advanced forecasting models in real-world water management scenarios. Future research should also focus on developing efficient pipelines for real-time data integration, incorporating advanced anomaly detection, and improving imputation techniques to ensure data accuracy. Optimizing the LSTM model’s architecture to reduce computational requirements will facilitate deployment on resource-constrained platforms.
Finally, there is a need to investigate the practical implementation of these models in real-world decision-making scenarios. This includes integrating the models into existing water management systems, evaluating their performance in operational contexts, and developing user-friendly interfaces to facilitate the use of model predictions by utility managers. Additionally, creating guidelines for aligning these models with regulatory requirements and operational constraints will support their wider adoption in water resource management.

Author Contributions

Conceptualization, G.M. and A.T.; methodology, G.M. and A.T.; software, G.M.; validation, G.M. and A.T.; formal analysis, G.M.; investigation, G.M.; resources, G.M.; data curation, G.M.; writing—original draft preparation, G.M.; writing—review and editing, G.M., A.T. and V.V.; visualization, A.T. and V.V.; supervision, A.T. and V.V.; project administration, A.T. and V.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from EYATH S.A. and are available from the authors with the permission of EYATH S.A. Requests to access the datasets should be directed to [email protected].

Acknowledgments

We acknowledge the support given by the International Hellenic University for providing the necessary resources and thank EYATH S.A. for their valuable assistance and provision of data.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Original data—all tanks.
Figure A2. Data ready for imputation—all tanks.
Figure A3. Data imputed—all tanks.
Figure A4. Seven days’ data predicted—f-LSTM—all tanks.
Figure A5. Seven days’ data predicted—RFR—all tanks.
Figure A6. Seven days’ data predicted—ARIMA—all tanks.
Figure A7. Seven days’ data predicted—SARIMA—all tanks.

References

1. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
2. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015.
3. European Union. Water Framework Directive (2000/60/EC). Off. J. Eur. Communities 2000, 327, 1–73.
4. European Commission. Blueprint to Safeguard Europe’s Water Resources; European Commission: Brussels, Belgium, 2012.
5. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
6. Zhang, G.; Eddy Patuwo, B.; Hu, M.Y. Forecasting with Artificial Neural Networks: The State of the Art. Int. J. Forecast. 1998, 14, 35–62.
7. Wang, K.; Ye, Z.; Wang, Z.; Liu, B.; Feng, T. MACLA-LSTM: A Novel Approach for Forecasting Water Demand. Sustainability 2023, 15, 3628.
8. Smith, J.; Taylor, M.; Jones, R. Application of LSTM Networks in Water Demand Forecasting. Water Resour. Res. 2021, 57, 2345–2357.
9. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
10. Brown, P.; Chen, G. Random Forest Approaches for Time Series Forecasting in Water Demand. J. Hydrol. 2022, 10, 123–130.
11. Zhou, S.; Guo, S.; Du, B.; Huang, S.; Guo, J. A Hybrid Framework for Multivariate Time Series Forecasting of Daily Urban Water Demand Using Attention-Based Convolutional Neural Network and Long Short-Term Memory Network. Sustainability 2022, 14, 11086.
12. Aggarwal, C.C. Outlier Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2017.
13. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. Available online: https://www.deeplearningbook.org/ (accessed on 15 June 2024).
14. European Union. Sustainable Use of Water Resources in the European Union; European Union: Brussels, Belgium, 2018.
15. Barocas, S.; Hardt, M.; Narayanan, A. Fairness and Machine Learning; MIT Press: Cambridge, MA, USA, 2022.
16. Ni, H.; Chen, G.; Lin, X.; Li, J. A Machine Learning Framework for Enhancing Short-Term Water Demand Forecasting Using Attention-BiLSTM Networks Integrated with XGBoost Residual Correction. Water 2023, 15, 3605.
17. Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
18. Ahmed, F.; Lin, L. Efficient Data Imputation Techniques for Smart Water Management Systems. J. Water Resour. Plan. Manag. 2020, 146, 04020064.
19. Anomaly Detection in Time Series: Current Focus and Future Challenges; IntechOpen: London, UK, 2023. Available online: https://www.intechopen.com/chapters/87583 (accessed on 15 July 2024).
20. Chen, K.; Feng, M.; Wirjanto, T.S. Time-series Anomaly Detection via Contextual Discriminative Contrastive Learning. arXiv 2023, arXiv:2304.07898.
21. Casolaro, A.; Capone, V.; Iannuzzo, G.; Camastra, F. Deep Learning for Time Series Forecasting: Advances and Open Problems. Information 2023, 14, 598.
22. Niknam, A.; Zare, H.K.; Hosseininasab, H.; Mostafaeipour, A.; Herrera, M. A Critical Review of Short-Term Water Demand Forecasting Tools—What Method Should I Use? Sustainability 2023, 14, 5412.
23. Liu, G.; Savic, D.; Fu, G. Short-term water demand forecasting using data-centric machine learning approaches. J. Hydroinform. 2023, 25, 895–911.
24. Walski, T.M.; Chase, D.V.; Savic, D.A.; Grayman, W.; Beckwith, S.; Koelle, E. Advanced Water Distribution Modeling and Management. In Civil and Environmental Engineering and Engineering Mechanics Faculty Publications; University of Dayton: Dayton, OH, USA, 2003; p. 18.
25. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin/Heidelberg, Germany, 2013.
26. Lund, J.R.; Moyle, P.B. Statistical Methods in Water Resources. J. Water Resour. Plan. Manag. 2013, 139, 1–10.
27. Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.
28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. Available online: https://arxiv.org/abs/1412.6980 (accessed on 18 June 2024).
29. Hinton, G.; Srivastava, N.; Swersky, K. Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude. Coursera: Neural Netw. Mach. Learn. 2012, 4, 26.
30. Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. In Proceedings of COMPSTAT’2010, Paris, France, 22–27 August 2010; pp. 177–186.
31. Glorot, X.; Bengio, Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. Available online: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf (accessed on 20 August 2024).
32. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM Networks. IEEE Int. Jt. Conf. Neural Netw. 2005, 4, 2047–2052.
Figure 1. Short-term time series forecasting methodology applied to water tank level data.
Figure 2. Workflow of data formatting according to methodology steps.
Figure 3. Process of bi-LSTM data imputation method.
Figure 4. Process of data reconstruction method.
Figure 5. Valid data without NaN values of tank df20.
Figure 6. Segment of valid data to map sequences for each NaN value.
Figure 7. The merged sequences mapped to each NaN value.
Figure 8. Predicted values of the anomalous set.
Figure 9. Clustering of reservoirs based on water level dynamics.
Figure 10. Water level trends by cluster (daily average).
Figure 11. Sensitivity analysis of different imputation methods per model.
Figure 12. Robustness analysis of forecasting models using different imputation methods.
Figure 13. Visualizations of the bi-LSTM imputation accuracy performance metrics for each tank.
Figure 14. Scatter plot matrix of performance metrics and key parameters.
Figure 15. Correlation heatmap for hyperparameters and bi-LSTM imputation performance.
Figure 16. Scatter plots of hyperparameters by cluster and performance.
Figure 17. Forecasting performance of test data by model and tank.
Figure 18. Forecast performance by cluster per model.
Figure 19. Comparison of imputation and forecasting MSE by tank.
Figure 20. Scatter plot of imputation MSE vs. forecasting MSE of different models.
Figure 21. Residual plots of the forecasting models.
Table 1. Hyperparameters of bi-LSTM for each tank optimized by the Optuna framework.

| Tank | Sequence Length | LSTM Units | Activation | Optimizer | Learning Rate | Additional Dense | Dropout Rate | Batch Size |
|------|-----------------|------------|------------|-----------|---------------|------------------|--------------|------------|
| df10 | 15 | 51 | Tanh | rmsprop | 0.010 | FALSE | 0.3 | 16 |
| df11 | 10 | 33 | Tanh | rmsprop | 0.005 | FALSE | 0.1 | 16 |
| df12 | 12 | 79 | Tanh | adam | 0.005 | FALSE | 0.4 | 32 |
| df13 | 15 | 70 | ReLU | adam | 0.009 | TRUE | 0.0 | 32 |
| df14 | 3 | 99 | Tanh | sgd | 0.003 | TRUE | 0.2 | 16 |
| df15 | 3 | 20 | Tanh | rmsprop | 0.008 | TRUE | 0.0 | 16 |
| df16 | 17 | 35 | Tanh | adam | 0.002 | TRUE | 0.1 | 16 |
| df17 | 19 | 69 | Tanh | adam | 0.006 | FALSE | 0.3 | 32 |
| df18 | 14 | 68 | Tanh | rmsprop | 0.004 | FALSE | 0.3 | 16 |
| df19 | 12 | 57 | ReLU | adam | 0.002 | TRUE | 0.0 | 32 |
| df20 | 7 | 47 | Tanh | adam | 0.006 | FALSE | 0.3 | 16 |
| df21 | 16 | 58 | Tanh | adam | 0.009 | TRUE | 0.4 | 16 |
| df22 | 10 | 98 | ReLU | rmsprop | 0.006 | FALSE | 0.3 | 64 |
| df23 | 14 | 67 | Tanh | rmsprop | 0.004 | FALSE | 0.4 | 16 |
| df24 | 10 | 47 | ReLU | adam | 0.005 | TRUE | 0.2 | 16 |
| df25 | 20 | 78 | ReLU | adam | 0.004 | TRUE | 0.2 | 16 |
| df27 | 9 | 30 | ReLU | adam | 0.006 | FALSE | 0.4 | 16 |
| df30 | 3 | 100 | Tanh | adam | 0.004 | FALSE | 0.4 | 16 |
| df31 | 8 | 50 | ReLU | adam | 0.008 | TRUE | 0.1 | 16 |
| df32 | 19 | 77 | Tanh | rmsprop | 0.009 | TRUE | 0.2 | 32 |
| df61 | 16 | 40 | Sigmoid | sgd | 0.000 | FALSE | 0.3 | 16 |
Table 2. Cluster assignments for reservoirs.

| Reservoir_ID | Cluster | Reservoir_ID | Cluster | Reservoir_ID | Cluster |
|--------------|---------|--------------|---------|--------------|---------|
| df10 | 1 | df17 | 2 | df24 | 2 |
| df11 | 3 | df18 | 2 | df25 | 1 |
| df12 | 1 | df19 | 1 | df27 | 2 |
| df13 | 1 | df20 | 4 | df30 | 2 |
| df14 | 1 | df21 | 2 | df31 | 2 |
| df15 | 3 | df22 | 2 | df32 | 1 |
| df16 | 1 | df23 | 2 | df61 | 2 |
Table 3. Performance metrics for different imputation methods.

| Tank | Model | Missing % | MSE_bi-LSTM | MSE_Linear | MSE_Polynomial | MSE_Mean | MSE_KNN |
|------|-------|-----------|-------------|------------|----------------|----------|---------|
| df10 | f-LSTM | 3.12 | 0.0030 | 0.0011 | 0.0012 | 0.0030 | 0.0028 |
| df10 | RFR | 3.12 | 0.0034 | 0.0015 | 0.0013 | 0.0035 | 0.0029 |
| df10 | ARIMA | 3.12 | 0.1950 | 0.2000 | 0.2020 | 0.2100 | 0.1939 |
| df11 | f-LSTM | 2.61 | 0.0003 | 0.0294 | 0.0280 | 0.0450 | 0.0400 |
| df11 | RFR | 2.61 | 0.0006 | 0.0300 | 0.0290 | 0.0460 | 0.0410 |
| df11 | ARIMA | 2.61 | 0.2010 | 0.2050 | 0.2040 | 0.2110 | 0.5872 |
| df12 | f-LSTM | 2.94 | 0.0004 | 0.0063 | 0.0060 | 0.0150 | 0.0120 |
| df12 | RFR | 2.94 | 0.0005 | 0.0070 | 0.0075 | 0.0160 | 0.0130 |
| df12 | ARIMA | 2.94 | 0.2100 | 0.2150 | 0.2140 | 0.2200 | 0.3575 |
| df13 | f-LSTM | 2.64 | 0.0001 | 0.0287 | 0.0290 | 0.0350 | 0.0320 |
| df13 | RFR | 2.64 | 0.0002 | 0.0300 | 0.0310 | 0.0360 | 0.0330 |
| df13 | ARIMA | 2.64 | 0.1980 | 0.2000 | 0.2020 | 0.2070 | 0.1160 |
| df14 | f-LSTM | 1.54 | 0.0002 | 0.0392 | 0.0380 | 0.0400 | 0.0380 |
| df14 | RFR | 1.54 | 0.0002 | 0.0400 | 0.0390 | 0.0410 | 0.0390 |
| df14 | ARIMA | 1.54 | 0.1950 | 0.1970 | 0.1960 | 0.2060 | 0.0399 |
| df15 | f-LSTM | 5.82 | 0.0007 | 0.0066 | 0.0070 | 0.0160 | 0.0140 |
| df15 | RFR | 5.82 | 0.0008 | 0.0075 | 0.0080 | 0.0170 | 0.0150 |
| df15 | ARIMA | 5.82 | 0.2200 | 0.2250 | 0.2230 | 0.2300 | 0.4209 |
| df16 | f-LSTM | 1.97 | 0.0004 | 0.0048 | 0.0050 | 0.0090 | 0.0080 |
| df16 | RFR | 1.97 | 0.0005 | 0.0050 | 0.0060 | 0.0100 | 0.0090 |
| df16 | ARIMA | 1.97 | 0.2030 | 0.2100 | 0.2090 | 0.2150 | 0.0995 |
| df17 | f-LSTM | 10.11 | 0.0001 | 0.0998 | 0.1000 | 0.1050 | 0.1000 |
| df17 | RFR | 10.11 | 0.0002 | 0.1000 | 0.1010 | 0.1060 | 0.1010 |
| df17 | ARIMA | 10.11 | 0.2110 | 0.2200 | 0.2190 | 0.2250 | 0.3006 |
| df18 | f-LSTM | 4.05 | 0.0007 | 0.0277 | 0.0260 | 0.0290 | 0.0270 |
| df18 | RFR | 4.05 | 0.0008 | 0.0280 | 0.0290 | 0.0300 | 0.0280 |
| df18 | ARIMA | 4.05 | 0.2050 | 0.2150 | 0.2140 | 0.2170 | 0.1629 |
| df19 | f-LSTM | 4.02 | 0.0002 | 0.0123 | 0.0110 | 0.0140 | 0.0130 |
| df19 | RFR | 4.02 | 0.0003 | 0.0130 | 0.0140 | 0.0150 | 0.0140 |
| df19 | ARIMA | 4.02 | 0.2100 | 0.2180 | 0.2170 | 0.2200 | 0.1169 |
| df20 | f-LSTM | 8.61 | 0.0002 | 0.0209 | 0.0210 | 0.0220 | 0.0210 |
| df20 | RFR | 8.61 | 0.0003 | 0.0220 | 0.0230 | 0.0230 | 0.0220 |
| df20 | ARIMA | 8.61 | 0.2080 | 0.2120 | 0.2110 | 0.2140 | 2.3411 |
| df21 | f-LSTM | 4.23 | 0.0001 | 0.0146 | 0.0140 | 0.0150 | 0.0140 |
| df21 | RFR | 4.23 | 0.0002 | 0.0150 | 0.0160 | 0.0160 | 0.0150 |
| df21 | ARIMA | 4.23 | 0.2070 | 0.2100 | 0.2090 | 0.2120 | 0.1280 |
| df22 | f-LSTM | 4.17 | 0.0002 | 0.0226 | 0.0220 | 0.0240 | 0.0230 |
| df22 | RFR | 4.17 | 0.0002 | 0.0230 | 0.0240 | 0.0250 | 0.0240 |
| df22 | ARIMA | 4.17 | 0.2060 | 0.2090 | 0.2080 | 0.2110 | 0.1114 |
| df23 | f-LSTM | 4.71 | 0.0001 | 0.0176 | 0.0160 | 0.0190 | 0.0180 |
| df23 | RFR | 4.71 | 0.0001 | 0.0180 | 0.0190 | 0.0200 | 0.0190 |
| df23 | ARIMA | 4.71 | 0.2050 | 0.2080 | 0.2070 | 0.2100 | 0.1115 |
| df24 | f-LSTM | 4.28 | 0.0001 | 0.0098 | 0.0090 | 0.0110 | 0.0100 |
| df24 | RFR | 4.28 | 0.0001 | 0.0100 | 0.0110 | 0.0120 | 0.0110 |
| df24 | ARIMA | 4.28 | 0.2040 | 0.2070 | 0.2060 | 0.2090 | 0.0663 |
| df25 | f-LSTM | 4.03 | 0.0002 | 0.0028 | 0.0020 | 0.0040 | 0.0030 |
| df25 | RFR | 4.03 | 0.0003 | 0.0030 | 0.0035 | 0.0050 | 0.0040 |
| df25 | ARIMA | 4.03 | 0.2020 | 0.2050 | 0.2040 | 0.2070 | 0.0428 |
| df27 | f-LSTM | 6.31 | 0.0004 | 0.0028 | 0.0025 | 0.0070 | 0.0065 |
| df27 | RFR | 6.31 | 0.0005 | 0.0030 | 0.0035 | 0.0080 | 0.0075 |
| df27 | ARIMA | 6.31 | 0.2080 | 0.2100 | 0.2090 | 0.2130 | 2.6409 |
| df30 | f-LSTM | 5.26 | 0.0003 | 0.0005 | 0.0004 | 0.0015 | 0.0012 |
| df30 | RFR | 5.26 | 0.0004 | 0.0010 | 0.0012 | 0.0020 | 0.0018 |
| df30 | ARIMA | 5.26 | 0.2070 | 0.2110 | 0.2100 | 0.2140 | 0.0081 |
| df31 | f-LSTM | 11.91 | 0.0001 | 0.0052 | 0.0050 | 0.0060 | 0.0055 |
| df31 | RFR | 11.91 | 0.0001 | 0.0060 | 0.0070 | 0.0070 | 0.0060 |
| df31 | ARIMA | 11.91 | 0.2150 | 0.2200 | 0.2180 | 0.2220 | 0.0057 |
| df32 | f-LSTM | 14.77 | 0.0001 | 0.0681 | 0.0650 | 0.0500 | 0.0480 |
| df32 | RFR | 14.77 | 0.0001 | 0.0700 | 0.0710 | 0.0510 | 0.0490 |
| df32 | ARIMA | 14.77 | 0.2170 | 0.2250 | 0.2240 | 0.2290 | 0.0575 |
| df61 | f-LSTM | 16.66 | 0.0001 | 0.0086 | 0.0080 | 0.0140 | 0.0130 |
| df61 | RFR | 16.66 | 0.0001 | 0.0090 | 0.0100 | 0.0150 | 0.0140 |
| df61 | ARIMA | 16.66 | 0.2190 | 0.2230 | 0.2210 | 0.2270 | 0.0605 |
Table 4. Robustness analysis of forecasting models with different imputation methods.

| Model | MSE_bi-LSTM | MSE_Linear | MSE_Polynomial | MSE_Mean | MSE_KNN |
|-------|-------------|------------|----------------|----------|---------|
| f-LSTM | 0.0003 | 0.0242 | 0.0048 | 0.0206 | 0.0209 |
| RFR | 0.0029 | 0.0010 | 0.0015 | 0.0027 | 0.0027 |
| ARIMA | 0.1939 | 0.1773 | 0.1881 | 0.5871 | 0.1939 |
Table 5. Performance metrics for different imputation methods.
Tank | Model | Missing % | MSE_bi-LSTM | MSE_Linear | MSE_Polynomial | MSE_Mean | MSE_KNN
df10 | f-LSTM | 3.12 | 0.0030 | 0.0011 | 0.0012 | 0.0030 | 0.0028
df11 | f-LSTM | 2.61 | 0.0003 | 0.0294 | 0.0280 | 0.0450 | 0.0400
df31 | f-LSTM | 11.91 | 0.0001 | 0.0052 | 0.0050 | 0.0060 | 0.0055
df32 | f-LSTM | 14.77 | 0.0001 | 0.0681 | 0.0650 | 0.0500 | 0.0480
Table 6. bi-LSTM performance metrics for tank models after imputation.
Tank | MSE | RMSE | MAE | R2
df10 | 0.0043 | 0.0654 | 0.0525 | 0.9572
df11 | 0.0308 | 0.1755 | 0.1341 | 0.9484
df12 | 0.0045 | 0.0672 | 0.0505 | 0.9865
df13 | 0.0275 | 0.1657 | 0.1307 | 0.6972
df14 | 0.0019 | 0.0446 | 0.0361 | 0.8233
df15 | 0.0080 | 0.0896 | 0.0661 | 0.9761
df16 | 0.0042 | 0.0648 | 0.0453 | 0.9513
df17 | 0.0889 | 0.2983 | 0.2344 | 0.7064
df18 | 0.0437 | 0.2090 | 0.1682 | 0.7542
df19 | 0.0153 | 0.1240 | 0.0947 | 0.8693
df20 | 0.0286 | 0.1691 | 0.1363 | 0.9844
df21 | 0.0178 | 0.1335 | 0.1029 | 0.9078
df22 | 0.0215 | 0.1467 | 0.1183 | 0.8530
df23 | 0.0135 | 0.1160 | 0.0878 | 0.8908
df24 | 0.0079 | 0.0889 | 0.0672 | 0.9099
df25 | 0.0090 | 0.0949 | 0.0612 | 0.7836
df27 | 0.0142 | 0.1193 | 0.0980 | 0.9751
df30 | 0.0038 | 0.0613 | 0.0337 | 0.8304
df31 | 0.0059 | 0.0767 | 0.0600 | 0.4848
df32 | 0.0672 | 0.2591 | 0.1941 | 0.2198
df61 | 0.0002 | 0.0140 | 0.0115 | 0.6189
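The four columns of Table 6 are the standard regression metrics and can be reproduced directly with scikit-learn, as in the minimal sketch below; the y_true/y_pred vectors are placeholders standing in for a tank's held-out levels and the bi-LSTM reconstruction.

```python
# Hedged sketch: computing the Table 6 metric columns with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([4.81, 4.79, 4.90, 5.02, 4.95])  # placeholder held-out levels
y_pred = np.array([4.78, 4.83, 4.88, 4.99, 5.01])  # placeholder reconstruction

mse = mean_squared_error(y_true, y_pred)
print(f"MSE={mse:.4f} RMSE={np.sqrt(mse):.4f} "
      f"MAE={mean_absolute_error(y_true, y_pred):.4f} "
      f"R2={r2_score(y_true, y_pred):.4f}")
```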
Table 7. Model comparison forecasting performance in validation and test sets.
Tank | Model | Validation MSE | Validation MAE | Validation RMSE | Validation R2 | Test MSE | Test MAE | Test RMSE | Test R2
df10 | RFR | 0.0029 | 0.0270 | 0.0545 | 0.9664 | 0.0030 | 0.0277 | 0.0543 | 0.9708
df10 | f-LSTM | 0.0003 | 0.0132 | 0.0182 | 0.9918 | 0.0026 | 0.0476 | 0.0511 | 0.9487
df10 | ARIMA | 0.0148 | 0.0895 | 0.1219 | 0.8431 | 0.0645 | 0.2070 | 0.2541 | 0.2681
df10 | SARIMA | 0.0163 | 0.0953 | 0.1279 | 0.8273 | 0.0675 | 0.2090 | 0.2598 | 0.2348
df11 | RFR | 0.0384 | 0.1411 | 0.1960 | 0.9347 | 0.0401 | 0.1505 | 0.2002 | 0.9336
df11 | f-LSTM | 0.0003 | 0.0126 | 0.0184 | 0.9953 | 0.0032 | 0.0441 | 0.0562 | 0.9562
df11 | ARIMA | 0.3196 | 0.4538 | 0.5654 | 0.4773 | 0.3368 | 0.4399 | 0.5804 | −9.8773
df11 | SARIMA | 0.2234 | 0.3850 | 0.4726 | 0.6347 | 0.2785 | 0.3931 | 0.5276 | −7.9918
df12 | RFR | 0.0193 | 0.0720 | 0.1391 | 0.9366 | 0.0379 | 0.0865 | 0.1947 | 0.8858
df12 | f-LSTM | 0.0004 | 0.0124 | 0.0202 | 0.9930 | 0.0061 | 0.0598 | 0.0783 | 0.9107
df12 | ARIMA | 0.0812 | 0.2177 | 0.2850 | 0.7473 | 0.0719 | 0.2308 | 0.2681 | 0.4800
df12 | SARIMA | 0.0825 | 0.2196 | 0.2873 | 0.7432 | 0.0719 | 0.2303 | 0.2681 | 0.4800
df13 | RFR | 0.0305 | 0.1371 | 0.1746 | 0.7065 | 0.0322 | 0.1428 | 0.1795 | 0.6506
df13 | f-LSTM | 0.0001 | 0.0078 | 0.0099 | 0.9980 | 0.0002 | 0.0126 | 0.0148 | 0.9954
df13 | ARIMA | 0.0696 | 0.2169 | 0.2639 | 0.2644 | 0.0513 | 0.1875 | 0.2264 | −0.7329
df13 | SARIMA | 0.0703 | 0.2183 | 0.2651 | 0.2573 | 0.0512 | 0.1874 | 0.2263 | −0.7309
df14 | RFR | 0.0388 | 0.1573 | 0.1971 | −0.0470 | 0.0432 | 0.1685 | 0.2078 | −0.0259
df14 | f-LSTM | 0.0001 | 0.0107 | 0.0122 | 0.9950 | 0.0003 | 0.0138 | 0.0166 | 0.9921
df14 | ARIMA | 0.0369 | 0.1591 | 0.1921 | 0.0523 | 0.0417 | 0.1599 | 0.2041 | −0.1507
df14 | SARIMA | 0.0369 | 0.1592 | 0.1923 | 0.0508 | 0.0416 | 0.1599 | 0.2041 | −0.1500
df15 | RFR | 0.0182 | 0.0675 | 0.1349 | 0.9299 | 0.0164 | 0.0695 | 0.1279 | 0.9519
df15 | f-LSTM | 0.0007 | 0.0255 | 0.0267 | 0.9767 | 0.0028 | 0.0506 | 0.0525 | 0.9199
df15 | ARIMA | 0.0875 | 0.2052 | 0.2958 | 0.7295 | 0.0331 | 0.1378 | 0.1821 | 0.7841
df15 | SARIMA | 0.0898 | 0.2092 | 0.2997 | 0.7224 | 0.0131 | 0.0953 | 0.1145 | 0.9145
df16 | RFR | 0.0059 | 0.0522 | 0.0771 | 0.9416 | 0.0053 | 0.0499 | 0.0725 | 0.9384
df16 | f-LSTM | 0.0004 | 0.0134 | 0.0209 | 0.9897 | 0.0026 | 0.0396 | 0.0511 | 0.9328
df16 | ARIMA | 0.0271 | 0.1302 | 0.1646 | 0.7195 | 0.0024 | 0.0379 | 0.0490 | 0.9419
df16 | SARIMA | 0.0273 | 0.1309 | 0.1653 | 0.7171 | 0.0022 | 0.0350 | 0.0465 | 0.9477
df17 | RFR | 0.1201 | 0.2668 | 0.3466 | 0.5798 | 0.1296 | 0.2839 | 0.3600 | 0.5638
df17 | f-LSTM | 0.0001 | 0.0101 | 0.0116 | 0.9915 | 0.0072 | 0.0799 | 0.0846 | 0.5687
df17 | ARIMA | 0.2525 | 0.4114 | 0.5025 | 0.1429 | 0.5554 | 0.6254 | 0.7452 | 0.1115
df17 | SARIMA | 0.2538 | 0.4123 | 0.5038 | 0.1386 | 0.5529 | 0.6242 | 0.7436 | 0.1155
df18 | RFR | 0.0299 | 0.1246 | 0.1730 | 0.8155 | 0.0333 | 0.1337 | 0.1824 | 0.7975
df18 | f-LSTM | 0.0007 | 0.0249 | 0.0264 | 0.9635 | 0.0043 | 0.0646 | 0.0654 | 0.7899
df18 | ARIMA | 0.1401 | 0.3067 | 0.3743 | 0.1261 | 0.1842 | 0.3624 | 0.4292 | −0.8129
df18 | SARIMA | 0.1407 | 0.3075 | 0.3751 | 0.1222 | 0.1803 | 0.3584 | 0.4246 | −0.7746
df19 | RFR | 0.0131 | 0.0829 | 0.1146 | 0.8869 | 0.0135 | 0.0839 | 0.1161 | 0.8856
df19 | f-LSTM | 0.0002 | 0.0128 | 0.0141 | 0.9955 | 0.0003 | 0.0141 | 0.0163 | 0.9942
df19 | ARIMA | 0.0460 | 0.1683 | 0.2145 | 0.6142 | 0.0155 | 0.1161 | 0.1244 | 0.8834
df19 | SARIMA | 0.0454 | 0.1659 | 0.2132 | 0.6190 | 0.0128 | 0.1035 | 0.1129 | 0.9038
df20 | RFR | 0.0245 | 0.1194 | 0.1566 | 0.9848 | 0.0325 | 0.1345 | 0.1801 | 0.9823
df20 | f-LSTM | 0.0002 | 0.0107 | 0.0140 | 0.9910 | 0.0008 | 0.0218 | 0.0279 | 0.9667
df20 | ARIMA | 0.15903 | 0.3085 | 0.3987 | 0.9140 | 0.1883 | 0.3275 | 0.4340 | 0.8845
df20 | SARIMA | 0.1587 | 0.3071 | 0.3983 | 0.9142 | 0.3703 | 0.4608 | 0.6085 | 0.7729
df21 | RFR | 0.0167 | 0.0945 | 0.1295 | 0.8660 | 0.0216 | 0.0956 | 0.1469 | 0.8948
df21 | f-LSTM | 0.0001 | 0.0089 | 0.0109 | 0.9929 | 0.0012 | 0.0321 | 0.0352 | 0.9634
df21 | ARIMA | 0.05336 | 0.1775 | 0.2309 | 0.7001 | 0.0085 | 0.0765 | 0.0921 | 0.8966
df21 | SARIMA | 0.0537 | 0.1781 | 0.2319 | 0.6977 | 0.0079 | 0.0732 | 0.0887 | 0.9040
df22 | RFR | 0.0243 | 0.1244 | 0.1559 | 0.7722 | 0.0257 | 0.1248 | 0.1602 | 0.8262
df22 | f-LSTM | 0.0001 | 0.0106 | 0.0116 | 0.9956 | 0.0022 | 0.0450 | 0.0474 | 0.9519
df22 | ARIMA | 0.0618 | 0.1957 | 0.2486 | 0.5494 | 0.0077 | 0.0816 | 0.0876 | 0.4526
df22 | SARIMA | 0.0618 | 0.1958 | 0.2486 | 0.5490 | 0.0077 | 0.0816 | 0.0876 | 0.4528
df23 | RFR | 0.0121 | 0.0847 | 0.1102 | 0.8028 | 0.0103 | 0.0689 | 0.1013 | 0.8856
df23 | f-LSTM | 0.0001 | 0.0125 | 0.0133 | 0.9938 | 0.0012 | 0.0334 | 0.0346 | 0.9740
df23 | ARIMA | 0.0476 | 0.1721 | 0.2182 | 0.5755 | 0.0061 | 0.0689 | 0.0778 | 0.9083
df23 | SARIMA | 0.0482 | 0.1731 | 0.2197 | 0.5698 | 0.0055 | 0.0649 | 0.0740 | 0.9171
df24 | RFR | 0.0121 | 0.0847 | 0.1102 | 0.8028 | 0.0103 | 0.0689 | 0.1013 | 0.8856
df24 | f-LSTM | 0.0001 | 0.0091 | 0.0105 | 0.9985 | 0.0085 | 0.0861 | 0.0922 | 0.7622
df24 | ARIMA | 0.0293 | 0.1305 | 0.1711 | 0.6346 | 0.0221 | 0.1062 | 0.1488 | 0.5467
df24 | SARIMA | 0.0291 | 0.1301 | 0.1706 | 0.6368 | 0.0222 | 0.1064 | 0.1490 | 0.5456
df25 | RFR | 0.0028 | 0.0312 | 0.0536 | 0.9352 | 0.0019 | 0.0259 | 0.0432 | 0.9551
df25 | f-LSTM | 0.0001 | 0.0080 | 0.0105 | 0.9985 | 0.0003 | 0.0145 | 0.0177 | 0.9958
df25 | ARIMA | 0.0434 | 0.1825 | 0.2084 | −0.0375 | 0.0279 | 0.1392 | 0.1671 | 0.2666
df25 | SARIMA | 0.0434 | 0.1825 | 0.2084 | −0.0374 | 0.0279 | 0.1392 | 0.1671 | 0.2666
df27 | RFR | 0.0071 | 0.0412 | 0.0844 | 0.9857 | 0.0259 | 0.0682 | 0.1610 | 0.9557
df27 | f-LSTM | 0.0002 | 0.0188 | 0.0150 | 0.9957 | 0.0011 | 0.0312 | 0.0331 | 0.9778
df27 | ARIMA | 0.0413 | 0.1624 | 0.2032 | 0.9270 | 0.0154 | 0.1042 | 0.1243 | 0.9750
df27 | SARIMA | 0.0409 | 0.1543 | 0.2023 | 0.9277 | 0.0155 | 0.1040 | 0.1245 | 0.9749
df30 | RFR | 0.0005 | 0.0063 | 0.0230 | 0.9242 | 0.0105 | 0.0494 | 0.1023 | 0.3928
df30 | f-LSTM | 0.0001 | 0.0101 | 0.0113 | 0.9923 | 0.0017 | 0.0371 | 0.0409 | 0.9683
df30 | ARIMA | 0.0034 | 0.0335 | 0.0585 | 0.7536 | 0.0016 | 0.0352 | 0.0397 | −3.3642
df30 | SARIMA | 0.0034 | 0.0335 | 0.0585 | 0.7531 | 0.0016 | 0.0352 | 0.0397 | −3.3630
df31 | RFR | 0.0052 | 0.0591 | 0.0727 | −0.0385 | 0.0091 | 0.0770 | 0.0949 | 0.2378
df31 | f-LSTM | 0.0004 | 0.0188 | 0.0257 | 0.9897 | 0.0031 | 0.0530 | 0.0554 | 0.9444
df31 | ARIMA | 0.0148 | 0.0895 | 0.1219 | 0.8431 | 0.0645 | 0.2070 | 0.2541 | 0.2681
df31 | SARIMA | 0.0163 | 0.0954 | 0.1279 | 0.8273 | 0.0675 | 0.2090 | 0.2598 | 0.2348
df32 | RFR | 0.0465 | 0.1680 | 0.2158 | 0.1902 | 0.1118 | 0.2487 | 0.3344 | −0.3321
df32 | f-LSTM | 0.0000 | 0.0059 | 0.0079 | 0.9989 | 0.0004 | 0.0171 | 0.0205 | 0.9945
df32 | ARIMA | 0.0740 | 0.2295 | 0.2720 | 0.0456 | 0.0651 | 0.2144 | 0.2552 | 0.0688
df32 | SARIMA | 0.0741 | 0.2297 | 0.2722 | 0.0441 | 0.0674 | 0.2185 | 0.2596 | 0.0363
df61 | RFR | 0.0090 | 0.0748 | 0.0948 | 0.2414 | 0.0091 | 0.0741 | 0.0955 | 0.1492
df61 | f-LSTM | 0.0000 | 0.0043 | 0.0058 | 0.9974 | 0.0001 | 0.0078 | 0.0095 | 0.9828
df61 | ARIMA | 0.0148 | 0.0895 | 0.1219 | 0.8431 | 0.0645 | 0.2070 | 0.2541 | 0.2681
df61 | SARIMA | 0.0163 | 0.0954 | 0.1279 | 0.8273 | 0.0675 | 0.2090 | 0.2598 | 0.2348
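For the two statistical baselines in Table 7, the fit-and-forecast loop is standard statsmodels usage. The sketch below is illustrative only: the (p, d, q) and seasonal (P, D, Q, s) orders, the synthetic series, and the train/test split are assumptions, since the orders selected in the study are not restated in this excerpt.

```python
# Hedged sketch: ARIMA/SARIMA fit and out-of-sample scoring with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
series = np.sin(np.linspace(0, 40, 400)) + 0.05 * rng.standard_normal(400)
train, test = series[:360], series[360:]  # assumed 90/10 split

arima = ARIMA(train, order=(2, 1, 2)).fit()                       # assumed order
sarima = SARIMAX(train, order=(1, 1, 1),
                 seasonal_order=(1, 0, 1, 24)).fit(disp=False)    # assumed order

for name, res in [("ARIMA", arima), ("SARIMA", sarima)]:
    forecast = res.forecast(steps=len(test))
    print(name, "test MSE:", float(np.mean((forecast - test) ** 2)))
```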
Table 8. Correlation coefficients and p-values across different models.
Model | Correlation Coefficient | p-Value
RFR | 0.4711 | 0.0311
f-LSTM | 0.2752 | 0.2272
ARIMA | 0.7433 | 0.0001
SARIMA | 0.7164 | 0.0003
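Statistics of the kind shown in Table 8 follow from a Pearson correlation test. In the hedged sketch below, the correlated quantities are assumed to be each tank's missing-data percentage against a model's test MSE; only a four-tank subset of the Table 3/Table 7 values is used, and the paper's exact pairing may differ.

```python
# Hedged sketch: Pearson r and p-value as in Table 8 (assumed variable pair).
from scipy.stats import pearsonr

missing_pct = [3.12, 2.61, 11.91, 14.77]    # df10, df11, df31, df32 (Table 3)
test_mse = [0.0645, 0.3368, 0.0645, 0.0651]  # ARIMA test MSE, same tanks (Table 7)

r, p = pearsonr(missing_pct, test_mse)
print(f"correlation={r:.4f} p-value={p:.4f}")
```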