Imputation of Rainfall Data Using the Sine Cosine Function Fitting Neural Network

  1. Po Chan Chiu 1
  2. Ali Selamat 1
  3. Ondrej Krejcar 2
  4. King Kuok Kuok 3
  5. Enrique Herrera-Viedma 4
  6. Giuseppe Fenza 5
  1. 1 University of Technology Malaysia

    Johor Bahru, Malaysia

    ROR https://ror.org/026w31v75

  2. 2 University of Hradec Králové

    Hradec Králové, Czech Republic

    ROR https://ror.org/05k238v14

  3. 3 Swinburne University of Technology Sarawak Campus

    Kuching, Malaysia

    ROR https://ror.org/014cjmc76

  4. 4 Universidad de Granada

    Granada, Spain

    ROR https://ror.org/04njjy449

  5. 5 University of Salerno

    Fisciano, Italy

    ROR https://ror.org/0192m2k53

Journal:
IJIMAI

ISSN: 1989-1660

Year of publication: 2021

Volume: 6

Issue: 7

Pages: 39-48

Type: Article

DOI: 10.9781/IJIMAI.2021.08.013

Abstract

Missing rainfall data reduce the quality of hydrological data analysis because rainfall data are an essential input for hydrological modeling. Much research has focused on rainfall data imputation. However, the compatibility of precipitation (rainfall) and non-precipitation (meteorological) data as inputs has received less attention. First, we propose a novel pre-processing mechanism for non-precipitation data using principal component analysis (PCA). Before the imputation, PCA is used to extract the most relevant features from the meteorological data. The final output of the PCA is combined with the rainfall data from the nearest neighbor gauging stations and then used as the input to the neural network for missing data imputation. Second, a sine cosine algorithm is presented to optimize the neural network for infilling the missing rainfall data. The proposed sine cosine function fitting neural network (SC-FITNET) was compared with the sine cosine feedforward neural network (SC-FFNN), feedforward neural network (FFNN) and long short-term memory (LSTM) approaches. The results showed that the proposed SC-FITNET outperformed LSTM, SC-FFNN and FFNN imputation in terms of mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (R), with an average accuracy of 90.9%. This study revealed that as the percentage of missingness increased, the precision of all four imputation methods decreased. In addition, this study also revealed that PCA has potential for pre-processing meteorological data into an understandable format for missing data imputation.
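
The input construction and the error measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the array sizes (5 meteorological variables, 3 neighbor gauges, 2 retained components) and the helper names `pca_features`, `mae`, `rmse` and `corr` are all assumptions made for illustration, and the neural network and the sine cosine optimizer are left out.

```python
import numpy as np

def pca_features(X, n_components):
    """Project the standardized columns of X onto the leading principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def corr(y_true, y_pred):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Synthetic stand-ins for the real series (assumed shapes, not the paper's data).
rng = np.random.default_rng(0)
n_hours = 200
meteo = rng.normal(size=(n_hours, 5))               # hypothetical meteorological variables
neighbors = rng.gamma(2.0, 1.0, size=(n_hours, 3))  # hypothetical neighbor-gauge rainfall

# PCA scores of the meteorological data, joined with the neighbor rainfall,
# form the input matrix that an imputation network would receive.
scores, explained = pca_features(meteo, n_components=2)
inputs = np.hstack([scores, neighbors])

# Toy observed and imputed values to exercise the three measures.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.0, 1.0, 2.0, 5.0])
print(inputs.shape, mae(y_true, y_pred), rmse(y_true, y_pred), corr(y_true, y_pred))
```

Standardizing each variable before the decomposition is a design choice in this sketch, made because meteorological variables are usually recorded in different units; the abstract does not say whether the study does the same.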

References

  • P. Muñoz, J. Orellana-Alvear, P. Willems, and R. Célleri. “Flash-flood forecasting in an Andean mountain catchment—Development of a stepwise methodology based on the random forest algorithm,” Water, vol. 10, no. 11, 2018, pp. 1519.
  • S. Szewrański, J. Chruściński, J. Kazak, M. Świąder, K. Tokarczyk-Dorociak, and R. Żmuda, “Pluvial flood risk assessment tool (PFRA) for rainwater management and adaptation to climate change in newly urbanised areas,” Water, vol. 10, no. 4, 2018, pp. 386.
  • K.K. Kuok, S. Harun, S.M. Shamsuddin, and P.C. Chiu, “Evaluation of daily rainfall-runoff model using multilayer perceptron and particle swarm optimization feedforward neural networks,” Journal of Environmental Hydrology, vol. 18, no. 10, 2010, pp. 1-16.
  • N. Yang, B.H. Men, and C.K. Lin, “Impact analysis of climate change on water resources,” Procedia Engineering, vol. 24, 2011, pp. 643-648.
  • K.K. Kuok, S. Harun, and P.C. Chiu, “Hourly runoff forecast at different leadtime for a small watershed using artificial neural networks,” International Journal of Advances in Soft Computing and its Application, vol. 3, 2011, pp. 68-86.
  • R.A. Mcdonald, P.W. Thurston, and M.R. Nelson, “A Monte Carlo study of missing item methods,” Organizational Research Methods, vol. 3, no. 1, 2000, pp. 71-92.
  • P.E. McKnight, K.M. McKnight, S. Sidani, and A.J. Figueredo, “Missing data: A gentle introduction,” Guilford Press. 2007.
  • K.J. Lee and J.B. Carlin, “Multiple imputation for missing data: Fully conditional specification versus multivariate normal imputation,” American Journal of Epidemiology, vol. 171, no. 5, 2010, pp. 624-632.
  • Y. Gao, C. Merz, G. Lischeid, and M. Schneider, “A review on missing hydrological data processing,” Environmental earth sciences, vol. 77, no. 2, 2018, pp. 47.
  • S. Londhe, P. Dixit, S. Shah, and S. Narkhede, “Infilling of missing daily rainfall records using artificial neural network,” ISH Journal of Hydraulic Engineering, vol. 21, no. 3, 2015, pp. 255-264.
  • T. Canchala-Nastar, Y. Carvajal-Escobar, W. Alfonso-Morales, W.L. Cerón and E. Caicedo, “Estimation of missing data of monthly rainfall in southwestern Colombia using artificial neural networks,” Data in Brief, vol. 26, 2019, pp. 104517.
  • P.C. Chiu, A. Selamat, O. Krejcar, and K.K. Kuok, “Missing rainfall data estimation using artificial neural network and nearest neighbor imputation,” In Advancing Technology Industrialization Through Intelligent Software Methodologies, Tools and Techniques: Proceedings of the 18th International Conference on New Trends in Intelligent Software Methodologies, Tools and Techniques (SoMeT_19), IOS Press, vol. 318, 2019, pp. 132.
  • M.R. Mispan, N.F.A. Rahman, M.F. Ali, K. Khalid, M.H.A. Bakar and S.H. Haron, “Missing river discharge data imputation approach using artificial neural network,” Methodology, vol. 25, 2015, pp. 20.
  • P.C. Chiu, A. Selamat and O. Krejcar, “Infilling missing rainfall and runoff data for Sarawak, Malaysia using gaussian mixture model based K-nearest neighbor Imputation,” In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Springer, 2019, pp. 27-38.
  • W.Y. Lai, and K.K. Kuok, “A study on bayesian principal component analysis for addressing missing rainfall data,” Water Resources Management, 2019, pp.1-14.
  • S. Mirjalili, “SCA: A sine cosine algorithm for solving optimization problems,” Knowledge-Based Systems, vol. 96, 2016, pp. 120-133.
  • C. Qu, Z. Zeng, J. Dai, Z. Yi, and W. He, “A modified sine-cosine algorithm based on neighborhood search and greedy levy mutation,” Computational Intelligence and Neuroscience, 2018.
  • S. Das, A. Bhattacharya and A.K. Chakraborty, “Solution of short-term hydrothermal scheduling using sine cosine algorithm,” Soft Computing, vol. 22, no. 19, 2018, pp. 6409-6427.
  • S. Li, H. Fang, and X. Liu, “Parameter optimization of support vector regression based on sine cosine algorithm,” Expert Systems with Applications, vol. 91, 2018, pp. 63-77.
  • M.A. Tawhid, and P. Savsani, “Discrete sine-cosine algorithm (DSCA) with local search for solving traveling salesman problem,” Arabian Journal for Science and Engineering, 2018, pp. 1-11.
  • R.E. Chandler, V.S. Isham, N.A. Leith, P.J. Northrop, C.J. Onof, and H.S. Wheater, “Uncertainty in rainfall inputs,” World Scientific/Imperial College Press, 2011.
  • O. Stoner, and T. Economou, “An advanced hidden markov model for hourly rainfall time series,” arXiv:1906.03846. 2019.
  • T. Kashiwao, K. Nakayama, S. Ando, K. Ikeda, M. Lee and A. Bahadori, “A neural network-based local rainfall prediction system using meteorological data on the Internet: A case study using data from the Japan Meteorological Agency,” Applied Soft Computing, vol. 56, 2017, pp. 317-330.
  • M.H. Yen, D.W. Liu, Y.C. Hsin, C.E. Lin, and C.C. Chen, “Application of the deep learning for the prediction of rainfall in Southern Taiwan,” Scientific Reports, vol. 9, no. 1, 2019, pp. 1-9.
  • M. Chhetri, S. Kumar, P.P. Roy, and B.G. Kim, “Deep BLSTM-GRU model for monthly rainfall prediction: A case study of Simtokha, Bhutan,” Remote Sensing, vol. 12, no. 19, 2020, pp. 3174.
  • S.K. Grange, and D.C. Carslaw, “Using meteorological normalisation to detect interventions in air quality time series,” Science of the Total Environment, vol. 653, 2019, pp. 578-588.
  • L.I. Smith, “A tutorial on principal components analysis,” 2002. Accessed: Jan. 3, 2020. [Online]. Available: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
  • C. Skittides, and W.G. Früh, “Wind forecasting using principal component analysis,” Renewable Energy, vol. 69, 2014, pp. 365-374.
  • M. Hubert, P.J. Rousseeuw, and W. Van den Bossche, “MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers,” Technometrics, vol. 61, no. 4, 2019, pp. 459-473.
  • Z. Zuśka, J. Kopcińska, E. Dacewicz, B. Skowera, J. Wojkowski, and A. Ziernicka–Wojtaszek, “Application of the principal component analysis (PCA) method to assess the impact of meteorological elements on concentrations of particulate matter (PM10): A case study of the mountain valley (the Sącz Basin, Poland),” Sustainability, vol. 11, no. 23, 2019, pp. 6740.
  • Y.Y. Choi, H. Shon, Y.J. Byon, D.K. Kim, and S. Kang, “Enhanced application of principal component analysis in machine learning for imputation of missing traffic data,” Applied Sciences, vol. 9, no. 10, 2019, pp. 2149.
  • B.S. Harish, and S.V.A. Kumar, “Anomaly based intrusion detection using modified fuzzy clustering,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 6, 2017, pp. 54-59, doi: 10.9781/ijimai.2017.05.002.
  • T. Kurita, “Principal component analysis (PCA),” In: Ikeuchi K. (eds) Computer Vision: A Reference Guide, Springer, 2014.
  • K. Pearson, “Principal components analysis,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 6, no.2, 1901, pp. 559.
  • H. Hotelling, “Analysis of a complex of statistical variables into principal components,” Journal of Educational Psychology, vol. 24, no. 6, 1933, pp. 417-441.
  • R. Khattree, and D.N. Naik “Multivariate data reduction and discrimination with SAS software,” SAS Institute, 2000.
  • G. Bebis, and M. Georgiopoulos, “Feedforward neural networks,” IEEE Potentials, vol. 13, no. 4, 1994, pp. 27-31.
  • S. Mirjalili, “How effective is the Grey Wolf optimizer in training multilayer perceptrons,” Applied Intelligence, vol. 43, no. 1, 2015, pp. 150-161.
  • S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, 1997, pp. 1735-1780.
  • E. Mussumeci, and F.C. Coelho, “Large-scale multivariate forecasting models for Dengue-LSTM versus random forest regression,” Spatial and Spatio-temporal Epidemiology, vol. 35, 2020, pp. 100372.
  • S. Maya, and U. Ken, “DADIL: Data augmentation for domain-invariant learning,” Data Science and Pattern Recognition, vol. 4, no. 2, 2020, pp. 33–49.
  • J.C.W. Lin, Y. Shao, Y. Djenouri and U. Yun, “ASRNN: A recurrent neural network with an attention model for sequence labeling,” Knowledge-Based Systems, vol. 212, 2020, pp. 106548.
  • Hourly meteorological dataset for Kuching station: 2002 to 2003, Malaysian Meteorological Department, Selangor, Malaysia, October 2019.
  • Hourly rainfall datasets for Sungai Merang station and nearest neighbor stations: 2002 to 2003, Department of Irrigation and Drainage (DID), Sarawak, Malaysia, October 2019.
  • A.J. Henry, N.D. Hevelone, S. Lipsitz, and L.L. Nguyen, “Comparative methods for handling missing data in large databases,” Journal of Vascular Surgery, vol. 58, no. 5, 2013, pp. 1353-1359.
  • J.R. Cheema, “Some general guidelines for choosing missing data handling methods in educational research,” Journal of Modern Applied Statistical Methods, vol. 13, no. 2, 2014, pp. 3.
  • P. Zhu, Q. Xu, Q. Hu, C. Zhang, and H. Zhao, “Multi-label feature selection with missing labels,” Pattern Recognition, vol. 74, 2018, pp. 488-502.
  • H. Hassani, M. Kalantari, and Z. Ghodsi, “Evaluating the performance of multiple imputation methods for handling missing values in time series data: A study focused on East Africa, soil-carbonate-stable isotope data,” Stats, vol. 2, no. 4, 2019, pp. 457-467.
  • S. Oba, M.A. Sato, I. Takemasa, M. Monden, K.I. Matsubara, and S. Ishii, “A Bayesian missing value estimation method for gene expression profile data,” Bioinformatics, vol. 19, no.16, 2003, pp. 2088-2096.
  • R.J. Little, and D.B. Rubin, “Statistical analysis with missing data,” John Wiley & Sons, 2014.
  • M.K. Gill, T. Asefa, Y. Kaheil, and M. McKee, “Effect of missing data on performance of learning algorithms for hydrologic predictions: Implications to an imputation technique,” Water Resources Research, vol. 43, no.7, 2007.
  • J. H. Lee, and Jr.J. Huber, “Multiple imputation with large proportions of missing data: How much is too much?,” In United Kingdom Stata Users’ Group Meetings 2011 (No. 23). Stata Users Group, 2011.
  • Q. Shang, Z. Yang, S. Gao, and D. Tan, “An imputation method for missing traffic data based on FCM optimized by PSO-SVR,” Journal of Advanced Transportation, 2018.
  • T. Kim, W. Ko, and J. Kim, “Analysis and impact evaluation of missing data imputation in day-ahead PV generation forecasting,” Applied Sciences, vol. 9, no. 1, 2019, pp. 204.
  • O.F. Ayilara, L. Zhang, T.T. Sajobi, R. Sawatzky, E. Bohm, and L.M. Lix, “Impact of missing data on bias and precision when estimating change in patient-reported outcomes from a clinical registry,” Health and Quality of Life Outcomes, vol. 17, no. 1, 2019, pp. 106.
  • S. Bai, J.Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
  • A. Cecaj, M. Lippi, M. Mamei, and F. Zambonelli, “Comparing deep learning and statistical methods in forecasting crowd distribution from Aggregated Mobile Phone Data,” Applied Sciences, vol. 10, no. 18, 2020, pp. 6580.