Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV Novel Coronavirus Outbreak

  1. Rubén González-Crespo
  2. Enrique Herrera-Viedma
  3. Nilanjan Dey
  4. Simon James Fong
  5. Gloria Li
Journal:
IJIMAI

ISSN: 1989-1660

Year of publication: 2020

Volume: 6

Issue: 1

Pages: 132-140

Type: Article

DOI: 10.9781/IJIMAI.2020.02.002

Abstract

An epidemic is the rapid and wide spread of an infectious disease that threatens many lives and damages the economy. It is important to foretell the lifetime of an epidemic so that timely remedial actions can be decided. These measures include closing borders and schools and suspending community and commuter services. Lifting such restrictions depends on the momentum of the outbreak and its rate of decay. Accurately forecasting the fate of an epidemic is therefore extremely important but difficult. Because of limited knowledge of the novel disease, the high uncertainty involved, and the complex societal and political factors that influence the spread of the new virus, any forecast is far from reliable. Another factor is the insufficient amount of available data: samples are often scarce when an epidemic has just started. With only a few training samples on hand, finding a forecasting model that offers the best possible forecast is a big challenge in machine learning. Three popular approaches have been proposed in the past: 1) augmenting the small amount of existing data, 2) using panel selection to pick the best forecasting model from several candidates, and 3) fine-tuning the parameters of an individual forecasting model for the highest possible accuracy. This paper proposes a methodology that combines these three approaches to data mining from a small dataset. The methodology is applied in an experiment based on the recent coronavirus outbreak that originated in Wuhan. The experiment shows that an optimized forecasting model constructed with a new algorithm, namely polynomial neural network with corrective feedback (PNN+cf), produces the forecast with the lowest prediction error among the models compared. The results show that the proposed methodology and PNN+cf are useful for generating acceptable forecasts at the critical time of a disease outbreak, when samples are far from abundant.
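
The panel-selection idea mentioned in the abstract (approach 2) can be illustrated with a minimal sketch. This is not the paper's method and does not implement PNN+cf: the three candidate forecasters, the rolling-origin evaluation scheme, the function names, and the toy series are all assumptions chosen for illustration, and the series is synthetic rather than outbreak data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster maps the observed history to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def naive_last(history: Sequence[float]) -> float:
    """Predict that the next value equals the last observed value."""
    return history[-1]


def linear_drift(history: Sequence[float]) -> float:
    """Extend the average slope between the first and last observation."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope


def growth_ratio(history: Sequence[float]) -> float:
    """Assume the most recent step-to-step ratio repeats once more."""
    if len(history) < 2 or history[-2] == 0:
        return history[-1]
    return history[-1] * (history[-1] / history[-2])


def rolling_origin_mae(series: Sequence[float], model: Forecaster,
                       min_train: int = 3) -> float:
    """Mean absolute one-step error, refitting on a growing history."""
    if len(series) <= min_train:
        raise ValueError("series too short for the chosen min_train")
    errors: List[float] = []
    for t in range(min_train, len(series)):
        prediction = model(series[:t])  # only past values are visible
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)


def select_best(series: Sequence[float], panel: Dict[str, Forecaster],
                min_train: int = 3) -> Tuple[str, Dict[str, float]]:
    """Score every candidate in the panel and return the lowest-error one."""
    scores = {name: rolling_origin_mae(series, model, min_train)
              for name, model in panel.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical panel of candidates (assumption, not the paper's panel).
PANEL: Dict[str, Forecaster] = {
    "naive_last": naive_last,
    "linear_drift": linear_drift,
    "growth_ratio": growth_ratio,
}

# Synthetic doubling series, used only to exercise the code.
toy_series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
best_name, scores = select_best(toy_series, PANEL)
print(best_name, scores)
```

Rolling-origin evaluation is used here because, with only a handful of samples, every held-out point matters and each forecast must see past values only. Any other small-sample validation scheme could be substituted.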
