Performance and Convergence Analysis of Modified C-Means Using Jeffreys-Divergence for Clustering

  1. Ayan Seal (1)
  2. Aditya Karlekar (2)
  3. Ondrej Krejcar (3)
  4. Enrique Herrera-Viedma (4)

Affiliations:
  (1) PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005 (India) / Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove, 50003 (Czech Republic)
  (2) Hitkarini College of Engineering and Technology, Jabalpur, 482005 (India)
  (3) Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove, 50003 (Czech Republic) / Malaysia Japan International Institute of Technology, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur (Malaysia)
  (4) Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada (Spain) / Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, 21589 (Saudi Arabia)
Journal: IJIMAI

ISSN: 1989-1660

Year of publication: 2021

Volume: 7

Issue: 2

Pages: 141-149

Type: Article

DOI: 10.9781/IJIMAI.2021.04.009

Abstract

The volume of data generated every day across the globe is astonishing, owing to the growth of the Internet of Things, so it is common practice to unravel important hidden facts and understand such massive data using clustering techniques. However, non-linear relations, which remain essentially unexplored compared to linear correlations, are more widespread within high-throughput data. Non-linear relations can often model large amounts of data more precisely and highlight critical trends and patterns. Moreover, selecting an appropriate similarity measure has long been a well-known issue in data clustering. In this work, a non-Euclidean similarity measure is proposed that relies on the non-linear Jeffreys-divergence (JS). We subsequently develop c-means using the proposed JS (J-c-means) and discuss the various properties of JS and J-c-means. All analyses were carried out on several real-life and synthetic databases, and the obtained results show that J-c-means empirically outperforms some cutting-edge c-means algorithms.
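
As a rough illustration of the method the abstract summarizes, the minimal Python sketch below pairs a Lloyd-style c-means loop with the symmetric Jeffreys divergence, J(p, q) = sum_i (p_i - q_i)(log p_i - log q_i), as the dissimilarity measure. This is a sketch under stated assumptions, not the paper's algorithm: the names jeffreys_divergence and j_c_means, the eps guard for zero-valued features, and the arithmetic-mean center update are illustrative choices, whereas the paper derives its own update rules and convergence analysis.

    # Minimal sketch: c-means with a Jeffreys-divergence dissimilarity.
    # All names and the center-update rule are illustrative assumptions,
    # not the formulation derived in the paper.
    import numpy as np

    def jeffreys_divergence(p, q, eps=1e-12):
        # J(p, q) = sum_i (p_i - q_i) * (log p_i - log q_i); symmetric and
        # non-negative. eps keeps the logarithm finite for zero entries.
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        return float(np.sum((p - q) * (np.log(p) - np.log(q))))

    def j_c_means(X, c, n_iter=100, seed=0):
        # Lloyd-style alternation: assign each point to its nearest center
        # under the Jeffreys divergence, then recompute each center as the
        # arithmetic mean of its cluster (a simplification for illustration).
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=c, replace=False)]
        for _ in range(n_iter):
            labels = np.array([
                np.argmin([jeffreys_divergence(x, m) for m in centers])
                for x in X
            ])
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(c)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    # Example on toy non-negative data (the Jeffreys divergence assumes
    # positive-valued feature vectors):
    X = np.abs(np.random.default_rng(1).normal(size=(100, 4))) + 0.1
    labels, centers = j_c_means(X, c=3)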

Funding information

This work is partially supported by the project “Prediction of diseases through computer assisted diagnosis system using images captured by minimally-invasive and non-invasive modalities”, Computer Science and Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, India (under ID: SPARC-MHRD-231). It is also partially supported by the project Grant Agency of Excellence, University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic (under ID: UHK-FIM-GE-2204-2021).

References

  • H. Wattimanela, U. Pasaribu, S. Indratno, A. Puspito, “Earthquakes clustering based on the magnitude and the depths in Molluca Province,” in AIP Conference Proceedings, vol. 1692, 2015, p. 020021, AIP Publishing.
  • J. Yang, J. Cao, R. He, L. Zhang, “A unified clustering approach for identifying functional zones in suburban and urban areas,” in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2018, pp. 94–99.
  • T. Pitchayaviwat, “A study on clustering customer suggestion on online social media about insurance services by using text mining techniques,” in 2016 Management and Innovation Technology International Conference (MITicon), Oct 2016, pp. MIT–148–MIT–151.
  • R. Suresh, I. Anand, B. Vianesh, H. R. Mohammed, “Study of clustering algorithms for library management system,” in 2018 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC), March 2018, pp. 221–224.
  • A. Naik, D. Reddy, P. K. Jana, “A novel clustering algorithm for biological data,” in 2011 Second International Conference on Emerging Applications of Information Technology, Feb 2011, pp. 249–252.
  • K. C. Gull, A. B. Angadi, C. G. Seema, S. G. Kanakaraddi, “A clustering technique to rise up the marketing tactics by looking out the key users taking facebook as a case study,” in 2014 IEEE International Advance Computing Conference (IACC), Feb 2014, pp. 579–585.
  • J. Li, A. Nehorai, “Gaussian mixture learning via adaptive hierarchical clustering,” Signal Processing, vol. 150, pp. 116–121, 2018.
  • R. Abe, S. Miyamoto, Y. Endo, Y. Hamasuna, “Hierarchical clustering algorithms with automatic estimation of the number of clusters,” in 17th World Congress of International Fuzzy Systems Association, 2017.
  • S. Ghassempour, F. Girosi, A. Maeder, “Clustering multivariate time series using hidden Markov models,” International Journal of Environmental Research and Public Health, vol. 11, pp. 2741–2763, 2014.
  • M. Pacella, A. Grieco, M. Blaco, “On the use of self-organizing map for text clustering in engineering change process analysis: A case study,” Computational Intelligence and Neuroscience, p. 11, 2016.
  • V. Schellekens, L. Jacques, “Quantized compressive k-means,” IEEE Signal Processing Letters, vol. 25, no. 8, 2018.
  • K. K. Sharma, A. Seal, “Spectral embedded generalized mean based k-nearest neighbors clustering with s-distance,” Expert Systems with Applications, p. 114326, 2020.
  • K. K. Sharma, A. Seal, “Outlier-robust multi-view clustering for uncertain data,” Knowledge-Based Systems, vol. 211, p. 106567, 2021.
  • K. K. Sharma, A. Seal, “Multi-view spectral clustering for uncertain objects,” Information Sciences, vol. 547, pp. 723–745, 2021.
  • L. Bottou, Y. Bengio, “Convergence properties of the k-means algorithms,” in Advances in Neural Information Processing Systems, 1995, pp. 585–592.
  • A. Karlekar, A. Seal, O. Krejcar, C. Gonzalo-Martin, “Fuzzy k-means using non-linear s-distance,” IEEE Access, vol. 7, pp. 55121–55131, 2019.
  • A. Banerjee, S. Merugu, I. S. Dhillon, J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, no. Oct, pp. 1705–1749, 2005.
  • S. Chakraborty, S. Das, “k-means clustering with a new divergence-based distance metric: Convergence and performance analysis,” Pattern Recognition Letters, vol. 100, pp. 67–73, 2017.
  • L. Legrand, E. Grivel, “Jeffrey’s divergence between moving-average models that are real or complex, noise-free or disturbed by additive white noises,” Signal Processing, vol. 131, pp. 350–363, 2017.
  • K. K. Sharma, A. Seal, “Modeling uncertain data using Monte Carlo integration method for clustering,” Expert Systems with Applications, vol. 137, pp. 100–116, 2019.
  • A. Seal, A. Karlekar, O. Krejcar, C. Gonzalo-Martin, “Fuzzy c-means clustering using Jeffreys-divergence based similarity measure,” Applied Soft Computing, vol. 88, p. 106016, 2020.
  • F. Nielsen, R. Nock, “Total jensen divergences: Definition, properties and clustering,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, pp. 2016–2020.
  • F. Nielsen, R. Nock, S. I. Amari, “On clustering histograms with k-means by using mixed α-divergences,” Entropy, vol. 16, 2014.
  • R. Nock, F. Nielsen, S.-I. Amari, “On conformal divergences and their population minimizers,” IEEE Transactions on Information Theory, vol. 62, 2016.
  • M. D. Gupta, S. Srinivasa, J. Madhukara, M. Antony, “KL divergence based agglomerative clustering for automated vitiligo grading,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2700–2709.
  • A. Notsu, O. Komori, S. Eguchi, “Spontaneous clustering via minimum gamma-divergence,” Neural Computation, vol. 26, 2014.
  • K. K. Sharma, A. Seal, “Clustering analysis using an adaptive fused distance,” Engineering Applications of Artificial Intelligence, vol. 96, p. 103928, 2020.
  • A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, vol. 31, no. 8, 2010.
  • J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, 2011.
  • D. Dheeru, E. Karra Taniskidou, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml.
  • U. Maulik, S. Bandyopadhyay, “Performance evaluation of some clustering algorithms and validity indices,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, 2002.
  • N. X. Vinh, J. Epps, J. Bailey, “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance,” Journal of Machine Learning Research, vol. 11, no. Oct, 2010.
  • L. Hubert, P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.
  • P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
  • J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–297, Oakland, CA, USA.