Comparative Study between Parallel K-Means and Parallel K-Medoids with Message Passing Interface (MPI)

Fhira Nhita

Abstract


Data mining combines several techniques, such as classification and clustering, to extract useful information from a dataset. Clustering is currently one of the most widely used data mining techniques. K-Means and K-Medoids are among the most commonly used clustering algorithms because they are easy to implement, efficient, and give good results. Besides extracting important information, the time spent mining the data is also a concern today, since real-world applications produce huge volumes of data. This research analyzes the clustering results and the time performance of the K-Means and K-Medoids algorithms, which were parallelized on a High Performance Computing (HPC) cluster using the Message Passing Interface (MPI) library. The results show that the K-Means algorithm gives a smaller sum of squared errors (SSE) than K-Medoids, and that the parallel algorithms using MPI give faster computation times than the sequential algorithms.
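The SSE comparison and the parallel step mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this research. All function names are hypothetical, the medoid is assumed to be chosen by total squared Euclidean distance, and the `combine` function only stands in for an MPI sum reduction across processes (no MPI library is called here).

```python
from math import fsum

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return fsum((a - b) ** 2 for a, b in zip(p, q))

def mean_center(cluster):
    """K-Means style center: the coordinate-wise mean of the cluster."""
    n = len(cluster)
    return tuple(fsum(axis) / n for axis in zip(*cluster))

def medoid_center(cluster):
    """K-Medoids style center: the cluster member with the smallest
    total squared distance to all members (assumed dissimilarity)."""
    return min(cluster, key=lambda m: fsum(sq_dist(m, p) for p in cluster))

def cluster_sse(cluster, center):
    """Sum of squared errors of a cluster around a given center."""
    return fsum(sq_dist(p, center) for p in cluster)

def partial_sum(chunk):
    """Local work of one process: coordinate sums and a point count."""
    return [fsum(axis) for axis in zip(*chunk)], len(chunk)

def combine(partials):
    """Stand-in for an MPI sum reduction: merge the per-process sums
    and counts, then divide to obtain the global mean."""
    total = [fsum(axis) for axis in zip(*(s for s, _ in partials))]
    count = sum(n for _, n in partials)
    return tuple(t / count for t in total)

cluster = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
sse_mean = cluster_sse(cluster, mean_center(cluster))      # 14.0
sse_medoid = cluster_sse(cluster, medoid_center(cluster))  # 17.0

# Split the data as if two processes each held one chunk.
chunks = [cluster[:2], cluster[2:]]
combined = combine([partial_sum(c) for c in chunks])
```

Because the mean minimizes the sum of squared distances over all possible centers, while a medoid is restricted to actual data points, the SSE around the mean can never exceed the SSE around the medoid for the same cluster. Combining partial sums gives the same center as computing the mean over the whole cluster, which is why this step lends itself to a reduction.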







DOI: http://dx.doi.org/10.21108/IJOICT.2016.22.86



Copyright (c) 2017 Fhira Nhita

This work is licensed under a Creative Commons Attribution 4.0 International License.