Optimizing the K-Means Clustering Algorithm with Parallel Processing Using the R Framework

Mastura Diana Marieska, Suci Lestari, Calvin Mahendra, Nabila Rizky Oktadini, Muhammad Ali Buchari

Abstract


Parallel processing is often used to reduce the execution time of data mining algorithms. In this study, parallel processing is used to optimize the K-Means clustering algorithm. K-Means is implemented with packages available in the R framework and is run both serially and in parallel. The optimization percentage is obtained by comparing the execution time of parallel processing with that of serial processing. The study uses the Boston Housing dataset, which is commonly used in data mining. Test scenarios are distinguished by the number of cores and the number of centroids. The results show that parallel processing has a shorter execution time than serial processing in every scenario. The resulting optimization is fairly substantial, ranging from 20% to 52%. The highest optimization is obtained with the largest number of cores and the largest number of centroids.
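The abstract does not state the exact formula behind the optimization percentage. A minimal sketch, assuming it is the relative reduction in execution time with respect to the serial run, is given below. It is written in Python rather than the R packages used in the study. The names `optimization_percent` and `time_call` and the timings in the usage lines are hypothetical and are not taken from the paper.

```python
import time


def optimization_percent(serial_seconds, parallel_seconds):
    """Relative reduction in execution time, as a percentage of the serial time.

    Assumed formula (not confirmed by the source):
        (T_serial - T_parallel) / T_serial * 100
    """
    if serial_seconds <= 0:
        raise ValueError("serial_seconds must be positive")
    return (serial_seconds - parallel_seconds) / serial_seconds * 100.0


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical timings, chosen only to illustrate the arithmetic.
serial_time = 10.0    # seconds for a serial K-Means run (made up)
parallel_time = 4.8   # seconds for a parallel K-Means run (made up)
reduction = optimization_percent(serial_time, parallel_time)
print(f"Execution time reduced by about {reduction:.1f}%")
```

Under this definition, a positive value means the parallel run was faster, and a negative value would mean the overhead of parallelism outweighed its benefit.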


Keywords


Parallel Processing; Clustering; K-Means; R Framework; Execution Time

DOI: http://dx.doi.org/10.26418/jp.v7i1.43400
