Data Dimensionality Reduction Using the Wrapper Sequential Feature Selection Method to Improve the Performance of the Naïve Bayes Algorithm on a Medical Dataset

Mochammad Yusa, Funny Farady Coastera, Muhammad Randa Yandika

Abstract


The use of machine learning as a supporting tool in medical care is growing rapidly. One of the conditions for which computational algorithms have been developed is cardiovascular disease (CVD). The machine learning models applied are built on medical record datasets. The aim of this study is to investigate the performance of the Naïve Bayes algorithm when the Wrapper Sequential Feature Selection (WSFS) method is applied. The research method consists of dataset collection, data preprocessing, application of the Naïve Bayes model, attribute scoring using Wrapper SFS, and performance validation using 10-fold cross-validation. The historical data used is the Heart Failure Clinical Records dataset, which consists of 299 instances with 13 features. The results show that the Wrapper SFS method can improve the performance of the Naïve Bayes algorithm in terms of accuracy, precision, and recall. The improvement was obtained with a combination of 6 features ('anaemia', 'diabetes', 'ejection_fraction', 'serum_creatinine', 'gender', 'time') selected by WSFS for this algorithm: accuracy increased by 6.334%, recall increased by 11.333%, and precision increased by 20.07% compared with the Naïve Bayes algorithm without feature selection.
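The pipeline described in the abstract (a Naïve Bayes model, wrapper-style sequential feature selection, and 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian Naïve Bayes variant, greedy forward selection scored by accuracy, and a synthetic 299-row dataset standing in for the Heart Failure Clinical Records data. The function names (`fit_gnb`, `predict_gnb`, `cv_accuracy`, `forward_select`) are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: class prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        # Small constant keeps the variance strictly positive.
        stats = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log-posterior for one sample."""
    def log_post(prior, stats):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats)
        )
    return max(model, key=lambda label: log_post(*model[label]))


def cv_accuracy(X, y, features, k=10, seed=0):
    """Mean k-fold accuracy of Naive Bayes restricted to the given feature indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed: same folds for every candidate
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gnb([[X[i][f] for f in features] for i in train],
                        [y[i] for i in train])
        hits = sum(predict_gnb(model, [X[i][f] for f in features]) == y[i]
                   for i in fold)
        scores.append(hits / len(fold))
    return mean(scores)


def forward_select(X, y, n_select):
    """Wrapper forward selection: greedily add the feature that most improves CV accuracy."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < n_select:
        best = max(remaining, key=lambda f: cv_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in data (assumption): 299 rows, 8 features, of which only
# features 0 and 1 carry information about the binary label.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(299)]
X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(-2.0 * t, 1.0)]
     + [rng.gauss(0.0, 1.0) for _ in range(6)] for t in y]

selected = forward_select(X, y, n_select=2)
baseline = cv_accuracy(X, y, list(range(8)))  # all features
reduced = cv_accuracy(X, y, selected)         # selected subset only
print("selected features:", selected)
print("accuracy, all features:", round(baseline, 3))
print("accuracy, selected features:", round(reduced, 3))
```

In this sketch the same cross-validation folds are used both to choose the features and to report the final score, which tends to give an optimistic estimate. A stricter evaluation would nest the selection inside each training fold.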


Keywords


Cardiovascular Disease; Machine Learning; Wrapper Sequential Feature Selection; Naïve Bayes



DOI: https://doi.org/10.26418/jp.v8i2.54328
