Heart Disease Prediction Using Ensemble-Based Machine Learning Methods – Weighted Vote

Apriyanto Alhamad, Azminuddin I. S. Azis, Budy Santoso, Sunarto Taliki

Abstract


Mortality from heart disease remains very high, so prevention efforts need to be strengthened, for example by improving the performance of prediction models. Machine learning methods applied to the public datasets that researchers commonly use for heart disease prediction (Cleveland, Hungary, Switzerland, VA Long Beach, and Statlog), including the decision-support tools built on them, still do not handle missing values, noisy data, unbalanced classes, or even data validation efficiently. This study therefore proposes mean/mode imputation to replace missing values, Min-Max Normalization to smooth noisy data, K-Fold Cross Validation for data validation, and an ensemble approach based on the Weighted Vote (WV) method, which combines the performance of the individual machine learning methods to reach a classification decision while also reducing the effect of unbalanced classes. The results show that the proposed method achieves an accuracy of 85.21%, improving on the accuracy of the individual machine learning methods with a margin of 7.14% over Artificial Neural Network, 2.77% over Support Vector Machine, 0.34% over C4.5, 2.94% over Naïve Bayes, and 3.95% over k-Nearest Neighbor.
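The four steps named in the abstract (mean/mode imputation, Min-Max Normalization, K-Fold Cross Validation, Weighted Vote) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation. All function names are hypothetical, the folds are taken contiguously without shuffling, and using each classifier's accuracy as its vote weight is an assumption. The example weights are derived from the abstract by reading each reported margin as 85.21% minus that classifier's own accuracy.

```python
from collections import Counter
from statistics import mean


def impute_column(values, numeric=True):
    """Replace None with the column mean (numeric) or mode (categorical)."""
    known = [v for v in values if v is not None]
    fill = mean(known) if numeric else Counter(known).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def weighted_vote(predictions, weights):
    """Return the label whose voters have the largest total weight.

    Ties go to the label that was seen first (an arbitrary choice here).
    """
    totals = {}
    for name, label in predictions.items():
        totals[label] = totals.get(label, 0.0) + weights[name]
    return max(totals, key=totals.get)


# Assumed weights: per-classifier accuracy in percent, derived as
# 85.21 minus the margin reported in the abstract (an interpretation).
WEIGHTS = {"ANN": 78.07, "SVM": 82.44, "C4.5": 84.87, "NB": 82.27, "kNN": 81.26}

# Hypothetical votes for one patient record (1 = disease, 0 = no disease).
votes = {"ANN": 1, "SVM": 0, "C4.5": 0, "NB": 1, "kNN": 1}
decision = weighted_vote(votes, WEIGHTS)
```

Because the derived accuracies are close to one another, accuracy weights behave much like a simple majority vote in this example. A weighting scheme that spreads the weights further apart would let stronger classifiers override a majority.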


Keywords


machine learning; weighted vote; ensemble; heart disease prediction; unbalanced class


DOI: http://dx.doi.org/10.26418/jp.v5i3.37188



Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License