Comparison of Tree-Based Algorithms under Several Labeling Scenarios for Sentiment Analysis of Applications Owned by the Government and State-Owned Enterprises (BUMN)

Anwar Fitrianto, Silmi Anisa Rizki Manaf, Agus Mohamad Soleh

Abstract


The growth of digitalization has spurred many innovations intended to make everyday activities easier across many fields, among them applications that make services more efficient and accessible from anywhere. Applications owned by the government and by state-owned enterprises (BUMN), which operate at national scale, tend to be little known, and many have low ratings accompanied by a wide variety of user reviews. Sentiment analysis is a suitable method for analyzing the reviews of the selected applications. The data used are reviews of the InfoBMKG, BPOM Mobile, MyIndihome, and MyPertamina applications. This study aims to compare the performance of the double random forest algorithm with that of other tree-based algorithms, namely decision tree, extra trees, and random forest, based on model accuracy. The data were labeled using three approaches: application rating, lexicon-based labeling, and sentiment scoring. The predictor variables were generated from unigram tokenization weighted with TF-IDF. Each observation was categorized into a positive, neutral, or negative class. The results show that the extra trees algorithm combined with sentiment scoring labeling gave the best performance, with average accuracy of 80–84% for each of the selected applications.
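The pipeline described in the abstract (rating-based labeling, unigram TF-IDF features, and an accuracy comparison of tree-based classifiers) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the toy reviews, the rating thresholds in `label_from_rating`, the number of trees, and the 3-fold cross-validation are all assumptions made for the example. Double random forest is omitted because scikit-learn does not provide an implementation of it.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def label_from_rating(rating: int) -> str:
    """Hypothetical rating-based labeling rule; the thresholds are an assumption."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# Toy (review text, star rating) pairs, invented purely for illustration.
reviews = [
    ("the app is fast and very helpful", 5),
    ("great service and easy to use", 5),
    ("very useful weather information", 4),
    ("login works well and payment is easy", 4),
    ("the app is okay but nothing special", 3),
    ("average experience with some delays", 3),
    ("it works but the menu is confusing", 3),
    ("fine for basic use only", 3),
    ("the app keeps crashing after the update", 1),
    ("cannot login and support is slow", 1),
    ("payment failed and the balance is gone", 2),
    ("very slow and full of errors", 2),
]
texts = [text for text, _ in reviews]
labels = [label_from_rating(rating) for _, rating in reviews]

# Tree-based classifiers to compare (double random forest is not included).
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    # Unigram tokens weighted with TF-IDF, followed by the classifier.
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), model)
    # Mean accuracy over stratified 3-fold cross-validation.
    scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="accuracy")
    results[name] = scores.mean()

for name, accuracy in results.items():
    print(f"{name}: mean accuracy = {accuracy:.2f}")
```

On a dataset this small the accuracies are not meaningful; the sketch only shows how the three labeling-independent steps fit together. Swapping `label_from_rating` for a lexicon-based or sentiment-scoring labeler would reproduce the other labeling scenarios in the same structure.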

Keywords


Sentiment Analysis; Decision Tree; Double Random Forest; Extra Trees; Classification; Random Forest






DOI: http://dx.doi.org/10.26418/jp.v10i1.73512
