Text Mining Literature Review on Indonesian Social Media

Angelina Pramana Thenata

Abstract


Today, the volume of news that spreads rapidly across social media, together with the public's need to consume news from a variety of sources, can influence people's lives. As a result, this data can be collected and used by governments, business owners, analysts, and researchers to identify trends, develop businesses, predict customer behavior, and so on. News data from social media can be gathered through text mining, which involves algorithms such as Naive Bayes, K-NN, and SVM. However, applying an algorithm to a case study it does not suit can produce suboptimal results. This study therefore analyzes the text mining algorithms that have been applied to Indonesian-language social media, using a systematic literature review. The method begins with a planning stage that defines the research questions, search terms, digital literature sources, and literature selection criteria. It continues with a conducting stage, in which the literature is selected and matched against those criteria and the data is extracted. Finally, a reporting stage analyzes the extracted data to derive information and knowledge. The benchmarks used for comparison are the confusion-matrix measures of accuracy, precision, and recall. The results show that Naive Bayes gives stable but less than optimal results when applied to Indonesian-language social media case studies. K-NN and SVM, by contrast, were found to give optimal results in such case studies, as evidenced by accuracy (50%-98.13%), precision (58.22%-98.48%), and recall (21.05%-98%).
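The three measures named in the abstract follow directly from the four cells of a binary confusion matrix. The block below is a minimal illustrative sketch in Python, not the procedure used in the reviewed studies; the function names `confusion_counts` and `metrics` and the toy labels are assumptions made up for this example.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall); 0.0 when a denominator is zero."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical sentiment labels for six posts (illustration only).
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

counts = confusion_counts(y_true, y_pred, positive="pos")
accuracy, precision, recall = metrics(*counts)
print(counts, round(accuracy, 4), round(precision, 4), round(recall, 4))
```

Precision and recall are computed per class, so studies with more than two sentiment classes usually report an average over classes. That is one reason ranges such as those quoted above can differ widely between papers.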


Keywords


Social Media; Text Mining; Literature Review; Naive Bayes; K-NN; SVM

DOI: http://dx.doi.org/10.26418/jp.v7i2.47975
