Classification of Academic Questions Based on 5W1H Using K-Nearest Neighbors

Kristian Adi Nugraha, Herlina Herlina


Asking questions is the best and easiest method for eliciting information. According to the 5W1H rule, there are six basic question forms that can be used to obtain information: what, where, when, why, who, and how. Many journalists use this rule because it can be applied quickly and easily to construct a question. For a system to understand a question, as in a chatbot for example, a dedicated method is needed to distinguish the six question types. This study attempts to classify question documents according to the 5W1H rule, using tokenization and stemming in the preprocessing stage, followed by K-Nearest Neighbors (K-NN) to classify the questions. In testing, the highest accuracy was 70.27%, obtained with k = 5.
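The pipeline named in the abstract (tokenization, stemming, then K-NN) can be illustrated with a minimal sketch. The abstract does not state which stemmer, feature weighting, or distance measure the study used, so everything below is an assumption made for illustration: the suffix-stripping `stem` function is a toy stand-in for a real stemmer, the features are plain term counts, similarity is cosine, and the `TRAINING` questions are invented examples rather than the study's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Toy suffix stripper (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vectorize(text):
    """Bag-of-words term counts over stemmed tokens."""
    return Counter(stem(t) for t in tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(question, training, k=5):
    """Return the majority label among the k most similar training questions."""
    query = vectorize(question)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy training set (not the study's dataset), three classes only.
TRAINING = [
    ("When is the registration deadline", "when"),
    ("When does the semester start", "when"),
    ("When are the exam results announced", "when"),
    ("Where is the library located", "where"),
    ("Where is the registration office", "where"),
    ("Where are the exam rooms", "where"),
    ("Who is the head of the department", "who"),
    ("Who teaches the database course", "who"),
    ("Who is my academic advisor", "who"),
]

if __name__ == "__main__":
    print(knn_classify("When does the exam start", TRAINING, k=3))
```

With a training set this small, k = 3 is used in the usage line so that one class cannot be outvoted merely by running out of neighbors; the k = 5 reported in the abstract refers to the study's own, larger dataset.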


K-Nearest Neighbors; Classification; Text Processing; Questions; 5W1H





DOI: http://dx.doi.org/10.26418/jp.v7i1.45322

