Classification Model of Consumer Question about Motorbike Problems by Using Naïve Bayes and Support Vector Machine

  • Ekky Wicaksana
  • Danang Triantoro Murdiansyah
  • Isman Kurniawan
Keywords: classification, naïve bayes, SVM, n-gram, TF-IDF

Abstract

Motorbikes play an important role in supporting daily activities and are among the most frequently used modes of transportation in Indonesia, where the number of motorbikes in use continues to increase. Motorbike problems can therefore disrupt community activities and economic conditions. Because such problems can occur at any time, an online consultation platform is needed as a preventive measure. Such a platform requires a classification model to handle the wide range of questions about motorbike problems: by classifying each question into a specific problem class, a solution can be delivered to the consumer faster. In this study, we developed prediction models to classify consumer questions. The data set was collected from consumer questions about commonly occurring motorbike problems. The models were developed using two machine learning algorithms, Naïve Bayes and Support Vector Machine (SVM). Text vectorization was performed using n-grams and the term frequency-inverse document frequency (TF-IDF) method. The results show that the SVM model with the uni-trigram model performs best, with an accuracy of 0.910 and an F-measure of 0.910.
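The pipeline described in the abstract (n-gram extraction, TF-IDF weighting, then a classifier) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than details taken from the study: "uni-trigram" is read as all word n-grams from unigrams through trigrams, the IDF uses a common smoothed form, the classifier is a multinomial Naïve Bayes with Laplace smoothing, and the four English example questions and the labels "engine" and "brake" are invented for illustration. The SVM variant is not shown, since it would need a numerical solver.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_min=1, n_max=3):
    """All word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    counts = [Counter(ngrams(d.lower().split())) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency
    n = len(docs)
    return {g: math.log((1 + n) / (1 + df[g])) + 1.0 for g in df}


def tfidf(doc, idf):
    """Sparse TF-IDF vector; n-grams unseen during fitting are ignored."""
    counts = Counter(ngrams(doc.lower().split()))
    return {g: tf * idf[g] for g, tf in counts.items() if g in idf}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights (assumed variant)."""
    weight = defaultdict(lambda: defaultdict(float))  # class -> n-gram -> weight
    class_count = Counter(labels)
    for doc, label in zip(docs, labels):
        for g, w in tfidf(doc, idf).items():
            weight[label][g] += w
    model = {}
    for c in class_count:
        total = sum(weight[c].values()) + alpha * len(idf)
        model[c] = {
            "prior": math.log(class_count[c] / len(docs)),
            "loglik": {g: math.log((weight[c].get(g, 0.0) + alpha) / total)
                       for g in idf},
        }
    return model


def predict_nb(doc, model, idf):
    """Return the class with the highest log-posterior score."""
    vec = tfidf(doc, idf)
    scores = {c: m["prior"] + sum(w * m["loglik"][g] for g, w in vec.items())
              for c, m in model.items()}
    return max(scores, key=scores.get)


# Invented toy data, NOT the data set used in the study.
docs = [
    "engine will not start in the morning",
    "engine stalls when idle",
    "brake lever feels soft",
    "rear brake makes squeaking noise",
]
labels = ["engine", "engine", "brake", "brake"]

idf = fit_idf(docs)
model = train_nb(docs, labels, idf)
print(predict_nb("engine stalls at traffic light", model, idf))
print(predict_nb("front brake feels weak", model, idf))
```

In a sketch like this, changing `n_min` and `n_max` in `ngrams` is one way to compare n-gram configurations, and the same TF-IDF vectors could instead be fed to an SVM implementation.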


Published
2021-09-28
How to Cite
Wicaksana, E., Murdiansyah, D. T., & Kurniawan, I. (2021). Classification Model of Consumer Question about Motorbike Problems by Using Naïve Bayes and Support Vector Machine. Indonesia Journal on Computing (Indo-JC), 6(2), 1-10. https://doi.org/10.34818/INDOJC.2021.6.2.561
Section
Computer Science