Classification Model of Consumer Question about Motorbike Problems by Using Naïve Bayes and Support Vector Machine

Ekky Wicaksana; Danang Triantoro Murdiansyah; Isman Kurniawan

doi:10.34818/INDOJC.2021.6.2.561

Keywords: classification, naïve Bayes, SVM, n-gram, TF-IDF

Abstract

The motorbike plays an important role in supporting daily activities and is one of the most frequently used modes of transportation in Indonesia, where the number of motorbikes in use continues to increase over time. Consequently, motorbike problems can disrupt community activities and affect the economic condition of society. Because such problems can occur at any time, preventive action is needed, for example in the form of an online consultation platform. Such a platform, however, requires a classification model to handle the wide range of questions consumers ask about motorbike problems: by assigning each question to a specific problem class, a solution can be delivered to the consumer faster. In this study, we developed prediction models to classify consumer questions. The data set was collected from consumer questions about commonly occurring motorbike problems. The models were built using two machine learning algorithms, namely Naïve Bayes and Support Vector Machine (SVM), and the text was vectorized using n-gram features weighted by term frequency-inverse document frequency (TF-IDF). The results show that the SVM model with uni-trigram features (n-grams of length one to three) gives the best performance, with an accuracy of 0.910 and an F-measure of 0.910.
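The abstract names n-gram features weighted by TF-IDF as the text representation fed to both classifiers. The following is a minimal pure-Python sketch of that step under stated assumptions: lowercasing with whitespace tokenization, word n-grams of length one to three (the uni-trigram setting), raw counts as term frequency, idf(t) = ln(N / df(t)), and L2 normalization. These choices, the function names `word_ngrams`, `fit_idf`, and `tfidf_vector`, and the example questions are illustrative assumptions; the paper's actual preprocessing, weighting formula, and data may differ.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max (assumed: lowercase + whitespace split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, n_min=1, n_max=3):
    """Compute idf(t) = ln(N / df(t)) for every n-gram seen in the training documents."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # set(): each n-gram is counted at most once per document
        df.update(set(word_ngrams(doc, n_min, n_max)))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(text, idf, n_min=1, n_max=3):
    """Map one text to an L2-normalized {n-gram: tf * idf} dict; n-grams unseen in training are dropped."""
    tf = Counter(word_ngrams(text, n_min, n_max))
    weights = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return weights
    return {t: w / norm for t, w in weights.items()}


# Hypothetical example questions (not taken from the study's data set)
questions = [
    "engine will not start in the morning",
    "brake makes a squeaking noise",
    "engine stalls when idle",
]
idf = fit_idf(questions)
# Sparse feature vector for a new question; rarer n-grams such as "will" outweigh "engine"
vec = tfidf_vector("engine will not start", idf)
```

Vectors like `vec` are the kind of input a Naïve Bayes or SVM classifier would then be trained on. A library vectorizer such as scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` covers the same step, but its default idf is smoothed as ln((1 + N) / (1 + df)) + 1, so its weights will not match this sketch exactly.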
