Reducing Lending Risk: SVM Model Development with SMOTE for Unbalanced Credit Data

Josya Ryan Alexandro Purba; Qilbaaini Effendi Muftikhali; Bony Parulian Josaphat

doi:10.21108/ijoict.v9i2.860

Josya Ryan Alexandro Purba, Badan Pusat Statistik Kabupaten Nias Selatan
Qilbaaini Effendi Muftikhali, Universitas Telkom
Bony Parulian Josaphat, Politeknik Statistika STIS

Keywords: Lending, Machine Learning, Support Vector Machine, SMOTE

Abstract

Lending is a core activity through which banks manage their available funds. It is also a high-risk activity, because not all borrowers can fulfill the obligations of their loan agreements. A method that can predict customer creditworthiness is therefore needed to minimize the resulting risk. This research uses a machine learning method, the Support Vector Machine (SVM), to predict creditworthiness. The method is applied to historical credit data from the bank BPR NBP 16 Rantau Prapat, North Sumatra, and its performance is compared before and after applying the Synthetic Minority Oversampling Technique (SMOTE), with the best parameters found by grid search. Based on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), SVM with SMOTE performs better, at 96%, than SVM without SMOTE, at 56%.
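
To make two of the ideas named in the abstract concrete, the following minimal sketch illustrates the core interpolation step of SMOTE and a rank-based computation of AUC-ROC. It is not the authors' implementation: the function names `smote` and `auc_roc`, the neighbor count `k`, and the toy data are assumptions made for illustration, and the SVM and grid-search steps are deliberately left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Hypothetical helper: assumes `minority` holds at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: 8 majority-class loans, 2 minority-class loans.
majority = [(float(i), 0.0) for i in range(8)]
minority = [(1.0, 5.0), (3.0, 7.0)]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, n_new=len(majority) - len(minority))
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbors, which is why SMOTE adds variety rather than plain duplicates. In a full pipeline, oversampling would normally be applied to the training split only, so that the AUC-ROC is measured on untouched data.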
