Study on the Effect of Preprocessing Methods for Spam Email Detection

  • Fariska Zakhralativa Ruskanda, Widyatama University


Email is now widely used as a communication technology, and it is increasingly being exploited. As email use has grown, spam has become a serious nuisance for email users, and its negative impacts make effective spam detection techniques indispensable. A spam detection algorithm, or spam classifier, works effectively only when it is supported by proper preprocessing steps (noise removal, stop-word removal, stemming, lemmatization, term frequency). This research studies the effect of these preprocessing steps on the performance of supervised spam classifiers. Experiments were conducted on two widely used supervised spam classification algorithms: Naïve Bayes and Support Vector Machine. The evaluation is performed on the Ling-Spam corpus and uses accuracy as the evaluation metric. The experimental results show that the effect of a given preprocessing step differs from one classifier to another.
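To make the kind of comparison described in the abstract concrete, the following is a minimal sketch in Python, not the author's implementation. It toggles two of the listed preprocessing steps (stop-word removal and stemming), builds term-frequency features, trains a small multinomial Naïve Bayes classifier, and reports accuracy. Everything in it is an assumption made for illustration: the four training and two test messages are toy data rather than the Ling-Spam corpus, `STOP_WORDS` is a tiny hand-picked list, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer. Lemmatization and the Support Vector Machine classifier are left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Toy stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "you", "your"}


def crude_stem(token):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=True, stem=True):
    """Apply noise removal and the optional steps, then count term frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())  # noise removal: letters only
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)


def train_naive_bayes(docs, labels):
    """Collect per-class term counts for a multinomial Naive Bayes model."""
    vocab, word_counts = set(), {}
    for tf, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(tf)
        vocab.update(tf)
    return {"vocab": vocab, "word_counts": word_counts,
            "class_counts": Counter(labels), "n_docs": len(labels)}


def predict(model, tf):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    best_label, best_score = None, -math.inf
    vocab_size = len(model["vocab"])
    for label, n in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n / model["n_docs"])
        for term, freq in tf.items():
            if term in model["vocab"]:  # terms unseen in training are skipped
                score += freq * math.log((counts[term] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(predict(model, d) == y for d, y in zip(docs, labels)) / len(labels)


# Toy data (an assumption; the study itself uses the Ling-Spam corpus).
train_texts = ["win free money now", "claim your free prize",
               "meeting agenda for monday", "please review the attached paper"]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["free prizes waiting", "agenda for the meeting"]
test_labels = ["spam", "ham"]

# Compare accuracy under different preprocessing configurations.
for stop, stem in [(False, False), (True, False), (True, True)]:
    model = train_naive_bayes(
        [preprocess(t, stop, stem) for t in train_texts], train_labels)
    test_docs = [preprocess(t, stop, stem) for t in test_texts]
    print(f"stop_words={stop} stem={stem} "
          f"accuracy={accuracy(model, test_docs, test_labels):.2f}")
```

On a toy set this small the configurations are unlikely to separate; the point is only the structure of the experiment, in which each preprocessing step is switched on or off while the classifier and the metric stay fixed.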



Author Biography

Fariska Zakhralativa Ruskanda, Widyatama University
Department of Informatics



How to Cite
Ruskanda, F. Z. (2019). Study on the Effect of Preprocessing Methods for Spam Email Detection. Indonesian Journal on Computing (Indo-JC), 4(1), 109-118.
Computer Science