RESEARCH APPLICATION OF THE SPAM FILTERING AND SPAMMER DETECTION ALGORITHMS ON SOCIAL MEDIA AND MESSENGERS

Andrii Podorozhniak
https://orcid.org/0000-0002-6688-8407
Nataliia Liubchenko
Vasyl Oliinyk
Viktoriia Roh

Abstract

Social networks and messaging platforms have become an integral part of everyday life, and of work in particular, owing to the COVID-19 pandemic and the russian war in Ukraine. Against this backdrop, spam and spammers are a more pressing problem than ever, and the amount of spam in work-related text streams keeps growing. In this context, spam is textual content that is irrelevant to a given text stream, and a spammer is a user who distributes unsolicited messages for personal gain. This article addresses the scientific and practical problem of detecting spam messages and identifying spammers in the text streams of any social network or messenger, using several spam detection algorithms and spammer identification approaches. Four algorithms were implemented: a naive Bayes classifier, a support vector machine, a multilayer perceptron neural network, and a convolutional neural network. The research objective was to develop a spam detection algorithm that can be integrated into a messenger, with Telegram used as a case study. The resulting algorithm recognizes spam from the contextual characteristics of a specific text stream, deletes the spam message, and blocks the spamming user until one of the application administrators authorizes them.
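The abstract names a naive Bayes classifier as one of the four detectors and describes a delete-and-block workflow. The following is a minimal sketch of both, under stated assumptions: a multinomial naive Bayes filter with add-one smoothing in plain Python (no external libraries), trained on a handful of invented toy messages rather than the authors' data, and a hypothetical `ChatModerator` class that deletes messages classified as spam and blocks the sender until an administrator approves them. The class names, the tokenizer, and the `"spam"`/`"ham"` labels are illustrative choices, not the authors' implementation, and no Telegram API calls are made.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Toy training messages invented for illustration (not the authors' dataset).
TRAIN_TEXTS = [
    "Win a free prize now, click the link",
    "Cheap loans, click here to claim cash",
    "Free crypto giveaway, claim your prize",
    "The meeting is moved to 3 pm",
    "Please review the pull request before lunch",
    "Can you send the report by Friday",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase the text and keep alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    LABELS = ("ham", "spam")  # "ham" first so that score ties keep the message

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesSpamFilter:
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum over tokens of log P(token | label)
        if self.doc_counts[label] == 0:
            return float("-inf")  # class never seen in training
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # words unseen in training are skipped
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        # max() returns the first label on ties, i.e. "ham".
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


class ChatModerator:
    """Hypothetical moderation loop: delete spam, block the sender until an admin approves."""

    def __init__(self, classifier: NaiveBayesSpamFilter, admins: set[str]) -> None:
        self.classifier = classifier
        self.admins = set(admins)
        self.blocked: set[str] = set()
        self.kept: list[tuple[str, str]] = []  # messages left in the text stream

    def on_message(self, user: str, text: str) -> str:
        if user in self.blocked:
            return "ignored"  # blocked users cannot post until approved
        if self.classifier.predict(text) == "spam":
            self.blocked.add(user)
            return "deleted"
        self.kept.append((user, text))
        return "kept"

    def approve(self, admin: str, user: str) -> bool:
        # Only listed administrators may unblock a user.
        if admin not in self.admins:
            return False
        self.blocked.discard(user)
        return True


if __name__ == "__main__":
    clf = NaiveBayesSpamFilter().fit(TRAIN_TEXTS, TRAIN_LABELS)
    mod = ChatModerator(clf, admins={"alice"})
    print(mod.on_message("bob", "Can you send the report by Friday"))  # kept
    print(mod.on_message("mallory", "Claim your free prize, click the link"))  # deleted
    print(mod.on_message("mallory", "hello"))  # ignored
    print(mod.approve("alice", "mallory"))  # True
```

Ties between the two class scores resolve to "ham", so a message made up only of words never seen in training is kept rather than deleted. In an actual Telegram bot, the "deleted" outcome would map to the Bot API's message-deletion and member-restriction methods and `approve` to an administrator command; that integration is outside this sketch.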

How to Cite
Podorozhniak, A., Liubchenko, N., Oliinyk, V., & Roh, V. (2023). Research application of the spam filtering and spammer detection algorithms on social media and messengers. Advanced Information Systems, 7(3), 60–66. https://doi.org/10.20998/2522-9052.2023.3.09
Section
Methods of information systems protection
Author Biographies

Andrii Podorozhniak, National Technical University "Kharkiv Polytechnic Institute", Kharkiv

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Computer Engineering and Programming Department

Nataliia Liubchenko, National Technical University "Kharkiv Polytechnic Institute", Kharkiv

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Systems Analysis and Information-Analytical Technologies Department

Vasyl Oliinyk, National Technical University "Kharkiv Polytechnic Institute", Kharkiv

Master's student of the Computer Engineering and Programming Department

Viktoriia Roh, Kharkiv National University of Internal Affairs, Kharkiv

Senior Lecturer of the Combating Cybercrime Department
