Informativity assessment and attributes selection in a computer system state identification

Svitlana Gavrylenko
Illia Sheverdin
Hennadii Heiko

Abstract

The subject of the article is the study of methods for determining the informativeness of attributes. The aim of the article is to improve the quality of computer system state classification by selecting the most informative features. Objective: to explore methods for selecting optimal informative features for identifying a computer system state, based on an analysis of Windows operating system events. The methods used are machine learning methods, ensemble methods, and methods for selecting optimal informative features. The following results were obtained: Windows operating system events were analyzed, and methods for selecting optimal informative features were investigated, namely wrapper methods, embedded methods, and filter methods. Informativeness assessment and feature selection were performed for identifying a computer system state. An ensemble method for classifying a computer system state, based on bagging and the J48 decision tree, was developed to evaluate the effectiveness of the selected features. The dependence of the classification accuracy on the selected features was investigated, and the attribute set that provides the maximum classification accuracy was determined. Conclusions. The scientific novelty of the results lies in the analysis of Windows operating system events, the assessment of their informativeness, and the selection of features for identifying a computer system state.
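
As an illustration of the filter approach mentioned in the abstract, the sketch below ranks discrete features by information gain, that is, by how much knowing a feature reduces the entropy of the class label. Information gain is only one common filter criterion and is not necessarily the one used in the article; the feature names, the "low"/"high" discretization, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each row is (logon_failures, service_restarts),
# discretized to "low"/"high"; labels are the assumed system state.
rows = [
    ("high", "low"),
    ("high", "high"),
    ("low", "low"),
    ("low", "high"),
]
labels = ["abnormal", "abnormal", "normal", "normal"]
ranking = rank_features(rows, labels, ["logon_failures", "service_restarts"])
```

In this toy set the first feature separates the two states perfectly and the second carries no information about them, so a filter would keep the first and discard the second before any classifier is trained. A wrapper method would instead score candidate subsets by the accuracy of the classifier itself.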

Article Details

How to Cite
Gavrylenko, S., Sheverdin, I., & Heiko, H. (2021). Informativity assessment and attributes selection in a computer system state identification. Advanced Information Systems, 5(2), 5–12. https://doi.org/10.20998/2522-9052.2021.2.01
Section
Identification problems in information systems
Author Biographies

Svitlana Gavrylenko, National Technical University «Kharkiv Polytechnic Institute», Kharkiv

Doctor of Technical Sciences, Associate Professor, Professor of Computer Engineering and Programming Department

Illia Sheverdin, National Technical University «Kharkiv Polytechnic Institute», Kharkiv

PhD Student of Computer Engineering and Programming Department

Hennadii Heiko, National Technical University «Kharkiv Polytechnic Institute», Kharkiv

Candidate of Technical Sciences, Associate Professor of Computer Engineering and Programming Department
