A vector method for finding sequences in big data


Hanna Khakhanova

Abstract

A software solution is proposed for metric search and identification of logical-temporal patterns in a business data flow, based on additional vector data structures and a parallel method for processing them. The subject of the research is methods for searching for and identifying logical-temporal patterns in big data. The purpose of the study is to increase the efficiency of searching for and recognizing logical-temporal patterns that semantically form business functionality within an 8-hour frame of screenshots containing "garbage" data. Methods applied: set theory and Boolean algebra, metric models for determining the parameters of sets of binary vectors, elements of probability theory, the theory of algorithms, and software modeling. Results obtained: (1) a method for searching for and recognizing patterns, formulated as a vector problem over character sequences that identify patterns in big data streams, using unitary coding of information primitives and data; (2) vector models, that is, unitary-encoded data structures that describe a big data flow as the Cartesian product of a set of primitive string markers and a discrete sequence realizing a given time frame. Practical significance: the implemented vector method made it possible to create a program that recognizes patterns in a big data stream with a probability of 0.77%.
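The idea of unitary coding described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the marker names, the helper functions `encode_step`, `encode_stream` and `find_pattern`, and the choice of one bit mask per discrete time step are all assumptions made for illustration, and the parallel processing mentioned in the abstract is not modeled.

```python
from typing import Dict, List, Sequence

# Hypothetical primitive string markers; the article's actual set is not given here.
PRIMITIVES: List[str] = ["login", "open_form", "edit", "save", "noise"]
INDEX: Dict[str, int] = {name: i for i, name in enumerate(PRIMITIVES)}


def encode_step(markers: Sequence[str]) -> int:
    """Unitary (one-hot) code of the markers seen in one time step, as a bit mask."""
    mask = 0
    for marker in markers:
        mask |= 1 << INDEX[marker]
    return mask


def encode_stream(stream: Sequence[Sequence[str]]) -> List[int]:
    """One bit vector per discrete time step: markers x time, stored row by row."""
    return [encode_step(step) for step in stream]


def find_pattern(stream_vec: List[int], pattern: Sequence[str]) -> List[int]:
    """Start positions where the pattern's markers occur in consecutive time steps.

    Each check is a single bitwise AND, so extra "garbage" markers in the same
    step do not prevent a match.
    """
    bits = [1 << INDEX[marker] for marker in pattern]
    hits = []
    for start in range(len(stream_vec) - len(bits) + 1):
        if all(stream_vec[start + k] & bit for k, bit in enumerate(bits)):
            hits.append(start)
    return hits


# Toy data flow: each inner list holds the markers observed at one time step.
stream = [
    ["noise"],
    ["login", "noise"],
    ["open_form"],
    ["edit", "noise"],
    ["save"],
    ["noise"],
]
vectors = encode_stream(stream)
positions = find_pattern(vectors, ["login", "open_form", "edit", "save"])
print(vectors, positions)
```

In this toy example the pattern is found starting at time step 1, even though two of the matching steps also contain the "noise" marker.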

Article Details

How to Cite
Khakhanova, H. (2022). A vector method for finding sequences in big data. Advanced Information Systems, 6(3), 13–22. https://doi.org/10.20998/2522-9052.2022.3.02
Section
Identification problems in information systems
Author Biography

Hanna Khakhanova, Kharkiv National University of RadioElectronics, Kharkiv

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Computer Aided Design of Computers Department
