A vector method for finding sequences in big data
Abstract
This paper proposes a software solution for the metric search and identification of logical-temporal patterns in a business data flow. The solution builds additional vector data structures and processes them with a parallel method. The subject of the research is methods for searching for and identifying logical-temporal patterns in big data. The purpose of the study is to make it more efficient to search for and recognize logical-temporal patterns that semantically make up business functionality within an 8-hour frame of screenshots containing "garbage" data. Methods used: set theory and Boolean algebra, metric models for determining the parameters of sets of binary vectors, elements of probability theory, the theory of algorithms, and software modeling. Results obtained: (1) a method for searching for and recognizing patterns, based on a vector formulation of the character-sequence problem, that identifies patterns in big data streams using unitary coding of information primitives and data; (2) vector models, that is, unitary-encoded data structures that describe a big data flow as the Cartesian product of a set of primitive string markers and the discrete sequence of time points within a given time frame. Practical significance: implementing the vector method made it possible to create a program that recognizes patterns in a big data stream with a probability of 0.77%.
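The abstract describes the vector models only at a high level. The following is a minimal sketch of the idea under our own assumptions, not the authors' implementation. We assume the time frame is discretized into slots that each hold one primitive. Each marker's unitary (one-hot) code over the frame is stored as an integer bitmask. A logical-temporal pattern is treated as an ordered sequence of markers that must occur at strictly increasing time slots, and any "garbage" primitives in between are ignored. The function names `unitary_encode` and `find_sequence` and the sample marker strings are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def unitary_encode(stream: Sequence[str], markers: Sequence[str]) -> Dict[str, int]:
    """Build one bitmask per marker over the time frame (hypothetical helper).

    Bit t of a marker's mask is 1 when that marker occupies time slot t,
    i.e. a unitary (one-hot) code of the Cartesian product markers x time slots.
    Primitives that are not markers ("garbage") leave every mask unchanged.
    """
    vectors = {marker: 0 for marker in markers}
    for t, primitive in enumerate(stream):
        if primitive in vectors:
            vectors[primitive] |= 1 << t
    return vectors


def find_sequence(vectors: Dict[str, int], pattern: Sequence[str]) -> Optional[List[int]]:
    """Return the earliest time slots at which `pattern` occurs in order, or None.

    Each step keeps only the bits after the previous match and takes the
    lowest remaining set bit as the next occurrence (greedy earliest match).
    """
    slots: List[int] = []
    prev = -1
    for marker in pattern:
        # Clear bits 0..prev so that only later time slots remain eligible.
        later = vectors.get(marker, 0) & ~((1 << (prev + 1)) - 1)
        if later == 0:
            return None  # marker absent, or it occurs only before the previous step
        prev = (later & -later).bit_length() - 1  # index of the lowest set bit
        slots.append(prev)
    return slots


if __name__ == "__main__":
    # Illustrative stream: one primitive per time slot, with noise mixed in.
    stream = ["login", "noise", "open_order", "noise", "pay", "logout"]
    vectors = unitary_encode(stream, ["login", "open_order", "pay"])
    print(find_sequence(vectors, ["login", "open_order", "pay"]))  # [0, 2, 4]
    print(find_sequence(vectors, ["pay", "login"]))  # None
```

Each marker's occurrences are stored in a single bitmask. Finding the next match is therefore a mask followed by a lowest-set-bit lookup, with no rescan of the stream. This sketch does not model the parallel processing that the paper mentions.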