Excel-oriented calculator for calculating results of entropy analysis of data distributed by categories

Svitlana Gadetska
Valeriy Dubnitskiy
Alexander Khodyrev
Yuri Kushneruk

Abstract

The goal of the work: to develop an Excel-oriented calculator for computing the results of entropy analysis of data distributed by categories. The subject of research: histograms of arbitrary distribution laws and 2×2 contingency tables. Research methods: entropy and information analysis of histograms of arbitrary distribution laws and of contingency tables. The obtained results: entropy analysis methods are proposed for the analysis of data distributed by categories, and the structure of the Excel-oriented calculator designed for this purpose is described. The calculator computes the entropy characteristics of histograms, namely histogram entropy, histogram variance, histogram confidence intervals, and the information diversity index. It performs pairwise comparison of histogram entropies using the Hutcheson method, determines the Hellinger and Kullback-Leibler distances between histograms of arbitrary distribution laws, thereby complementing the chi-square test, and determines the informational correlation coefficient. The correspondence between the Pearson correlation coefficient and the informational correlation coefficient is established by statistical modeling. For 2×2 contingency tables, the calculator estimates the significance of the interaction between the row factor and the column factor and determines the values of the conditional entropies. The proposed calculator fills gaps in existing software products and can be used to process data distributed by categories using entropy analysis methods. Entropy methods of analysis are shown to be appropriate in cases where histograms describe arbitrary distribution laws.
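The quantities listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Excel implementation: all function names are hypothetical, natural logarithms are assumed, the variance is the approximation usually attributed to Hutcheson (1970), and the informational correlation coefficient is taken in Linfoot's form r = sqrt(1 - exp(-2I)), where I is the mutual information. The formulas used in the article itself may differ in detail.

```python
import math


def probabilities(counts):
    """Convert category counts to relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(counts):
    """Shannon entropy of a histogram in nats; empty bins contribute 0."""
    return -sum(p * math.log(p) for p in probabilities(counts) if p > 0)


def entropy_variance(counts):
    """Approximate variance of the entropy estimate (assumed Hutcheson form)."""
    n = sum(counts)
    probs = [p for p in probabilities(counts) if p > 0]
    s1 = sum(p * math.log(p) ** 2 for p in probs)
    s2 = sum(p * math.log(p) for p in probs) ** 2
    return (s1 - s2) / n + (len(probs) - 1) / (2 * n ** 2)


def hutcheson_t(counts_a, counts_b):
    """t statistic and approximate degrees of freedom for comparing two entropies.

    Raises ZeroDivisionError if both variance estimates are zero
    (for example, when each histogram has a single non-empty bin).
    """
    va, vb = entropy_variance(counts_a), entropy_variance(counts_b)
    t = (shannon_entropy(counts_a) - shannon_entropy(counts_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / sum(counts_a) + vb ** 2 / sum(counts_b))
    return t, df


def hellinger(counts_a, counts_b):
    """Hellinger distance between two histograms on the same bins (0 to 1)."""
    p, q = probabilities(counts_a), probabilities(counts_b)
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def kullback_leibler(counts_a, counts_b):
    """Kullback-Leibler divergence D(a || b) in nats; infinite if b lacks support."""
    total = 0.0
    for a, b in zip(probabilities(counts_a), probabilities(counts_b)):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


def table_entropies(table):
    """Mutual information, H(column | row) and Linfoot coefficient for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    joint = [cell for row in table for cell in row]
    h_row, h_col = shannon_entropy(row_totals), shannon_entropy(col_totals)
    h_joint = shannon_entropy(joint)
    mutual = max(h_row + h_col - h_joint, 0.0)  # clip rounding noise below zero
    return {
        "mutual_information": mutual,
        "h_col_given_row": h_joint - h_row,
        "linfoot_r": math.sqrt(1 - math.exp(-2 * mutual)),
    }
```

For two histograms given as lists of counts over the same categories, `hutcheson_t(a, b)` returns the statistic and degrees of freedom to look up in a Student t table, while `hellinger(a, b)` and `kullback_leibler(a, b)` give the distances. For a 2×2 table passed as a list of two rows, `table_entropies` returns the conditional entropy and the informational correlation coefficient.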

Article Details

How to Cite
Gadetska, S., Dubnitskiy, V., Khodyrev, A., & Kushneruk, Y. (2023). Excel-oriented calculator for calculating results of entropy analysis of data distributed by categories. Advanced Information Systems, 7(2), 28–40. https://doi.org/10.20998/2522-9052.2023.2.05
Section
Methods of information systems synthesis
Author Biographies

Svitlana Gadetska, Kharkiv National Automobile and Highway University, Kharkiv

PhD in Physics and Mathematics, Associate Professor, Associate Professor of the Department of Higher Mathematics

Valeriy Dubnitskiy, “Karazin Banking Institute” of V.N. Karazin Kharkiv National University, Kharkiv

PhD in Engineering, Senior Researcher, Senior Researcher

Alexander Khodyrev, “Karazin Banking Institute” of V.N. Karazin Kharkiv National University, Kharkiv

Senior Lecturer

Yuri Kushneruk, Ivan Kozhedub Kharkiv National Air Force University, Kharkiv

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Civil Aviation Institute
