ANALYSIS OF TEXT AUGMENTATION ALGORITHMS IN ARTIFICIAL LANGUAGE MACHINE TRANSLATION SYSTEMS

Anton Havrashenko
Olesia Barkovska

Abstract

This work develops an organizational model of a machine translation system for artificial languages. The main goal is to analyze text augmentation algorithms, which are significant elements of the system at the stage where new dictionaries, created from existing ones, are improved. In the course of the work, a model of the machine translation system was developed; dictionaries were created both from texts and from existing dictionaries using augmentation methods such as back translation and crossover; and the resulting dictionary was improved using n-gram algorithms, the Knuth-Morris-Pratt algorithm, and word search in text (binary search, tree search, sqrt decomposition). In addition, the work implements the ability to use the prepared dictionary for translation. The results obtained can improve existing systems for machine translation of artificial-language text. The practical significance of this work lies in the analysis and improvement of text augmentation algorithms by changing the prefix tree type. Compared to the conventional algorithm, the improved algorithm reduced memory usage by almost 13 times, which allows it to be used on much larger test data. This was achieved by changing the internal structure of the prefix tree node from a constant set of references to an expandable list.
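
The node change described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 26-letter alphabet, the class names, and the `slots` counter (a rough stand-in for memory use, counting stored child references) are all assumptions made for illustration. The first variant reserves one reference per alphabet letter in every node; the second stores only the children that actually exist.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet; the abstract does not state one


class FixedNode:
    """Prefix tree node with one pre-allocated child reference per letter."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = [None] * len(ALPHABET)
        self.is_word = False


class ListNode:
    """Prefix tree node whose child list grows only when a new letter appears."""
    __slots__ = ("labels", "children", "is_word")

    def __init__(self):
        self.labels = []    # letters that have a child
        self.children = []  # child nodes, parallel to labels
        self.is_word = False


class FixedTrie:
    def __init__(self):
        self.root = FixedNode()
        self.slots = len(ALPHABET)  # hypothetical counter: reserved references

    def insert(self, word):
        node = self.root
        for ch in word:
            i = ALPHABET.index(ch)  # raises ValueError outside the assumed alphabet
            if node.children[i] is None:
                node.children[i] = FixedNode()
                self.slots += len(ALPHABET)
            node = node.children[i]
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            i = ALPHABET.find(ch)
            if i < 0 or node.children[i] is None:
                return False
            node = node.children[i]
        return node.is_word


class ListTrie:
    def __init__(self):
        self.root = ListNode()
        self.slots = 0  # hypothetical counter: stored references

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch in node.labels:
                node = node.children[node.labels.index(ch)]
            else:
                child = ListNode()
                node.labels.append(ch)
                node.children.append(child)
                self.slots += 1
                node = child
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node.labels:
                return False
            node = node.children[node.labels.index(ch)]
        return node.is_word


if __name__ == "__main__":
    fixed, expandable = FixedTrie(), ListTrie()
    for w in ["tree", "trie", "translate", "text"]:
        fixed.insert(w)
        expandable.insert(w)
    print(fixed.slots, expandable.slots)
```

The expandable variant trades a constant-time child lookup for a short linear scan of each node's list, while storing far fewer references on sparse data. The ratio in this toy example depends on the assumed alphabet and word list and is not meant to reproduce the almost 13-fold reduction reported in the abstract.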

How to Cite
Havrashenko, A., & Barkovska, O. (2023). ANALYSIS OF TEXT AUGMENTATION ALGORITHMS IN ARTIFICIAL LANGUAGE MACHINE TRANSLATION SYSTEMS. Advanced Information Systems, 7(1), 47–53. https://doi.org/10.20998/2522-9052.2023.1.08
Section
Intelligent information systems
Author Biographies

Anton Havrashenko, Kharkiv National University of Radio Electronics, Kharkiv

Postgraduate student at the Electronic Computers Department

Olesia Barkovska, Kharkiv National University of Radio Electronics, Kharkiv

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Electronic Computers Department
