RESEARCH ON THE SPECIFIC FEATURES OF DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY-LENGTH TEXT CONTENT USING MULTILINGUAL TRANSFORMER-BASED MODELS
References
Olizarenko, S. and Argunov, V. (2019), Research into the possibilities of the multilingual BERT model for determining semantic similarities of text content, available at: https://hipsto.global/BERT-Application-Research-for-HIPSTO-Related-News-Detection.pdf.
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv:1810.04805v2 [cs.CL] 24 May 2019.
Sanh, V., Debut, L., Chaumond, J. and Wolf, T. (2020), DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv:1910.01108v4 [cs.CL] 1 Mar 2020.
Lample, G. and Conneau, A. (2019), Cross-lingual Language Model Pretraining, arXiv:1901.07291v1 [cs.CL] 22 Jan 2019.
Sun, C., Qiu, X., Xu, Y. and Huang, X. (2020), How to Fine-Tune BERT for Text Classification, arXiv:1905.05583v3 [cs.CL] 5 Feb 2020.
Yang, Y., Cer, D., Ahmad, A., Guo, M., Law, J., Constant, N., Abrego, G.H., Yuan, S., Tar, C., Sung, Y., Strope, B. and Kurzweil, R. (2019), Multilingual Universal Sentence Encoder for Semantic Retrieval, arXiv:1907.04307v1 [cs.CL] 9 Jul 2019.
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., St. John, R., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Strope, B. and Kurzweil, R. (2018), “Universal sentence encoder for English”, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 169–174.
Kim, Y. (2014), “Convolutional neural networks for sentence classification”, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, Ł. and Polosukhin, I. (2017), “Attention is all you need”, Proceedings of NIPS, pp. 6000–6010.
(2020), Multilingual Similarity Search Using Pretrained Bidirectional LSTM Encoder. Evaluating LASER (Language-Agnostic SEntence Representations), available at: https://medium.com/the-artificial-impostor/multilingual-similarity-search-using-pretrained-bidirectional-lstm-encoder-e34fac5958b0.
(2019), Zero-shot transfer across 93 languages: Open-sourcing enhanced LASER library, Facebook AI Research blog, 22 Jan 2019, available at: https://engineering.fb.com/ai-research/laser-multilingual-sentence-embeddings.
Reimers, N. and Gurevych, I. (2019), Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, arXiv:1908.10084v1 [cs.CL] 27 Aug 2019.
Patel, M. (2019), TinySearch – Semantics-based Search Engine using Bert Embeddings, available at: https://arxiv.org/ftp/arxiv/papers/1908/1908.02451.pdf.
Han, X. (2020), Bert-as-service, available at: https://github.com/hanxiao/bert-as-service.
(2020), State of the art Natural Language Processing for Pytorch and TensorFlow 2.0, available at: https://huggingface.co/transformers/index.html.
Maiya, A.S. (2020), ktrain: A Low-Code Library for Augmented Machine Learning, available at: https://arxiv.org/pdf/2004.10703v2.pdf.
Dolan, B. and Brockett, C. (2005), “Automatically Constructing a Corpus of Sentential Paraphrases”, Proceedings of the 3rd International Workshop on Paraphrasing (IWP 2005), Jeju Island, pp. 9–16.
Goodfellow, I., Bengio, Y. and Courville, A. (2018), “Softmax Units for Multinoulli Output Distributions”, Deep Learning, MIT Press, pp. 180–184, ISBN 978-0-262-03561-3.
Markovsky, I. (2012), Low-Rank Approximation: Algorithms, Implementation, Applications, Springer, ISBN 978-1-4471-2226-5.