PERFORMANCE EVALUATION OF PYTHON LIBRARIES FOR MULTITHREADING DATA PROCESSING
Abstract
Topicality. The rapid growth of data across many domains has created a need for efficient tools and libraries for data processing and analysis. Python, a popular programming language for data analysis, offers several libraries for numerical computation, such as NumPy and Numba. However, comprehensive studies comparing the performance of these libraries across different tasks and data sizes are lacking.

The aim of the study. This study aims to fill this gap by comparing the performance of Python, NumPy, Numba, and Numba.Cuda across different tasks and data sizes. It also evaluates the impact of multithreading and GPU utilization on computation speed.

Research results. The results indicate that Numba and Numba.Cuda significantly improve the performance of Python applications, especially for functions involving loops and array operations. GPU execution and multithreading further increase computation speed, although with certain limitations and considerations.

Conclusion. This study provides insight into the performance of different Python libraries and the effectiveness of GPU computing and multithreading in Python, helping researchers and practitioners select the most suitable tools for their computational needs.
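The kind of comparison described in the abstract can be illustrated with a small timing harness. The following is a minimal sketch, not the authors' benchmark: the workload (a sum of squares), the function names, and the helper `best_time` are all hypothetical and chosen only for illustration. Numba and Numba.Cuda variants are left out so that the sketch runs without extra packages, and NumPy is treated as optional.

```python
import time
from concurrent.futures import ThreadPoolExecutor

try:
    import numpy as np  # optional: the NumPy variant is skipped if unavailable
except ImportError:
    np = None


def sum_of_squares_python(n):
    """Baseline: an explicit interpreted loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_numpy(n):
    """Vectorized variant: the loop runs inside NumPy's compiled code."""
    values = np.arange(n, dtype=np.int64)
    return int(np.dot(values, values))


def _partial_sum(bounds):
    """Sum of squares over the half-open range [low, high)."""
    low, high = bounds
    return sum(i * i for i in range(low, high))


def sum_of_squares_threaded(n, workers=4):
    """Threaded variant: split the range into chunks, one task per chunk."""
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    chunks = [(start, min(start + step, n)) for start in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(_partial_sum, chunks))


def best_time(func, *args, repeats=3):
    """Return the fastest wall-clock time of several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 200_000  # hypothetical problem size
    print("python  :", best_time(sum_of_squares_python, n))
    print("threaded:", best_time(sum_of_squares_threaded, n))
    if np is not None:
        print("numpy   :", best_time(sum_of_squares_numpy, n))
```

On standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so the threaded variant of a pure-Python loop is not expected to be faster than the baseline. This is one example of the limitations of multithreading that the abstract mentions; timings will vary with hardware and library versions.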