PROCESSING THE UZBEK LANGUAGE CORPUS TEXTS
General information about the article
This paper investigates text processing techniques in artificial intelligence, focusing on natural language processing (NLP) for the Uzbek language. It discusses Bag-of-Words, CountVectorizer, TF-IDF, the co-occurrence matrix, Word2Vec with its CBOW and Skip-Gram architectures, GloVe, ELMo, and BERT, and evaluates the advantages and disadvantages of each for text representation. The paper highlights discrete numerical representations of text for their simplicity, ease of implementation, and interpretability, and emphasizes the importance of distributed text representation algorithms for complex NLP tasks.
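To make the discrete representations named in the abstract concrete, the following is a minimal sketch of Bag-of-Words counting and a plain TF-IDF weighting in standard-library Python. The three Uzbek sentences, the whitespace tokenizer, and the function names `tokenize`, `bag_of_words`, and `tf_idf` are illustrative assumptions, not material from the paper, and the unsmoothed formula tf * log(N / df) is the textbook variant rather than the exact weighting any particular library applies.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"()"

def tokenize(text):
    # Naive tokenizer (an assumption): split on whitespace, lowercase,
    # strip surrounding punctuation. The apostrophe is kept because it is
    # part of Uzbek Latin spellings such as "o'qiyman".
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def bag_of_words(docs):
    # Build a sorted vocabulary and one raw count vector per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tf_idf(docs):
    # Textbook TF-IDF: term frequency times log(N / document frequency).
    vocab, matrix = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in matrix:
        total = sum(row) or 1  # guard against empty documents
        weights.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, weights

# Hypothetical toy corpus, for illustration only.
docs = [
    "Men kitob o'qiyman.",
    "Men maktabga boraman.",
    "U kitob yozadi.",
]

vocab, matrix = bag_of_words(docs)
_, weights = tf_idf(docs)

for term, count, weight in zip(vocab, matrix[0], weights[0]):
    print(f"{term:10s} count={count} tfidf={weight:.3f}")
```

In this sketch, words that occur in only one document receive a higher weight than words shared across documents, which is the interpretability advantage the abstract attributes to discrete representations. Uzbek is agglutinative, so a practical pipeline would likely add stemming or lemmatization before counting; that step is omitted here.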
Abdullaev, A. S. (2024). PROCESSING THE UZBEK LANGUAGE CORPUS TEXTS. Academic Research in Educational Sciences, 5(3), 151–162.
Abdullaev, Anvar. “PROCESSING THE UZBEK LANGUAGE CORPUS TEXTS.” Academic Research in Educational Sciences, vol. 3, no. 5, 2024, pp. 151–162.
Abdullaev, S. 2024. PROCESSING THE UZBEK LANGUAGE CORPUS TEXTS. Academic Research in Educational Sciences. 3(5), pp.151–162.