This article presents a combined method for summarizing Russian-language texts that integrates extractive and abstractive approaches to overcome the limitations of existing methods. Summarization proper is preceded by three stages: text preprocessing, comprehensive linguistic analysis using RuBERT, and semantic-similarity-based clustering. The method then performs extractive summarization with the TextRank algorithm and abstractive refinement with the RuT5 neural network model. Experiments on the Gazeta.Ru news corpus, evaluated with precision, recall, F-score, and ROUGE, showed that the combined approach outperforms both purely extractive methods (such as TF-IDF and statistical methods) and purely abstractive methods (such as RuT5 and mBART).
Keywords: combined method, summarization, Russian-language texts, TextRank, RuT5
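The pipeline in this abstract can be illustrated compactly. Below is a minimal Python sketch of the two summarization stages: TextRank extraction over a TF-IDF cosine-similarity sentence graph, followed by abstractive rewriting with a RuT5 checkpoint. The Hugging Face model id, graph construction, and generation parameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: TextRank extraction + RuT5 abstractive refinement.
# The checkpoint id and hyperparameters are assumptions for
# illustration, not the configuration used in the article.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, T5ForConditionalGeneration

def textrank_extract(sentences, top_k=5):
    """Rank sentences by PageRank over a cosine-similarity graph."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    graph = nx.from_numpy_array(cosine_similarity(tfidf))
    scores = nx.pagerank(graph)
    ranked = sorted(range(len(sentences)), key=scores.get, reverse=True)
    keep = sorted(ranked[:top_k])  # restore original document order
    return " ".join(sentences[i] for i in keep)

MODEL_NAME = "IlyaGusev/rut5_base_sum_gazeta"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def abstractive_refine(extract, max_len=120):
    """Rewrite the extractive summary into fluent prose with RuT5."""
    inputs = tokenizer(extract, return_tensors="pt",
                       truncation=True, max_length=600)
    ids = model.generate(**inputs, max_length=max_len,
                         num_beams=4, no_repeat_ngram_size=3)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

def summarize(sentences):
    """Combined pipeline: extract salient sentences, then rewrite them."""
    return abstractive_refine(textrank_extract(sentences))
```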
The article develops data clustering algorithms based on asymmetric similarity measures, which are relevant to tasks involving directed interactions. Two algorithms are proposed: stepwise cluster formation with fixed centers and a modified version with iterative center refinement. Experiments, including a comparison with the k-medoids method, showed that the fixed-center algorithm is efficient for small datasets, while the algorithm with center recalculation yields more accurate clustering; the choice between the two depends on the required balance between speed and quality.
Keywords: clustering, asymmetric similarity measures, clustering algorithms, iterative refinement, k-medoids, directed interactions, adaptive methods
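The abstract does not spell out the update rules, so the following Python sketch is one plausible reading: a k-medoids-style loop over an asymmetric similarity matrix S, where S[i, j] need not equal S[j, i]. The fixed-center variant corresponds to performing the assignment step once and skipping the center recalculation.

```python
# Hypothetical sketch of clustering with an asymmetric similarity
# matrix S (S[i, j] != S[j, i] in general); the update rules are an
# assumption in the spirit of k-medoids, not the article's exact scheme.
import numpy as np

def assign(S, centers):
    """Attach each object to the center it is most similar to (directed)."""
    return np.argmax(S[:, centers], axis=1)

def cluster_asymmetric(S, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.choice(S.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = assign(S, centers)
        new_centers = centers.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue
            # Recalculate the center as the member receiving the highest
            # total similarity from the rest of its cluster.
            totals = S[np.ix_(members, members)].sum(axis=0)
            new_centers[c] = members[np.argmax(totals)]
        if np.array_equal(new_centers, centers):
            break  # centers stabilized: clustering has converged
        centers = new_centers
    return assign(S, centers), centers
```

Keeping `centers` fixed after initialization reproduces the faster stepwise variant, which is consistent with the speed/quality trade-off reported in the abstract.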
This article presents a method for comprehensive analysis of Russian-language texts using neural network models based on the Bidirectional Encoder Representations from Transformers (BERT) architecture. The study employs models specialized for Russian: RuBERT-tiny, RuBERT-tiny2, and RuBERT-base-cased. The proposed methodology covers the morphological, syntactic, and semantic levels of analysis, integrating lemmatization, part-of-speech tagging, morphological feature identification, syntactic dependency parsing, semantic role labeling, and relation extraction. The BERT-family models achieve accuracies exceeding 98% for lemmatization, 97% for part-of-speech tagging and morphological feature identification, 96% for syntactic parsing, and 94% for semantic analysis. The method is suitable for tasks requiring deep text comprehension and can be optimized for processing large corpora.
Keywords: BERT, Russian-language texts, morphological analysis, syntactic analysis, semantic analysis, lemmatization, RuBERT, natural language processing, NLP
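As an illustration of the encoding step shared by all of the analysis levels listed above, the Python sketch below obtains contextual token embeddings from a RuBERT model; per-token classification heads (for POS tags, morphological features, dependency labels, or semantic roles) would be trained on top of these vectors. The checkpoint id and the linear head are illustrative assumptions, not the study's exact architecture.

```python
# Sketch of the shared RuBERT encoding step; the checkpoint id and the
# POS head below are illustrative assumptions, not the study's setup.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "cointegrated/rubert-tiny2"  # assumed RuBERT-tiny2 release
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def encode(sentence: str) -> torch.Tensor:
    """Return one contextual vector per subword token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.squeeze(0)  # (seq_len, hidden_size)

# An illustrative linear POS-tagging head over the encoder output;
# analogous heads would serve morphology, parsing, and semantic roles.
NUM_POS_TAGS = 17  # size of the Universal Dependencies tag set
pos_head = torch.nn.Linear(model.config.hidden_size, NUM_POS_TAGS)

vectors = encode("Кошка спит на диване.")  # "The cat sleeps on the couch."
pos_logits = pos_head(vectors)  # per-token tag scores (head untrained here)
```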