Ways to develop a subcorpus of the writer's text: structure and functions of meta-markup
DOI:
https://doi.org/10.26577/EJPh.2024.v193.i1.ph2
Abstract
The purpose of the article is to present a sample design of the structure and meta-markup of the writer's subcorpus as part of the national corpus of the Kazakh language. The study falls within the field of computational linguistics. The idea of the article is to improve the subcorpora of the national corpus of the Kazakh language, among other components, by drawing on the achievements of national corpora around the world.
The scientific significance of the article lies in corpus linguistics and the digitalization of linguistic scholarship, including the provision of theoretical foundations for the stylistic analysis of the writer's text and the representation of the writer's personality. The practical significance lies in the presentation of models for introducing linguistic scholarship into the corpus base. The article offers a sample of meta-markup, designed for each work of fiction in the writer's subcorpus, and a sample of semantic markup, assigned to each expressive word. The meta-markup is assigned at the level of the text: each text of the corpus is accompanied by a certification, or meta-markup, that is, a complete description of the author and the work.
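The two levels of markup described above can be pictured as a small data structure. The following is only a minimal sketch, not the schema used in the corpus: the class names, field names (`author`, `title`, `genre`, `year`, `category`) and the placeholder values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticTag:
    """Semantic markup attached to one expressive word (hypothetical fields)."""
    token: str
    category: str


@dataclass
class TextRecord:
    """Meta-markup for one work: a description of the author and the work.

    The field set is an assumption; the actual corpus may record more or
    different bibliographic attributes.
    """
    author: str
    title: str
    genre: str  # for example "prose" or "drama"
    year: int
    semantic_tags: list[SemanticTag] = field(default_factory=list)


def select_by_genre(records, genre):
    """Return the records whose meta-markup lists the given genre."""
    return [r for r in records if r.genre == genre]


# Placeholder records, not real corpus data.
records = [
    TextRecord("Author A", "Work One", "prose", 1950,
               [SemanticTag("word1", "metaphor")]),
    TextRecord("Author A", "Work Two", "drama", 1960),
]
prose_titles = [r.title for r in select_by_genre(records, "prose")]
```

Keeping the per-text record separate from the per-word tags mirrors the distinction the article draws between meta-markup of a work and semantic markup of individual expressive words.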
The research methodology draws on the EXMARaLDA software package and the HIAT transcription method, as well as on methods of linguistic stylistics and personality cognition. The main result of the research is a model for the design of a database of texts in digital format, which gives a detailed description of prose and dramatic works and allows the electronic version of a work to be read online. The value of the article lies in the functionality of the corpus of writers' texts: in addition to transcriptions, it offers various options for searching and selecting empirical and statistical data. The corpus is equipped with a meta tag that includes bibliographic data about each work and allows for sociolinguistic differentiation of texts. The contribution of the article rests on the main mechanism in the development of the corpus base: the distinction between annotation and semantic meta-markup. The article was written as part of the scientific project "Improving the tool of intercultural communication – the National Corpus of the Kazakh Language (NCKL) – and expanding its subcorpora", IRN BR21882249.
Keywords: writer's subcorpus, annotation meta-markup, semantic meta-markup, model, text base, prose, drama.